bxlab / galaxy-hackathon

Data intensive science for everyone.
https://galaxyproject.org/

Training Data (includes tutorial, example) #28

Open yvanlebras opened 8 years ago

yvanlebras commented 8 years ago

Contributors: @jennaj @griffinp @kpoterlo @yvanlebras @BoughAida @ssander5 @devikaatgit @cschu @tnabtaf @kmurat1

This issue is dedicated to the Training Data hackathon group. The idea is to gather sample data that can be used as examples, tutorials, etc. on Galaxy instances.

Please don't hesitate to post a comment with data links and descriptions ;)

Example:

RADseq technology

Genetic map

-parents

female http://546969.197.189.163/datasets/bbbfa414ae315caf/display/
male http://546969.197.189.163/datasets/4467809fea030689/display/

-progeny

progeny 1 http://546969.197.189.163/datasets/ddf83cf807e6e774/display/
progeny 2 http://546969.197.189.163/datasets/30bf7a4ced2335cc/display/
....

Population genomics

barcode http://546969.197.189.163/datasets/6df0b7b066ddc4c9/display/
population map http://546969.197.189.163/datasets/d796ca8e1687a54b/display/
reference genome http://546969.197.189.163/datasets/06cf32e9aa8aad75/display/
FastQ file http://546969.197.189.163/datasets/34c3e3c01e1a37f4/display/

If the data are not reachable through the web (personal data on your laptop, ...), the best way is to upload them to a Galaxy history on https://usegalaxy.org/.

Once the data are gathered, we could meet to discuss which datasets are good, duplicated, or too big, before proposing actions (e.g., data directly shareable, data that need to be reduced, ...).
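Part of that triage (duplicate / too big) could be pre-computed once the links are collected. A minimal sketch, assuming each contribution is recorded as a topic, label, URL, and size in bytes, and assuming a hypothetical 100 MB cut-off for "too big"; neither the record format nor the threshold was agreed in this thread, and `Dataset`/`triage` are illustrative names.

```python
from dataclasses import dataclass

# Hypothetical cut-off for "too big" training data (not agreed in this thread).
MAX_TRAINING_BYTES = 100 * 1024 * 1024


@dataclass(frozen=True)
class Dataset:
    topic: str       # e.g. "RADseq / Genetic map"
    label: str       # e.g. "female", "progeny 1"
    url: str         # link to a Galaxy dataset or history
    size_bytes: int  # size as reported by the hosting Galaxy instance


def triage(datasets):
    """Sort datasets into 'ok', 'duplicate' and 'too_big' buckets.

    A dataset counts as a duplicate when its URL was already seen earlier
    in the list (the first occurrence is kept). Non-duplicates larger than
    MAX_TRAINING_BYTES are flagged as candidates for reduction.
    """
    buckets = {"ok": [], "duplicate": [], "too_big": []}
    seen_urls = set()
    for ds in datasets:
        if ds.url in seen_urls:
            buckets["duplicate"].append(ds)
            continue
        seen_urls.add(ds.url)
        bucket = "too_big" if ds.size_bytes > MAX_TRAINING_BYTES else "ok"
        buckets[bucket].append(ds)
    return buckets
```

Matching on URL only catches the same link posted twice; the same file uploaded to two different histories would need a checksum comparison instead.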

ghost commented 8 years ago

follow

MoHeydarian commented 8 years ago

follow

kkamieniecka commented 8 years ago

follow

yvanlebras commented 8 years ago

RADseq technology

Genetic map

Related usegalaxy.org history

Population genomics

Related usegalaxy.org history

There is a reference genome in the shared data, so the analysis can be run with either the denovo_map or the ref_map pipeline.

Assemble read pairs

Related usegalaxy.org history

devikaatgit commented 8 years ago

How do I add the sample datasets that I have with me?

yvanlebras commented 8 years ago

@devikaatgit the best way is to upload the data to a Galaxy history on https://usegalaxy.org/, then share that history publicly. If you don't have an account, don't hesitate to create one, it's free ;)
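Publishing a history can also be scripted against the Galaxy REST API. A minimal sketch, assuming you already have an API key (generated from your user preferences on the Galaxy instance) and the encoded ID of the history; `build_publish_request` is a hypothetical helper. It builds, but does not send, a `PUT /api/histories/{id}` request that sets the history's `published` and `importable` flags.

```python
import json
import urllib.request


def build_publish_request(galaxy_url, api_key, history_id):
    """Build (but do not send) a request that makes a Galaxy history public.

    'published' lists the history among the instance's published histories;
    'importable' lets anyone with the link import it into their own account.
    """
    url = f"{galaxy_url.rstrip('/')}/api/histories/{history_id}"
    body = json.dumps({"published": True, "importable": True}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json", "x-api-key": api_key},
    )


# Sending it requires network access and a valid key:
# with urllib.request.urlopen(build_publish_request(url, key, hid)) as resp:
#     print(json.load(resp)["published"])
```

Doing it through the web interface ("Share or publish" on the history menu) gives the same result; the script only helps when many histories need publishing.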

frederikcoppens commented 8 years ago

@devikaatgit If you need help, let us know; we can add it to our cloud instance too. This allows us to put some structure into the data libraries (and share them later).

Eduardo-Alves commented 8 years ago

Tutorials for RNA-seq, assembly, and variant calling using small, publicly available datasets: Galaxy_Walkthrough.pdf, Galaxy-based RNA-Seq Intro.pptx, Galaxy Variant Tutorial Mar16.pptx

yvanlebras commented 8 years ago

Thank you very much @Eduardo-Alves !

In the meantime, not sure I can use your material because of:

yvanlebras commented 8 years ago

@frederikcoppens Do you think there is a way to create shared libraries for our group on the Galaxy main server?

frederikcoppens commented 8 years ago

@yvanlebras That's one of the possibilities and my personal favorite; it needs to be discussed.

BoughAida commented 8 years ago

Bacterial RNA-seq data available at the following url http://54.158.166.52/u/aida/h/datahackathonab

ssander5 commented 8 years ago

I posted some "advanced training sets" in https://github.com/bxlab/galaxy_hackathon/issues/30. They include larger RNA-seq and RADseq datasets, all from publicly accessible data, that highlight typical issues in data analysis and draw on published analyses. They might be good as a second-pass training set for each of these analyses: since they are real data with real issues, they may not be as straightforward as a "toy set".

I have not uploaded the data yet, because I have to transfer it from the cluster to my computer and then back up to Galaxy (unless anyone knows a faster way of doing this?).
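One way to skip the laptop round-trip is to let Galaxy fetch the files itself: if the cluster (or a staging web server) exposes a file at a URL the Galaxy server can reach, that URL can be pasted into Galaxy's upload dialog. The sketch below builds the equivalent `POST /api/tools` request for Galaxy's standard upload tool (`upload1`), with the URL as pasted content. The `inputs` keys follow the payload the BioBlend client sends for URL uploads, but treat them as an assumption to check against your Galaxy version; `build_url_upload_request` is a hypothetical helper, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_url_upload_request(galaxy_url, api_key, history_id, file_url,
                             file_type="auto"):
    """Build (but do not send) a request asking Galaxy to fetch a URL into a history.

    Assumes the classic 'upload1' tool payload; newer Galaxy releases also
    offer a separate fetch API with a different payload shape.
    """
    payload = {
        "history_id": history_id,
        "tool_id": "upload1",  # Galaxy's standard upload tool
        "inputs": {
            "file_type": file_type,           # "auto" lets Galaxy detect the format
            "dbkey": "?",                     # genome build left unspecified
            "files_0|type": "upload_dataset",
            "files_0|url_paste": file_url,    # Galaxy downloads this URL server-side
        },
    }
    return urllib.request.Request(
        f"{galaxy_url.rstrip('/')}/api/tools",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "x-api-key": api_key},
    )
```

Files behind a cluster login or firewall cannot be fetched this way; for those, uploading via the instance's FTP server (where offered) still avoids routing large files through a browser.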

devikaatgit commented 8 years ago

Two-condition bacterial RNA-seq datasets (single replicates only) can be accessed at https://usegalaxy.org/u/devikasub/h/bacterial-rna-seq-2-condition-single-replicate-datasets

jennaj commented 8 years ago

thanks everyone!

What is next? We need a ticket listing the to-do items; it can reference this ticket and others. Would someone like to draft one, or should I?

Shared library on Main: assign to me. Moving the data above into it, organized and labeled, is also on me (in collaboration with the authors above and in the master ticket). Should we use the hackathon mailing list to sync up on this and related items?
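If the shared library on Main is created through the API rather than the admin web interface, the first step could look like the sketch below. It assumes an admin API key on that instance (data libraries can only be created by admins) and uses Galaxy's `POST /api/libraries` endpoint with `name`, `description`, and `synopsis` fields; `build_create_library_request` is a hypothetical helper, and the request is only built, not sent.

```python
import json
import urllib.request


def build_create_library_request(galaxy_url, admin_api_key, name,
                                 description="", synopsis=""):
    """Build (but do not send) a request that creates a Galaxy data library.

    Only admin users can create libraries, so the key must belong to an admin.
    """
    payload = {"name": name, "description": description, "synopsis": synopsis}
    return urllib.request.Request(
        f"{galaxy_url.rstrip('/')}/api/libraries",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "x-api-key": admin_api_key},
    )
```

Folders per topic (e.g. RADseq, RNA-seq) and the datasets themselves would then be added to the returned library in further steps.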