bxlab / galaxy-hackathon

Data intensive science for everyone.
https://galaxyproject.org/

Workflow Best Practices #29

Open cschu opened 8 years ago

cschu commented 8 years ago
  1. Build Workflows
    • RNA-seq quantification
    • RNA-seq variant calling
    • ChIP-seq
  2. Associated Datasets/Training Data
  3. Interactive Tours based on content
  4. Share with GTN/GOBLET
  5. Galaxy Flavour for Workflow

Associated Coding Hack Tasks

  1. Workflow improvements @bgruening @yvanlebras @kpoterlo @jennaj @firaan1 @MoHeydarian @ssander5 @kmurat @cschu
    • bxlab#4 (Submit workflows from workflow designer)
    • Creation/Export of High Res Images for workflows
    • Workflow Export to Unix @ssander5 @frederikcoppens

firaan1 commented 8 years ago

follow

MoHeydarian commented 8 years ago

follow

ghost commented 8 years ago

follow

ssander5 commented 8 years ago

follow

ssander5 commented 8 years ago

Any thoughts on including a RADtag workflow? I know Stacks and the Tuxedo tools are in the tool list on Galaxy, and RRL genome information is useful to a lot of folks, but I'm not sure there is a Galaxy workflow for it yet!

frederikcoppens commented 8 years ago

FreeBayes vs GATK

https://bcbio.wordpress.com/2013/10/21/updated-comparison-of-variant-detection-methods-ensemble-freebayes-and-minimal-bam-preparation-pipelines/

yvanlebras commented 8 years ago

Follow, and maybe @devikaatgit wants to as well ;)

yvanlebras commented 8 years ago

Yes @ssander5, I have Stacks workflows ;)

MoHeydarian commented 8 years ago

We've talked about generating basic workflows for:

The idea is to provide standard workflows to give an idea of the basics for end-users. We can couple this with a short discussion of alternative options for mapping and quantification for each workflow and list the various benefits/shortcomings of these options, as it is likely impossible for any two bioinformaticians to agree on what the best practice would be.

Ideally we will contact GigaScience and see if they are interested in a 'best practices' paper on Galaxy workflows. They were very interested in this last year.

There are versions of these on the cloud under Shared Data -> Workflows.

devikaatgit commented 8 years ago

I am very much interested in this... but I have a point to make: separate workflows for prokaryotic and eukaryotic genomes/transcriptomes are needed, as splicing (introns) is not a thing with proks... Also, I am not sure whether paired-end and single-end reads can be run using the same workflow...

I will be sharing 2 datasets for 2 conditions as samples for prokaryotic RNA-seq and, if required, the reference genome for them too.

My idea is a microbial RNA-seq quantification workflow: Bowtie --> StringTie --> Ballgown, after the required trimming and quality checks. (Since prokaryotic transcripts are not spliced, TopHat/HISAT and similar spliced aligners are not needed; I feel Bowtie itself does an efficient job.)

Since Ballgown is not yet available in Galaxy, this could be modified to Bowtie --> StringTie --> Cuffmerge and Cuffdiff. (StringTie seems to be faster than Cufflinks???)

Of course HTSeq is also a good option. Plus, this is just a basic idea that I have. I am not sure how "Galaxy workflows" work, so suggestions/comments are welcome and expected.
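
The proposed chain can be sketched outside Galaxy as a list of command lines. This is only a minimal sketch, not the workflow from this thread: the helper `build_commands` and all file names are hypothetical, nothing is executed, and the flags shown (`bowtie -S`, `-1`/`-2` for paired reads, `samtools sort -o`, `stringtie -G`/`-e`/`-B`/`-o`) are assumed to match the installed tool versions and should be checked against their manuals. It also illustrates that single-end and paired-end data differ only in the mapping step.

```python
from typing import List, Optional


def build_commands(index: str, annotation: str, reads_1: str,
                   reads_2: Optional[str] = None,
                   sample: str = "sample") -> List[List[str]]:
    """Assemble command lines for Bowtie -> samtools sort -> StringTie.

    Hypothetical helper: the commands are only built, never run.
    """
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"

    # Mapping with the original (unspliced) Bowtie, writing SAM output (-S).
    bowtie = ["bowtie", "-S", index]
    if reads_2 is None:
        bowtie.append(reads_1)                    # single-end reads
    else:
        bowtie += ["-1", reads_1, "-2", reads_2]  # paired-end reads
    bowtie.append(sam)

    # StringTie expects coordinate-sorted alignments.
    sort = ["samtools", "sort", "-o", bam, sam]

    # -e: quantify reference transcripts only; -B: write Ballgown input tables.
    stringtie = ["stringtie", bam, "-G", annotation, "-e", "-B",
                 "-o", f"{sample}/{sample}.gtf"]

    return [bowtie, sort, stringtie]


if __name__ == "__main__":
    # Placeholder file names, one single-end sample.
    for command in build_commands("ref_index", "ref.gtf", "cond1.fastq"):
        print(" ".join(command))
```

Passing a second read file switches only the Bowtie command to paired-end mode; the sorting and StringTie steps stay the same.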

MoHeydarian commented 8 years ago

@devikaatgit you are correct that we would need different workflows for different RNA-seq purposes (SE v PE, transcript quantification v transcript reconstruction, prokaryotic v eukaryotic, etc). If we make a workflow for all of the conditions we end up with an overwhelming list of options.

I propose instead to illustrate the general steps in a given RNA-seq/ChIP-seq workflow and discuss the alternative tools that can be used at the various analysis steps, with their benefits and shortcomings. Among our group we have lots of experience with these alternative options and can highlight these nicely in a manuscript (for a preprint, or to submit to GigaScience, or both). The end user can then easily modify our basic workflows to fit their specific experimental design in the workflow editor.

We should discuss it as a group, here or in person when we gather next.

ghost commented 8 years ago

I think @devikaatgit raised a good point about separate workflows for prokaryotes.

yvanlebras commented 8 years ago

I'm OK to begin basic workflows based on Stacks for:

ssander5 commented 8 years ago

@yvanlebras I have some test data that might be good to use for a Shared Data Library to go with your stacks pipeline.

RADseq data from oaks (chosen because the genome is available, the data sets are small, and there is a unique endpoint analysis for RADseq).

Publication: Hipp AL et al., "A framework phylogeny of the American oak clade based on sequenced RAD data", PLoS One, 2014 Apr 4;9(4):e93975

Libraries:

Typical Issues with RADseq Captured:

Aims Captured:

cschu commented 8 years ago

@devikaatgit I have checked with Dave B., who started to wrap Ballgown last year. He confirmed that he has not finished the wrapper yet. I remember finding this particular wrapper quite challenging myself, so I opened a ticket requesting the wrapper to be created again: #41

devikaatgit commented 8 years ago

Thanks for that, @cschu.

Can anybody tell me why Bowtie is not integrated into Galaxy??? Or is it that I just missed it?

I just noticed this: I couldn't find Bowtie to build an index and align the fastq reads that I have (not fastqsanger/fastqillumina)... As I had discussed with some of you earlier, I used the Bowtie command-line tool on my desktop and moved to Galaxy for the remaining steps only, as it was easier and took less time to upload BAM files than fastq files.

yvanlebras commented 8 years ago

Hi @devikaatgit,

Bowtie (Bowtie2) is on the usegalaxy.org server: https://usegalaxy.org/root?tool_id=toolshed.g2.bx.psu.edu/repos/devteam/bowtie2/bowtie2/2.2.6.2 and I also found it on the dedicated cloud one.

devikaatgit commented 8 years ago

Hi, thank you for that, but I was looking for Bowtie specifically (since my reads are shorter than 50 bp, Bowtie is faster and more sensitive than Bowtie2)... There is the option to map with Bowtie for fastqillumina reads, but not for normal fastq reads... Does anybody know how FASTQ Groomer works? Is it used to convert fastq reads to fastqillumina?
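
For context on the Groomer question: FASTQ variants differ in how the quality line is encoded. Galaxy's fastqsanger datatype uses Phred scores with an ASCII offset of 33, while fastqillumina (Illumina 1.3 to 1.7) uses an offset of 64, and grooming rewrites the quality line from one encoding to another. The sketch below shows only that one idea, assuming Phred+64 input. It is not FASTQ Groomer's actual code, and the function names are hypothetical.

```python
SANGER_OFFSET = 33    # fastqsanger: Phred+33
ILLUMINA_OFFSET = 64  # fastqillumina (Illumina 1.3-1.7): Phred+64


def illumina_to_sanger(quality: str) -> str:
    """Re-encode a Phred+64 quality string as Phred+33."""
    converted = []
    for char in quality:
        score = ord(char) - ILLUMINA_OFFSET
        if score < 0:
            # Characters below '@' cannot occur in Phred+64 data.
            raise ValueError(f"{char!r} is not a valid Phred+64 quality character")
        converted.append(chr(score + SANGER_OFFSET))
    return "".join(converted)


def groom_record(record):
    """Convert one FASTQ record given as (header, sequence, plus, quality)."""
    header, sequence, plus, quality = record
    return (header, sequence, plus, illumina_to_sanger(quality))


if __name__ == "__main__":
    # 'h' encodes Phred 40 in Phred+64; the same score is 'I' in Phred+33.
    print(groom_record(("@read1", "ACGT", "+", "hhhh")))
```

Only the quality line changes; the header and sequence lines are passed through untouched.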

frederikcoppens commented 8 years ago

I'm installing the original Bowtie on the cloud instance; it should appear in a bit.

devikaatgit commented 8 years ago

That would be great @frederikcoppens THUMBS UP!!!

frederikcoppens commented 8 years ago

Which reference do you need? The tool's there, but without a reference it's not very useful ;-)

devikaatgit commented 8 years ago

http://bacteria.ensembl.org/Staphylococcus_aureus_subsp_aureus_str_jkd6008/Info/Index

devikaatgit commented 8 years ago

Does anybody have some good preloaded workflows for RNA-seq in Galaxy? Please share...

frederikcoppens commented 8 years ago

The reference is there; let me know if you have issues.

@MoHeydarian do you have an RNA-seq workflow available?

yvanlebras commented 8 years ago

@ssander5 thanks for the RADseq data!

@MoHeydarian or @frederikcoppens, yesterday we finished a first complete Stacks 1.4.0 Galaxy integration. Can you install the corresponding tools (through the suite_stacks repo on the main Tool Shed) on the dedicated AWS VM?

devikaatgit commented 8 years ago

Please see the following link: https://github.com/nekrut/galaxy/wiki/Reference-based-RNA-seq

devikaatgit commented 8 years ago

https://usegalaxy.org/u/devikasub/w/workflow-constructed-from-history-gccworkflow-1

This is a workflow that I created for the purpose. Sample input datasets are available at

https://usegalaxy.org/u/devikasub/h/bacterial-rna-seq-2-condition-single-replicate-datasets

@MoHeydarian @yvanlebras please have a look and comment, and also let me know what's to be done next.

ghost commented 8 years ago

Hi, I am repeatedly getting the following while trying to run the ChIP-seq workflow on the cloud:

Internal Server Error

Galaxy was unable to successfully complete your request

An error occurred. This may be an intermittent problem due to load or other unpredictable factors, reloading the page may address the problem.

The error has been logged to our team.

MoHeydarian commented 8 years ago

@kpoterlo I think the issue was that the wrong Trimmomatic output was chained to BWA in the workflow. I corrected the connection and the workflow now runs to completion, though an error with MACS is appearing now.

The corrected version is under shared data -> workflows and indicated with 'v2'.

ghost commented 8 years ago

@MoHeydarian thanks, running it for mouse p63 ChIP-seq in keratinocytes

MoHeydarian commented 8 years ago

We have working versions of the RNA-seq and ChIP-seq workflows under shared data -> workflows.

The RNA-seq workflow got a bit messy in order for DESeq2 to use HTSeq input. For the purposes of showing an introductory example workflow, it may be worth using Cuffdiff to quantify transcript expression (for a clearer example).

I have also shared two histories with example data that work with these workflows. You can find these under the tool icon in the history panel under 'Histories shared with me'.

Comments and discussion, go!

yvanlebras commented 8 years ago

@MoHeydarian @frederikcoppens I'm testing the Stacks tools on the VM but am running into difficulties; it appears that the installation of the binaries failed.

error message:

Fatal error: Exit code 127 (Error in Stacks execution)
/mnt/galaxy/tmp/job_working_directory/001/1007/tool_script.sh: line 9: denovo_map.pl: command not found

I think this is due to the Galaxy configuration on the AWS instance and is related to conda. Can you try this:

sudo cp /mnt/galaxy/galaxy-app/config/dependency_resolvers_conf.xml.sample /mnt/galaxy/galaxy-app/config/dependency_resolvers_conf.xml

Put conda first in the list in:

sudo nano /mnt/galaxy/galaxy-app/config/dependency_resolvers_conf.xml

Then add to config/galaxy.ini:

conda_auto_install = True
conda_auto_init = True

Then restart Galaxy and uninstall/reinstall Stacks?
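
The "put conda first" edit can also be scripted instead of done by hand in nano. The following is a minimal sketch under stated assumptions: the helper `conda_first` is hypothetical, and the `SAMPLE` XML only approximates what a dependency_resolvers_conf.xml may contain, so the real file on the VM should be inspected before rewriting it.

```python
import xml.etree.ElementTree as ET

# Assumed, illustrative content; the real sample file may differ.
SAMPLE = """<dependency_resolvers>
  <tool_shed_packages />
  <galaxy_packages />
  <galaxy_packages versionless="true" />
  <conda />
  <conda versionless="true" />
</dependency_resolvers>"""


def conda_first(xml_text: str) -> str:
    """Return the resolver config with all <conda> entries moved to the top.

    The relative order of the other resolvers is preserved.
    """
    root = ET.fromstring(xml_text)
    children = list(root)
    conda = [child for child in children if child.tag == "conda"]
    others = [child for child in children if child.tag != "conda"]
    for child in children:
        root.remove(child)
    root.extend(conda + others)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(conda_first(SAMPLE))
```

Resolvers are tried in the order they are listed, which is why moving the conda entries to the top makes conda the first choice for resolving tool dependencies.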

yvanlebras commented 8 years ago

@MoHeydarian @frederikcoppens Can you also add the MultiQC tool to the AWS instance? I just finished a MultiQC Galaxy tour that you can add to the instance too. Maybe MultiQC could be added as a final step of the Data hackathon workflows that use compatible tools such as FastQC, Cutadapt, TopHat2, featureCounts, Samtools stats, Picard, Bismark...