bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

variant2 #11

Closed tanglingfung closed 11 years ago

tanglingfung commented 11 years ago

So, for the variant2 pipeline, is trimming reads not recommended?

Also, for some reason I have issues generating the insert size, duplicates and variant summaries. Where should I look to trace a possible mistake on my side?

Thanks.

chapmanb commented 11 years ago

Paul; variant2 is a slimmed-down pipeline where the initial alignment step is done via unix pipes. The step-by-step, disk-intensive processes in the initial pipeline did not scale to large whole genome sequences. As a result, some steps, like trim_reads, are not yet implemented there. Quality improvements in reads have made trimming less crucial, although this can always change as read lengths change. Nick Loman has a good post on this from the perspective of assembly:

http://pathogenomics.bham.ac.uk/blog/2013/04/adaptor-trim-or-die-experiences-with-nextera-libraries/

Are you noticing specific problems, or just curious about the new pipeline?

For the summary issues, can you paste the error message you're seeing? Thanks much.
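For readers unfamiliar with the piped approach described above, here is a minimal sketch of the general idea: each stage's output streams directly into the next stage, so no intermediate files are written to disk. This is not bcbio-nextgen code. The `run_piped` helper is a hypothetical name, and the two stages are stand-in Python one-liners rather than a real aligner and sorter.

```python
import subprocess
import sys


def run_piped(commands):
    """Run commands as a pipeline: stdout of each stage feeds stdin of the next.

    Hypothetical helper for illustration; nothing is written to disk
    between stages.
    """
    procs = []
    prev_stdout = None
    for cmd in commands:
        proc = subprocess.Popen(cmd, stdin=prev_stdout, stdout=subprocess.PIPE)
        if prev_stdout is not None:
            # Close the parent's copy so the upstream stage sees a broken
            # pipe if the downstream stage exits early.
            prev_stdout.close()
        prev_stdout = proc.stdout
        procs.append(proc)

    # Read the final stage's output, then check every stage's exit code.
    output, _ = procs[-1].communicate()
    for cmd, proc in zip(commands, procs):
        if proc.wait() != 0:
            raise subprocess.CalledProcessError(proc.returncode, cmd)
    return output


# Stand-in stages (assumptions, not real tools): one emits unsorted
# records, the other sorts whatever arrives on stdin.
emit = [sys.executable, "-c", "print('read2\\nread1')"]
sort = [sys.executable, "-c",
        "import sys; sys.stdout.writelines(sorted(sys.stdin))"]

result = run_piped([emit, sort])
print(result.decode())
```

In a real pipeline the stand-in stages would be replaced by the actual alignment and sorting commands. The trade-off is that a failure in any stage means rerunning the whole chain, since there are no intermediate files to resume from.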

tanglingfung commented 11 years ago

Thanks Brad.

I am just curious about the new pipeline at the moment, because it sounds like it could handle exome-scale work.

As for the summary stats, there is no error message, but some metrics files are skipped. The variant calling also stops after snpEff annotation (the output file ends with -effect.vcf), while I was expecting -annotated.vcf. Other things work fine. I think there may be some settings issues on my end. I tried to look for them but do not know where to start.

(Sent by email in reply to Brad Chapman's comment of Thursday, April 25, 2013, quoted above.)

tanglingfung commented 11 years ago

I may also have read your code wrong. It appears to me that it does recalibration before realignment, but GATK recommends doing realignment before recalibration. I am not sure whether it makes any difference, but is there a reason to do recalibration before realignment? Just curious.

chapmanb commented 11 years ago

Paul; Thanks as always for all the helpful feedback. I'll try to take these points one by one:

Hope this helps. Let me know if you have any more questions.

tanglingfung commented 11 years ago

Thanks Brad. I hope to be contributing more than just feedback soon.