variant2 - Githubissues

tanglingfung commented 11 years ago

So, for the variant2 pipeline, is trimming reads not recommended?

Also, for some reason I have issues generating the insert size, duplicates, and variant summaries. Where should I start looking for a possible mistake on my side?

Thanks.

chapmanb commented 11 years ago

Paul; variant2 is a slimmed down pipeline where the initial alignment step is done via Unix pipes. The step-by-step, disk-intensive processes in the initial pipeline did not scale to large whole genome sequences. As a result, some steps, like trim_reads, are not yet implemented there. Quality improvements in reads have made trimming less crucial, although this can always change as read lengths change. Nick Loman has a good post on this from the perspective of assembly:

http://pathogenomics.bham.ac.uk/blog/2013/04/adaptor-trim-or-die-experiences-with-nextera-libraries/

Are you noticing specific problems, or just curious about the new pipeline?

For the summary issues, can you paste the error message you're seeing? Thanks much.
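To make the "Unix pipes" idea above concrete, here is a minimal Python sketch of streaming data between processes without intermediate files on disk. It is an illustration only, not bcbio-nextgen's actual code: `run_piped` is a hypothetical helper, and the two demo commands are small Python stand-ins for what would be an aligner feeding a BAM sorter or converter in a real pipeline.

```python
import subprocess
import sys


def run_piped(commands):
    """Run commands as a pipeline: each stdout feeds the next stdin.

    Hypothetical helper for illustration. Nothing is written to disk
    between steps; the data streams from one process to the next.
    """
    procs = []
    upstream = None
    for cmd in commands:
        proc = subprocess.Popen(
            cmd,
            stdin=upstream if upstream is not None else subprocess.DEVNULL,
            stdout=subprocess.PIPE,
        )
        if upstream is not None:
            # Close our copy so the upstream process gets SIGPIPE
            # if the downstream process exits early.
            upstream.close()
        upstream = proc.stdout
        procs.append(proc)

    output = procs[-1].stdout.read()
    procs[-1].stdout.close()
    for proc in procs:
        if proc.wait() != 0:
            raise subprocess.CalledProcessError(proc.returncode, proc.args)
    return output


# Stand-ins for real tools: the first command emits unsorted records,
# the second sorts whatever arrives on its stdin.
demo = [
    [sys.executable, "-c", "print('b\\na\\nc')"],
    [sys.executable, "-c", "import sys; sys.stdout.writelines(sorted(sys.stdin))"],
]
result = run_piped(demo)
```

The trade-off is the one described in the comment above: streaming avoids writing and re-reading large intermediate files, but steps that were separate on-disk stages have to be re-implemented to fit into the stream.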

tanglingfung commented 11 years ago

Thanks Brad.

I am just curious about the new pipeline at the moment because it sounds like it could handle exome-scale work.

As for the summary stats, there is no error message, but some metrics files are skipped. The variant calling also stops after snpEff annotation (the file ends with -effect.vcf) while I was expecting -annotated.vcf. Other things work fine. I think there may be some settings issues on my end. I tried to look for them but do not know where to start.

In reply to Brad Chapman's comment of Thursday, April 25, 2013 (https://github.com/chapmanb/bcbio-nextgen/issues/11#issuecomment-17027942).

tanglingfung commented 11 years ago

I may also have read your code wrong. It appears to do recalibration before realignment, but GATK recommends doing realignment before recalibration. I am not sure whether it makes any difference, but is there a reason to do recalibration first? Just curious.

chapmanb commented 11 years ago

Paul; Thanks as always for all the helpful feedback. I'll try to take these points one by one:

Insert size metrics: good catch, these were not correctly being calculated for BAM file inputs. I checked in a fix which enables this.
Duplication metrics: calculating duplications on whole genomes is extremely slow, so the duplication step now works on sections of files in parallel. This means that there is no global dup_metrics file. I'm looking for lightweight ways to bring this back into the final report.
snpEff -annotated.vcf file: This is no longer generated and we use the snpEff VCF directly instead of the GATK annotation walker. GATK was not keeping up with snpEff revisions and did not appear to be offering a lot of value over the direct snpEff VCF.
Realignment and recalibration: Due to performance issues with scaling this on multiple whole genomes, we now perform realignment in parallel on genomic sections. Recalibration requires the entire file to be processed together, but recombining whole genome files does not scale well, so I opted for a practical approach that calculates recalibration metrics first, then applies realignment and recalibration on each segment.

Hope this helps. Let me know if you have any more questions.
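The two-phase ordering described in the realignment and recalibration point can be sketched with toy data. This is an assumption-laden illustration, not GATK or bcbio-nextgen code: the reads are plain tuples, the "recalibration table" is a single made-up quality offset toward a hypothetical target mean of 30, and sorting by position stands in for realignment. The point is only the shape of the computation: one global pass to collect metrics, then independent per-section work that shares those metrics.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Toy reads: (chromosome, position, reported base quality).
READS = [
    ("chr1", 250, 20),
    ("chr1", 100, 30),
    ("chr2", 900, 10),
    ("chr2", 50, 40),
]


def global_recalibration_table(reads):
    """Phase 1: a single pass over the whole input.

    Placeholder metric: the offset that moves the mean quality to a
    hypothetical target of 30. Real recalibration is far richer.
    """
    mean_quality = sum(q for _, _, q in reads) / len(reads)
    return {"offset": round(30 - mean_quality)}


def split_by_section(reads):
    """Group reads into genomic sections (here: one per chromosome)."""
    sections = defaultdict(list)
    for read in reads:
        sections[read[0]].append(read)
    return dict(sections)


def process_section(section_reads, table):
    """Phase 2: per-section work using local reads plus the shared table."""
    # Sorting by position is a stand-in for local realignment.
    realigned = sorted(section_reads, key=lambda r: r[1])
    # Apply the globally computed adjustment to each read.
    return [(c, p, q + table["offset"]) for c, p, q in realigned]


def run(reads):
    table = global_recalibration_table(reads)  # needs the whole input
    sections = split_by_section(reads)
    with ThreadPoolExecutor() as pool:  # sections are independent
        futures = {
            name: pool.submit(process_section, section, table)
            for name, section in sections.items()
        }
        return {name: future.result() for name, future in futures.items()}


result = run(READS)
```

Because the metrics are gathered before the data is split, the per-section step never needs the recombined whole-genome file, which is the scaling concern raised in the comment above.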

tanglingfung commented 11 years ago

Thanks Brad. I hope I can contribute more than just feedback soon.

bcbio / bcbio-nextgen

variant2 #11