eastgenomics / eggd_generate_variant_workbook

DNAnexus app for generating xlsx variant workbooks

Empty variants sheets with wrong fastq naming #37

Closed: Yu-jinKim closed this issue 3 years ago

Yu-jinKim commented 4 years ago

For G001616 and for some old samples, the fastqs are named following this convention: G001616.1.fq.gz. The intermediate files generated in the pipeline and used by vcf2xls are therefore named G001616.1_{suffix}, so vcf2xls believes that the sample is named G001616.1. For example, in project-Fvz6vbQ45yK1VyfJ9Vvzp97x:job-FvzjZ3045yK7gb1X9Y2FZg1q:

echo G004271.1_markdup_recalibrated_Haplotyper.refseq_nirvana_2010
sample_id=G004271.1
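A minimal sketch of how the stray ".1" can end up in the sample ID. It assumes the ID is taken as everything before the first underscore of the file name, which is what the log line above suggests; the thread does not show the actual vcf2xls code. Both functions and the suffix-stripping rule are hypothetical illustrations, not the fix that was applied.

```python
import re

def naive_sample_id(filename: str) -> str:
    # Hypothetical: take everything before the first underscore,
    # matching the "sample_id=G004271.1" seen in the job log.
    return filename.split("_", 1)[0]

# Hypothetical normalisation: drop a trailing ".<digits>" inherited from
# fastqs named like G001616.1.fq.gz.
_NUMERIC_SUFFIX = re.compile(r"\.\d+$")

def normalised_sample_id(filename: str) -> str:
    return _NUMERIC_SUFFIX.sub("", naive_sample_id(filename))

name = "G004271.1_markdup_recalibrated_Haplotyper.refseq_nirvana_2010"
print(naive_sample_id(name))       # G004271.1
print(normalised_sample_id(name))  # G004271
```

Stripping the suffix blindly is only safe if no real sample ID ends in ".<digits>"; that is an assumption to check against the manifest.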

This shouldn't happen: there is a check to see whether the sample is in the bioinformatic manifest, so the job should have failed.

issue-label-bot[bot] commented 4 years ago

Issue-Label Bot is automatically applying the label bug to this issue, with a confidence of 0.95. Please mark this comment with :thumbsup: or :thumbsdown: to give our bot feedback!


Yu-jinKim commented 4 years ago

I tested the subroutine that checks whether the sample is in the manifest. No issues there, so I have no idea where the issue might be.

Yu-jinKim commented 4 years ago

The script was using the sample name extracted from the VCF file name to access the genotype column in the VCF. But since the pipeline thought the name had a ".1" suffix, it used that wrong name for the genotype column, which made the genotypes inaccessible and caused every variant to be skipped, hence the empty variants sheets.
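A rough sketch of the failure mode, assuming genotypes are looked up by matching a sample name against the `#CHROM` header columns of the VCF. The function names are hypothetical and this is not the vcf2xls implementation. The point is that when the lookup name and the sample column header disagree (one carrying the stray ".1", one without), raising an error is preferable to silently skipping every variant.

```python
from typing import Iterable, List

def genotype_index(header_line: str, sample_id: str) -> int:
    # The #CHROM line lists 9 fixed columns, then one column per sample.
    columns = header_line.lstrip("#").rstrip("\n").split("\t")
    if sample_id not in columns[9:]:
        # Fail loudly instead of producing an empty variants sheet.
        raise ValueError(
            f"sample {sample_id!r} not in VCF sample columns {columns[9:]}"
        )
    return columns.index(sample_id)

def genotypes(lines: Iterable[str], sample_id: str) -> List[str]:
    idx = None
    out = []
    for line in lines:
        if line.startswith("##"):
            continue  # meta-information lines
        if line.startswith("#CHROM"):
            idx = genotype_index(line, sample_id)
            continue
        if idx is None:
            raise ValueError("data line found before #CHROM header")
        fields = line.rstrip("\n").split("\t")
        # GT, when present, is the first FORMAT subfield per the VCF spec.
        out.append(fields[idx].split(":")[0])
    return out
```

For example, with a sample column named `G004271.1`, `genotypes(lines, "G004271.1")` returns the GT values, while `genotypes(lines, "G004271")` raises `ValueError` rather than returning nothing.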