TNTurnerLab / Tortoise

Tortoise is the CPU workflow of the HAT https://github.com/TNTurnerLab/HAT tools.

Error in rule glnexus_dv when loading gvcf files into database #1

Closed: steven-solar closed this issue 1 year ago

steven-solar commented 1 year ago

In rule glnexus_dv, I receive the following error:

[GLnexus] [error] <path>/dv_out/<sample_id>.dv.cpu.gvcf.gz Exists: sample is currently being added; each input gVCF should have a unique sample name (header column #10) (UnnamedSample (<path>/dv_out/<sample_id>.dv.cpu.gvcf.gz))
[GLnexus] [error] Failed to bulk load into DB: Failure: One or more gVCF inputs failed validation or database loading; check log for details.

I believe it is because the files <path>/dv_out/<sample_id>.dv.cpu.gvcf.gz created by rule deepvariant have header lines of the form:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT UnnamedSample

rather than

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT <sample_id>

Manually re-headering the gVCFs and re-running from this rule seems to do the trick and allows the process to continue. Is there some parameter I am missing to insert the appropriate sample names into the gVCFs, or could it be an error in their generation?
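
For anyone hitting the same issue, a rough sketch of such a re-header step, assuming bcftools and tabix are installed (the file names and sample list are placeholders, not necessarily the exact commands used here):

# List the sample column of the gVCF header (prints UnnamedSample in this case)
bcftools query -l <path>/dv_out/<sample_id>.dv.cpu.gvcf.gz

# Write the desired sample name, one name per line, matching the input sample order
echo "<sample_id>" > sample_name.txt

# Rewrite the header with the correct sample name, then re-index
bcftools reheader --samples sample_name.txt \
    -o <path>/dv_out/<sample_id>.dv.cpu.reheadered.gvcf.gz \
    <path>/dv_out/<sample_id>.dv.cpu.gvcf.gz
tabix -p vcf <path>/dv_out/<sample_id>.dv.cpu.reheadered.gvcf.gz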

jng2 commented 1 year ago

Hi Steven,

The name generated in the DeepVariant output should come from the cram/bam file itself. These names should match whatever is found in the "family_file" that you point to in the config file.

If you look at the header of your input, does SM also say UnnamedSample?
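
One quick way to check, assuming samtools is available (the file name is a placeholder):

# Print the read-group header lines; the SM: field is the sample name DeepVariant uses
samtools view -H <sample_id>.cram | grep '^@RG'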

Jeff

jng2 commented 1 year ago

I'm going to close the issue. If you have further questions, please feel free to reach out.

Thanks, Jeff

steven-solar commented 1 year ago

Hi Jeff,

Sorry for the slow reply, but I think you were correct. My input SAM doesn't have an SM tag in its @RG header line, which is what caused the problem. Apologies for the confusion, and I appreciate the help!
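
For anyone who hits the same problem, a minimal sketch of adding the missing read group (with an SM tag) before re-running DeepVariant, assuming samtools is installed; the read-group ID, sample name, and file names below are placeholders:

# Attach an @RG line carrying the SM sample tag to every read, then index the result
samtools addreplacerg -r '@RG\tID:rg1\tSM:<sample_id>' \
    -o <sample_id>.rg.bam <sample_id>.bam
samtools index <sample_id>.rg.bam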