MorrellLAB / sequence_handling

A series of scripts to automate sequence workflows

Truncated Files with Picard Handler #30

TomJKono closed this issue 7 years ago

TomJKono commented 7 years ago

This is a complicated issue. I have a hunch about where it originates, though.

In the config file, the quality encoding variable (illumina, sanger, solexa) is also used as the PL (platform) tag in the read groups. The Picard SAM handler calls Picard several times in succession. Picard is very strict when validating BAM headers, and PL=sanger is not a valid tag/value combination. This may make the first Picard call fail and leave an empty BAM file on disk. Successive calls to Picard would then try to read that empty BAM file and throw a truncated file error.
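One way to test the "empty BAM from the first call" theory is to check each intermediate BAM before the next Picard call reads it. A complete BAM file ends with the 28-byte empty BGZF block defined in the SAM/BAM specification, so a zero-length file or one missing that marker is truncated. The handler itself is a shell script; this is only a Python sketch of the check, and `looks_truncated` is a hypothetical helper name, not part of sequence_handling.

```python
import os

# BGZF end-of-file marker: the 28-byte empty BGZF block that a
# complete BAM file ends with (SAM/BAM specification, BGZF section).
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff0600" "42430200"
    "1b000300" "00000000" "00000000"
)


def looks_truncated(path: str) -> bool:
    """Return True if the file at `path` is too short or lacks the BGZF EOF marker."""
    if os.path.getsize(path) < len(BGZF_EOF):
        # Covers the zero-byte BAM left behind by a failed Picard call.
        return True
    with open(path, "rb") as handle:
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() != BGZF_EOF
```

Running this on the output of the first Picard call would show whether that call is where the empty file comes from, rather than only seeing the error on a later call.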

Try setting VALIDATION_STRINGENCY=SILENT in the Picard calls to see if this fixes the truncated file error. If it does, then you'll have to rework how the encoding and platform variables are treated. Quality score encoding is not the same as sequencing platform, anyway.
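A minimal sketch of that rework, assuming the config gains a separate platform variable (`QUAL_ENCODING` and `PLATFORM` below are hypothetical names): the encoding is used only for quality scores, and the PL tag is built from the platform and checked against the values the SAM specification allows. The platform list is copied from the SAM spec's description of PL and may grow over time, so it should be checked against the current spec.

```python
# PL values listed in the SAM format specification for the @RG header.
SAM_PLATFORMS = {
    "CAPILLARY", "DNBSEQ", "ELEMENT", "HELICOS", "ILLUMINA",
    "IONTORRENT", "LS454", "ONT", "PACBIO", "SOLID", "ULTIMA",
}


def read_group_line(rg_id: str, sample: str, platform: str) -> str:
    """Build a tab-separated @RG header line from a sequencing platform.

    Raises ValueError for values such as 'sanger' or 'solexa', which are
    quality encodings, not platforms.
    """
    pl = platform.upper()  # the spec lists platform values in upper case
    if pl not in SAM_PLATFORMS:
        raise ValueError(f"PL:{platform} is not a valid SAM platform value")
    return "\t".join(["@RG", f"ID:{rg_id}", f"SM:{sample}", f"PL:{pl}"])


# Hypothetical config: two variables instead of one shared "encoding".
QUAL_ENCODING = "sanger"  # quality-score encoding only
PLATFORM = "illumina"     # sequencing platform, used for the PL tag

print(read_group_line("sample1", "sample1", PLATFORM))
```

Failing early with a clear error keeps an invalid PL value from ever reaching Picard, instead of relying on a lower validation stringency to let it through.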

Aerin13 commented 7 years ago

Commit 9921ebc fixed this issue.