Closed wdecoster closed 6 years ago
Quick update: sorting does not seem to prevent this printing of logs. I'll try again, explicitly specifying the output with -o.
duphold always outputs those messages to stderr. Your parallel command must somehow be redirecting stderr to stdout.
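For anyone hitting the same thing, here is a minimal sketch of how the two streams behave under redirection. The `fake_tool` function is a hypothetical stand-in (not duphold) that writes one record to stdout and one log line to stderr; the file names are made up for illustration.

```shell
# Hypothetical stand-in for a tool that writes records to stdout
# and log messages to stderr (this is NOT duphold itself).
fake_tool() {
    echo "record"
    echo "[log] message" >&2
}

workdir=$(mktemp -d)

# Plain '>' captures stdout only; '2>' sends stderr to its own file.
fake_tool > "$workdir/out.txt" 2> "$workdir/err.log"

# '2>&1' after the stdout redirect merges stderr into the same file,
# which is how log lines can end up inside an output file.
fake_tool > "$workdir/merged.txt" 2>&1

cat "$workdir/merged.txt"
```

If log lines show up in a file written with a plain `>`, something in the wrapper or environment is presumably merging the streams along the lines of the last command.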
Hrm, possible, but that's something I haven't encountered before... Strange. In any case, when using -o everything is normal. Nice increase in precision (simulated pacbio data, called with sniffles) but a substantial reduction in recall (75% prior to filtering, 28% afterwards). I'll investigate further.
Hi Brent,
I ran the following command on a bunch of (single sample) vcf files and their bam file:
ls *.vcf | parallel 'duphold --bam {.}.bam --fasta genome.fna.gz --vcf {} > {.}_dh.vcf'
I'm now post-processing these duphold-annotated vcfs for filtering using bcftools and tabix, and get a bunch of warnings and errors which I'm looking into.
The first one is generated by bcftools sort:
[E::vcf_parse_format] Incorrect number of FORMAT fields at chr15:101764387
Grepping for that line with -C 5 gives me:
I suck at reading instructions, and it seems that my input SV vcf file is not sorted. I'll repeat things there and see if the error reproduces, but I'm quite surprised to see these logging messages in the output file.
Cheers, Wouter
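A rough way to check whether log lines ended up in an annotated file is to look for non-header lines with fewer than the eight fixed VCF columns. The sketch below builds a tiny made-up sample file; the log line text is a stand-in, not actual duphold output, and the check is only a heuristic.

```shell
# Tiny illustrative file: two header lines, one record, one stray log line.
# All contents are invented for the example.
vcf=$(mktemp)
printf '##fileformat=VCFv4.2\n' > "$vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> "$vcf"
printf 'chr1\t100\t.\tA\t<DEL>\t.\tPASS\tSVTYPE=DEL\n' >> "$vcf"
printf '[stand-in] log message\n' >> "$vcf"

# Report non-header lines with fewer than 8 tab-separated columns,
# prefixed with their line number.
suspect=$(awk -F'\t' '!/^#/ && NF < 8 { print NR ": " $0 }' "$vcf")
echo "$suspect"
```

An empty result does not prove the file is valid; it only means no line is obviously too short to be a VCF record.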