blackbeerd closed this issue 1 month ago
Hi @blackbeerd,
Many thanks for reporting this. Unfortunately, we could not reproduce the issue locally.
Could you share the logs ${OUTPUT_DIR}/run_clair3.log and ${OUTPUT_DIR}/log, or send them to my email zxzheng@cs.hku.hk? That would help us understand the malformed VCF header.
Zhenxian
Sure - I will send them via email if the files are not too large.
From: zhenxian
Sent: Monday, September 2, 2024 4:21:55 AM
To: HKU-BAL/Clair3
Cc: Christopher Boniface; Mention
Subject: [EXTERNAL] Re: [HKU-BAL/Clair3] malformed metadata and header in temporary vcfs when running clair3 with whatshap phase with large bam file (Issue #336)
I'm running Clair3 with the --use_whatshap_for_final_output_haplotagging option (see the Clair3 command at the bottom) on a 1.1 TB BAM generated on the ONT platform (basecalling and alignment with dorado/minimap2), and the final output BAM is missing haplotype info. At step 7/7, "Phasing VCF output in parallel using WhatsHap", whatshap throws parse and invalid-contig errors indicating that it is trying to read a ##cmdline comment in the VCF as a contig:
After checking the input VCF used for this step ("merge_1.vcf"), I found what appears to be malformed VCF metadata: a "##cmdline" entry appears AFTER the column header line (#CHROM POS ... etc.), like so:
I'm not sure what is causing this, but my hacky fix was to remove that line from the temporary VCFs (merge_{n}.vcf) and rerun clair3_c_impl.sh from the whatshap phase step. I removed the incorrectly placed lines with this grep command (run in the tmp/mergedoutput/ directory), although the metadata will now be missing the command line record (oh well): `for i in merge*.vcf; do mv "$i" "$i.tmp"; grep -v "##cmdline" "$i.tmp" > "$i"; done`
Thanks, Chris
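For anyone who wants to keep the command line record, a gentler variant of the workaround above is to move the misplaced `##` lines back above the `#CHROM` line instead of deleting them. The VCF format requires meta-information lines (`##`) to come before the column header line. The sketch below assumes the temporary VCFs are uncompressed regular files and that the only defect is `##` lines in the wrong place. `fix_vcf_header` is a hypothetical helper name, not part of Clair3 or WhatsHap.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Clair3): print a VCF with every "##"
# metadata line moved ahead of the "#CHROM" header line.
# Assumes an uncompressed regular file, because it is read twice.
fix_vcf_header() {
    # Pass 1 (NR == FNR): print only the "##" metadata lines.
    # Pass 2: print everything else (#CHROM line and records) in original order.
    awk 'NR == FNR { if (/^##/) print; next } !/^##/' "$1" "$1"
}

# Possible usage, run in the directory holding the temporary VCFs:
#   for f in merge*.vcf; do
#       fix_vcf_header "$f" > "$f.fixed" && mv "$f.fixed" "$f"
#   done
```

This reorders lines only. It does not explain why the `##cmdline` entry ended up after the header in the first place, so it is a workaround rather than a fix.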
Original Clair3 command line: