TimD1 / vcfdist

vcfdist: Accurately benchmarking phased variant calls
GNU General Public License v3.0
70 stars 6 forks source link

Cannot start an already-running timer (writing) #24

Closed mcolpus closed 5 months ago

mcolpus commented 5 months ago

I get an error "Cannot start an already-running timer (writing)." whenever I try to use the realign flags. Using -rq leads to "Cannot start" error. Using -rt leads to "Cannot stop" error. Version is v2.5.0

command:

vcfdist tmp1.vcf tmp2.vcf ../../h37rv_20231215.fa -p small_test/ -rq -v 2

output:

[INFO  vcfdist 10:50:17] Command: 'vcfdist tmp1.vcf tmp2.vcf ../../h37rv_20231215.fa -p small_test/ -rq -v 2'
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17] [0/8] Loading reference FASTA '../../h37rv_20231215.fa'
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17] [Q 0/8] Parsing QUERY VCF 'tmp1.vcf'
[WARN  vcfdist 10:50:17] 'PS' tag not defined in QUERY VCF header, assuming one phase set per contig
[INFO  vcfdist 10:50:17]   Genotypes:
[INFO  vcfdist 10:50:17]       1: 1
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17]   Variant types:
[INFO  vcfdist 10:50:17]     Haplotype 1
[INFO  vcfdist 10:50:17]       REF: 0
[INFO  vcfdist 10:50:17]       SNP: 1
[INFO  vcfdist 10:50:17]       INS: 0
[INFO  vcfdist 10:50:17]       DEL: 0
[INFO  vcfdist 10:50:17]       CPX: 0
[INFO  vcfdist 10:50:17]     Haplotype 2
[INFO  vcfdist 10:50:17]       REF: 0
[INFO  vcfdist 10:50:17]       SNP: 0
[INFO  vcfdist 10:50:17]       INS: 0
[INFO  vcfdist 10:50:17]       DEL: 0
[INFO  vcfdist 10:50:17]       CPX: 0
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17]   Contigs:
[INFO  vcfdist 10:50:17]     [ 0] NC_000962.3: 1 | 0 variants
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17]   QUERY VCF overview:
[INFO  vcfdist 10:50:17]     TOTAL: 1
[INFO  vcfdist 10:50:17]     KEPT : 1
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17] [T 0/8] Parsing TRUTH VCF 'tmp2.vcf'
[WARN  vcfdist 10:50:17] 'PS' tag not defined in TRUTH VCF header, assuming one phase set per contig
[INFO  vcfdist 10:50:17]   Genotypes:
[INFO  vcfdist 10:50:17]       1: 1
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17]   Variant types:
[INFO  vcfdist 10:50:17]     Haplotype 1
[INFO  vcfdist 10:50:17]       REF: 0
[INFO  vcfdist 10:50:17]       SNP: 1
[INFO  vcfdist 10:50:17]       INS: 0
[INFO  vcfdist 10:50:17]       DEL: 0
[INFO  vcfdist 10:50:17]       CPX: 0
[INFO  vcfdist 10:50:17]     Haplotype 2
[INFO  vcfdist 10:50:17]       REF: 0
[INFO  vcfdist 10:50:17]       SNP: 0
[INFO  vcfdist 10:50:17]       INS: 0
[INFO  vcfdist 10:50:17]       DEL: 0
[INFO  vcfdist 10:50:17]       CPX: 0
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17]   Contigs:
[INFO  vcfdist 10:50:17]     [ 0] NC_000962.3: 1 | 0 variants
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17]   TRUTH VCF overview:
[INFO  vcfdist 10:50:17]     TOTAL: 1
[INFO  vcfdist 10:50:17]     KEPT : 1
[INFO  vcfdist 10:50:17]   Writing original query VCF to './small_test/orig-query.vcf'
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17]   Checking contigs:
[INFO  vcfdist 10:50:17]     All contig checks passed!
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17] [Q 1/8] Wavefront clustering QUERY VCF 'tmp1.vcf'
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17] [Q 2/8] Realigning QUERY VCF 'tmp1.vcf'
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17] [Q 3/8] Wavefront reclustering QUERY VCF 'tmp1.vcf'
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17] [T 3/8] Wavefront clustering TRUTH VCF 'tmp2.vcf'
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17] [4/8] Superclustering TRUTH and QUERY variants
[INFO  vcfdist 10:50:17]            Total superclusters: 1
[INFO  vcfdist 10:50:17]   Largest supercluster (bases): 3
[INFO  vcfdist 10:50:17]   Largest supercluster  (vars): 2
[INFO  vcfdist 10:50:17]   Average supercluster (bases): 3.000
[INFO  vcfdist 10:50:17]   Average supercluster  (vars): 2.000
[INFO  vcfdist 10:50:17]               QUERY phase sets: 1
[INFO  vcfdist 10:50:17]         QUERY phase block NG50: 4411532
[INFO  vcfdist 10:50:17]               TRUTH phase sets: 1
[INFO  vcfdist 10:50:17]         TRUTH phase block NG50: 4411532
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17]   Sorting superclusters by size
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17] [5/8] Calculating precision and recall
[INFO  vcfdist 10:50:17]   Superclusters using   0.000 to   1.000 GB RAM each ( 64 threads):        1
[INFO  vcfdist 10:50:17]   Superclusters using   1.000 to   2.000 GB RAM each ( 32 threads):        0
[INFO  vcfdist 10:50:17]   Superclusters using   2.000 to   4.000 GB RAM each ( 16 threads):        0
[INFO  vcfdist 10:50:17]   Superclusters using   4.000 to   8.000 GB RAM each (  8 threads):        0
[INFO  vcfdist 10:50:17]   Superclusters using   8.000 to  16.000 GB RAM each (  4 threads):        0
[INFO  vcfdist 10:50:17]   Superclusters using  16.000 to  32.000 GB RAM each (  2 threads):        0
[INFO  vcfdist 10:50:17]   Superclusters using  32.000 to  64.000 GB RAM each (  1 threads):        0
[INFO  vcfdist 10:50:17]     done with precision-recall
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17] [6/8] Skipping distance metrics
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17] [7/8] Phasing superclusters
[INFO  vcfdist 10:50:17]   Contigs:
[INFO  vcfdist 10:50:17]     [ 0] NC_000962.3: 0 switch errors, 0 flip errors, 1 phase blocks
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17]              Total  phase blocks: 1
[INFO  vcfdist 10:50:17]              Total switch errors: 0
[INFO  vcfdist 10:50:17]              Total   flip errors: 0
[INFO  vcfdist 10:50:17]   Supercluster switch error rate: 0.000000%
[INFO  vcfdist 10:50:17]     Supercluster flip error rate: 0.000000%
[INFO  vcfdist 10:50:17]                Phase block  NG50: 0
[INFO  vcfdist 10:50:17]   (switch)     Phase block NGC50: 0
[INFO  vcfdist 10:50:17]   (switchflip) Phase block NGC50: 0
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17]   Writing phasing summary to './small_test/phasing-summary.tsv'
[ERROR vcfdist 10:50:17] Cannot start an already-running timer (writing).

I get this even with a simple test vcf:

##fileformat=VCFv4.2
##source=Clair3
##clair3_version=1.0.4
##cmdline=/opt/bin/run_clair3.sh --bam_fn=aln.bam --ref_fn=h37rv_20231215.fa.gz --threads=4 --platform=ont --model_path=r941_prom_hac_g360_g422 --include_all_ctgs --no_phasing_for_fa --haploid_sensitive --enable_long_indel --sample_name=aln.bam --output=clair_out
##reference=/home/ubuntu/clair3_test/work/b3/70944cf0cf12a9e81fa8b66abba325/h37rv_20231215.fa.gz
##FILTER=<ID=PASS,Description="All filters passed">
##FILTER=<ID=LowQual,Description="Low quality variant">
##FILTER=<ID=RefCall,Description="Reference call">
##INFO=<ID=P,Number=0,Type=Flag,Description="Result from pileup calling">
##INFO=<ID=F,Number=0,Type=Flag,Description="Result from full-alignment calling">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ<20 or selected by 'samtools view -F 2316' are filtered)">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##FORMAT=<ID=AF,Number=1,Type=Float,Description="Observed allele frequency in reads, for each ALT allele, in the same order as listed, or the REF allele for a RefCall">
##contig=<ID=NC_000962.3,length=4411532>
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  aln.bam
NC_000962.3 1977    .   A   G   25.50   PASS    P   GT:GQ:DP:AD:AF  1:25:78:0,75:0.9615

and

##fileformat=VCFv4.2
##source=Clair3
##clair3_version=1.0.4
##cmdline=/opt/bin/run_clair3.sh --bam_fn=aln.bam --ref_fn=h37rv_20231215.fa.gz --threads=4 --platform=ont --model_path=r941_prom_sup_g5014 --include_all_ctgs --no_phasing_for_fa --haploid_sensitive --enable_long_indel --sample_name=aln.bam --output=clair_out
##reference=/home/ubuntu/clair3_test/work/76/abd15184b468f56892711a46836a33/h37rv_20231215.fa.gz
##FILTER=<ID=PASS,Description="All filters passed">
##FILTER=<ID=LowQual,Description="Low quality variant">
##FILTER=<ID=RefCall,Description="Reference call">
##INFO=<ID=P,Number=0,Type=Flag,Description="Result from pileup calling">
##INFO=<ID=F,Number=0,Type=Flag,Description="Result from full-alignment calling">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ<20 or selected by 'samtools view -F 2316' are filtered)">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##FORMAT=<ID=AF,Number=1,Type=Float,Description="Observed allele frequency in reads, for each ALT allele, in the same order as listed, or the REF allele for a RefCall">
##contig=<ID=NC_000962.3,length=4411532>
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  aln.bam
NC_000962.3 1977    .   A   G   36.28   PASS    F   GT:GQ:DP:AD:AF  1:36:78:0,75:0.9615
TimD1 commented 5 months ago

Thanks for raising this issue. I just pushed a fix in v2.5.1.