EichlerLab / pav

Phased assembly variant caller

Error in rule vcf_write_vcf #64

Open yue131 opened 3 weeks ago

yue131 commented 3 weeks ago

Hello, I ran into some problems while using PAV to call SVs from haplotype-resolved assemblies, and after many attempts I still couldn't get it to work. If you can give me some suggestions, I'd appreciate it! The commands and configuration files are listed below:

assemblies.tsv:

```
NAME	HAP1	HAP2
YCM	/path/YCM.asm.bp.hap1.p_ctg.gfa	/path/YCM.asm.bp.hap2.p_ctg.gfa
```

config.json:

```json
{
    "reference": "/path/CHM13_UCSC/hs1.fa"
}
```

```
snakemake -s /share/path/Snakefile YCM.vcf.gz --cores 25
```

And part of run.log is as follows. The beginning:

Building DAG of jobs...

```
Using shell: /usr/bin/bash
Provided cores: 25
Rules claiming more threads will be scaled down.
Job stats:
job                               count    min threads    max threads
------------------------------  -------  -------------  -------------
align_depth_bed                       2              1              1
align_get_read_bed                    2              1              1
align_get_tig_fa                      2              1              1
align_map                             2             24             24
align_trim_tig                        2              1              1
align_trim_tigref                     2              1              1
call_callable_regions                 2              1              1
call_cigar                           20              1              1
call_cigar_merge                      2              1              1
call_integrate_fail_fa                6              1              1
call_integrate_filter_redundant       8              1              1
call_integrate_sources                2              1              1
call_intersect_fail                   8              1              1
call_intersect_fail_batch           160              1              1
call_inv_batch                      120              4              4
call_inv_batch_merge                  2              1              1
call_inv_cluster                      4              1              1
call_inv_flag_insdel_cluster          4              1              1
call_inv_merge_flagged_loci           2              1              1
call_lg_discover                     20             12             12
call_lg_split                         2              1              1
call_merge_batch_table                1              1              1
call_merge_haplotypes                 6              1              1
call_merge_haplotypes_batch         160             12             12
call_merge_haplotypes_snv             2              1              1
call_merge_lg                         6              1              1
data_align_ref                        1              1              1
data_align_ref_anno_n_gap             1              1              1
data_ref_contig_table                 1              1              1
vcf_write_vcf                         1              1              1
total                               553              1             24
```

Select jobs to execute...

Error:

```
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 25
Rules claiming more threads will be scaled down.
Provided resources: mem_mb=24576, mem_mib=23438, disk_mb=1000, disk_mib=954
Select jobs to execute...
[Tue Oct 1 13:28:01 2024]
Error in rule vcf_write_vcf:
    jobid: 0
    input: results/YCM/bed_merged/pass/snv_snv.bed.gz, results/YCM/bed_merged/pass/svindel_ins.bed.gz, results/YCM/bed_merged/pass/svindel_del.bed.gz, results/YCM/bed_merged/pass/sv_inv.bed.gz, results/YCM/bed_merged/fail/snv_snv.bed.gz, results/YCM/bed_merged/fail/svindel_ins.bed.gz, results/YCM/bed_merged/fail/svindel_del.bed.gz, results/YCM/bed_merged/fail/sv_inv.bed.gz, results/YCM/bed_merged/pass/fa/svindel_ins.fa.gz, results/YCM/bed_merged/pass/fa/svindel_del.fa.gz, results/YCM/bed_merged/fail/fa/svindel_ins.fa.gz, results/YCM/bed_merged/fail/fa/svindel_del.fa.gz, data/ref/contig_info.tsv.gz
    output: YCM.vcf.gz
```

The final:

```
RuleException:
RuntimeError in file /share/home/yuejw/software/PAV/pav/rules/vcf.snakefile, line 92:
Unknown variant type: "SV" at index chr1-16394892-INV-161108
  File "/share/home/yuejw/software/PAV/pav/rules/vcf.snakefile", line 92, in __rule_vcf_write_vcf
  File "/share/home/yuejw/software/PAV/pav/pavlib/vcf.py", line 198, in write_merged_vcf
  File "/share/home/yuejw/software/PAV/pav/dep/svpop/svpoplib/vcf.py", line 272, in ref_base
  File "/share/home/yuejw/miniconda3/envs/HiFi/lib/python3.7/concurrent/futures/thread.py", line 57, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
```
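For context on what this error means: the VCF writer builds REF/ALT fields by dispatching on each call's variant type, and a record labeled "SV" falls outside the concrete types it handles. The sketch below is purely illustrative (hypothetical function and names, not the actual svpoplib code) of how such a dispatch fails:

```python
# Illustrative sketch (NOT the actual svpoplib/pavlib code) of a REF/ALT
# dispatch that raises "Unknown variant type" when a record carries an
# unexpected type label such as "SV" instead of a concrete lowercase type.

def ref_alt_for_record(vartype, svtype, ref_base, seq):
    """Return (REF, ALT) for one variant record, or raise for unknown types."""
    if vartype == 'snv':
        return ref_base, seq                    # single-base substitution

    if vartype in ('sv', 'indel'):
        if svtype == 'ins':
            return ref_base, ref_base + seq     # anchor base + inserted sequence
        if svtype == 'del':
            return ref_base + seq, ref_base     # anchor base + deleted sequence
        if svtype == 'inv':
            # Simplified placeholder; a real writer would reverse-complement.
            return ref_base + seq, ref_base + seq[::-1]

    # A record typed "SV" (or anything else unexpected) lands here,
    # matching the RuntimeError in the traceback above.
    raise RuntimeError(f'Unknown variant type: "{vartype}"')
```

The bug fixed in later PAV releases was along these lines: an inversion record reaching the writer with a generic type label rather than one the dispatch recognizes.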

Thanks!

paudano commented 3 weeks ago

I think you might have a broken version of PAV, can you pull v2.4.1.2 and run from there? I corrected some VCF problems in the latest version.

jeffchen2000 commented 2 weeks ago

I am running the latest version just now (Oct-16-2024), and I had the same issue:

```
apptainer run --bind ${PWD}:${PWD} library://becklab/pav/pav:latest -c 2
```

```
RuleException:
RuntimeError in file /opt/pav/rules/vcf.snakefile, line 87:
Unknown variant type: "SV" at index chr1-26639828-INV-8356
  File "/opt/pav/rules/vcf.snakefile", line 87, in __rule_vcf_write_vcf
  File "/opt/pav/pavlib/vcf.py", line 198, in write_merged_vcf
  File "/opt/pav/dep/svpop/svpoplib/vcf.py", line 272, in ref_base

[Wed Oct 16 11:29:34 2024]
Error in rule vcf_write_vcf:
    jobid: 0
    input: results/NA18500/bed_merged/pass/snv_snv.bed.gz, results/NA18500/bed_merged/pass/svindel_ins.bed.gz, results/NA18500/bed_merged/pass/svindel_del.bed.gz, results/NA18500/bed_merged/pass/sv_inv.bed.gz, results/NA18500/bed_merged/fail/snv_snv.bed.gz, results/NA18500/bed_merged/fail/svindel_ins.bed.gz, results/NA18500/bed_merged/fail/svindel_del.bed.gz, results/NA18500/bed_merged/fail/sv_inv.bed.gz, results/NA18500/bed_merged/pass/fa/svindel_ins.fa.gz, results/NA18500/bed_merged/pass/fa/svindel_del.fa.gz, results/NA18500/bed_merged/fail/fa/svindel_ins.fa.gz, results/NA18500/bed_merged/fail/fa/svindel_del.fa.gz, data/ref/contig_info.tsv.gz
    output: NA18500.vcf.gz

Exiting because a job execution failed. Look above for error message
WorkflowError: At least one job did not complete successfully.

[Wed Oct 16 11:29:35 2024]
Error in rule vcf_write_vcf:
    jobid: 1
    input: results/NA18500/bed_merged/pass/snv_snv.bed.gz, results/NA18500/bed_merged/pass/svindel_ins.bed.gz, results/NA18500/bed_merged/pass/svindel_del.bed.gz, results/NA18500/bed_merged/pass/sv_inv.bed.gz, results/NA18500/bed_merged/fail/snv_snv.bed.gz, results/NA18500/bed_merged/fail/svindel_ins.bed.gz, results/NA18500/bed_merged/fail/svindel_del.bed.gz, results/NA18500/bed_merged/fail/sv_inv.bed.gz, results/NA18500/bed_merged/pass/fa/svindel_ins.fa.gz, results/NA18500/bed_merged/pass/fa/svindel_del.fa.gz, results/NA18500/bed_merged/fail/fa/svindel_ins.fa.gz, results/NA18500/bed_merged/fail/fa/svindel_del.fa.gz, data/ref/contig_info.tsv.gz
    output: NA18500.vcf.gz

Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-10-15T154641.291416.snakemake.log
WorkflowError: At least one job did not complete successfully.
```

jeffchen2000 commented 2 weeks ago

I also tried v2.4.1.2, and I got exactly the same error: `Unknown variant type: "SV" at index chr1-26639828-INV-8356`

paudano commented 2 weeks ago

Thanks. I'm running a test on 2.4.3 today.

paudano commented 2 weeks ago

I was able to replicate and fix the issue. I have just released 2.4.4.

riyasj327 commented 2 weeks ago

I was facing the exact same issue. Let me try with the new version. Thanks.

riyasj327 commented 2 weeks ago

@paudano Should simply using library://becklab/pav/pav:latest via Singularity work now? I see that the Dockerfile is also updated with the new version?

riyasj327 commented 2 weeks ago

I tried running:

```
singularity run --bind "$(pwd):$(pwd)" library://becklab/pav/pav:latest -c 32
```

I am getting this new error now:

```
Building DAG of jobs...
MissingInputException in rule call_callable_regions in file /opt/pav/rules/call.snakefile, line 189:
Missing input files for rule call_callable_regions:
    output: results/GM24385-1-second/callable/callable_regions_h1_500.bed.gz
    wildcards: asm_name=GM24385-1-second, hap=h1, flank=500
    affected files:
        results/GM24385-1-second/align/trim-tigref/aligned_query_h1.bed.gz
        temp/GM24385-1-second/lgsv/sv_ins_h1.bed.gz
        temp/GM24385-1-second/lgsv/sv_del_h1.bed.gz
```

paudano commented 2 weeks ago

It's an issue on the container build machine; there's something I should have cleared and did not. I'm rebuilding it now. I'll try running the built container before releasing.

riyasj327 commented 2 weeks ago

Thanks @paudano! Can you please let me know here when you are done with it?

paudano commented 2 weeks ago

Test is running now. The error is gone, but I want to make sure there are no more before releasing again.

paudano commented 1 week ago

Version 2.4.5 is up. I am simultaneously managing two versions of PAV while making major improvements (not yet released), and it's breaking some of the processes I have set up to manage things. I'll be more careful about this. Thanks for your patience!

riyasj327 commented 1 week ago

Thanks @paudano for working on this. I see that the error is gone now, and I have been able to run it successfully to the end using the latest version.

riyasj327 commented 1 week ago

@paudano Just checking: is there a change in the VCF format with the latest version? I just encountered an error using Truvari bench on a VCF produced by the latest version. This never happened with VCFs produced by previous PAV versions. Any insights would be helpful!

Here is the error:

```
[W::bcf_hdr_register_hrec] The type "STR" is not supported, assuming "String"
[W::vcf_parse_filter] FILTER 'COMPOUND,TRIM' is not defined in the header
[E::bcf_hdr_parse_line] Could not parse the header line: "##FILTER=<ID=COMPOUND,TRIM,Description=\"Dummy\">"
[E::vcf_parse_filter] Could not add dummy header for FILTER 'COMPOUND,TRIM' at chr1:16725033
Traceback (most recent call last):
  File "/projects/rsaju_prj/miniconda3/envs/truvari/bin/truvari", line 10, in <module>
    sys.exit(main())
  File "/projects/rsaju_prj/miniconda3/envs/truvari/lib/python3.9/site-packages/truvari/main.py", line 105, in main
    TOOLS[args.cmd](args.args)
  File "/projects/rsaju_prj/miniconda3/envs/truvari/lib/python3.9/site-packages/truvari/bench.py", line 759, in bench_main
    output = m_bench.run()
  File "/projects/rsaju_prj/miniconda3/envs/truvari/lib/python3.9/site-packages/truvari/bench.py", line 495, in run
    for match in itertools.chain.from_iterable(map(self.compare_chunk, chunks)):
  File "/projects/rsaju_prj/miniconda3/envs/truvari/lib/python3.9/site-packages/truvari/matching.py", line 312, in chunker
    for key, entry in file_zipper(*files):
  File "/projects/rsaju_prj/miniconda3/envs/truvari/lib/python3.9/site-packages/truvari/matching.py", line 291, in file_zipper
    markers[first_idx][2] = next(fh)
  File "/projects/rsaju_prj/miniconda3/envs/truvari/lib/python3.9/site-packages/truvari/region_vcf_iter.py", line 213, in region_filter_stream
    cur_entry = next(cur_iter)
  File "pysam/libcbcf.pyx", line 4026, in pysam.libcbcf.TabixIterator.__next__
ValueError: error in vcf_parse
```
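For what it's worth, the parse failure comes from the separator: the VCF spec requires multiple FILTER values to be joined with semicolons, each defined by its own `##FILTER` header line, so a comma-joined `COMPOUND,TRIM` reads to htslib as a single malformed ID. Until a fixed release, a rough workaround could rewrite the FILTER column. This is a hedged sketch (file paths are hypothetical, and it assumes each individual filter name is, or will be, defined in the header):

```python
# Workaround sketch: rewrite comma-joined FILTER values (e.g. "COMPOUND,TRIM")
# to the semicolon separator the VCF spec requires. This only edits column 7
# of body lines; header lines pass through untouched.
import gzip

def fix_filter_column(in_path, out_path):
    with gzip.open(in_path, 'rt') as fin, gzip.open(out_path, 'wt') as fout:
        for line in fin:
            if line.startswith('#'):
                fout.write(line)                 # keep header lines as-is
                continue
            fields = line.rstrip('\n').split('\t')
            fields[6] = fields[6].replace(',', ';')  # FILTER is column 7
            fout.write('\t'.join(fields) + '\n')
```

Note the output is plain gzip; for Truvari the file would still need to be recompressed with bgzip and re-indexed with tabix.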

paudano commented 1 week ago

Yes, the VCF headers and FILTER column need to be fixed. I'll release a fix for this, and you can have PAV regenerate the VCFs with that fix (it won't need to re-run the whole sample). I am traveling today, but I'll get this corrected as soon as I can.

paudano commented 1 week ago

Pushed corrected version 2.4.6.

With the new version, run PAV and add this to the end of the command: -R vcf_write_vcf

That will re-create the VCFs without re-running the whole pipeline. The bad VCFs will be overwritten.

I usually test VCFs by running through bcftools view, which I forgot to do with these changes. It is not generating any errors on my test set now.
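A minimal self-contained check in the same spirit as piping through bcftools (a sketch, not part of PAV or bcftools) would be to confirm every FILTER value used in the body is declared in the header:

```python
# Sketch of a quick VCF sanity check: report FILTER values used in body
# records that have no matching ##FILTER header definition. Works on
# plain VCF text for simplicity; PASS and "." need no declaration.
import re

def undefined_filters(vcf_text):
    """Return the set of FILTER names used but not declared in the header."""
    declared = set(re.findall(r'##FILTER=<ID=([^,>]+)', vcf_text))
    declared |= {'PASS', '.'}
    missing = set()
    for line in vcf_text.splitlines():
        if line.startswith('#') or not line.strip():
            continue
        # FILTER is column 7; multiple values are semicolon-separated.
        for name in line.split('\t')[6].split(';'):
            if name not in declared:
                missing.add(name)
    return missing
```

An empty return set means the FILTER column is at least self-consistent with the header, which is exactly the property the Truvari failure above violated.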

Please let me know if you encounter any other issues.

riyasj327 commented 1 week ago

Perfect and thanks @paudano

I just ran:

```
singularity run --bind "$(pwd):$(pwd)" library://becklab/pav/pav:latest -c 32 -R vcf_write_vcf
```

But it seems like it's running from the start? The directory already contains the output files.

```
INFO: Downloading library image
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 32
Rules claiming more threads will be scaled down.
Job stats:
job                               count
------------------------------  -------
align_depth_bed                       2
align_get_read_bed                    2
align_get_tig_fa                      2
align_map                             2
align_trim_tig                        2
align_trim_tigref                     2
call_callable_regions                 2
call_cigar                           20
call_cigar_merge                      2
call_integrate_fail_fa                6
call_integrate_filter_redundant       8
call_integrate_sources                2
call_intersect_fail                   8
call_intersect_fail_batch           160
call_inv_batch                      120
call_inv_batch_merge                  2
call_inv_cluster                      4
call_inv_flag_insdel_cluster          4
call_inv_merge_flagged_loci           2
call_lg_discover                     20
call_lg_split                         2
call_merge_haplotypes                 6
call_merge_haplotypes_batch         160
call_merge_haplotypes_snv             2
call_merge_lg                         6
pav_all                               1
vcf_write_vcf                         1
total                               550

Select jobs to execute...
Execute 2 jobs...
```

paudano commented 1 week ago

There might be a step pulling something from temp files, which might need to be re-created. I've already addressed this in the next release so whole pipelines won't need to be re-run. You can add --nt to the command line to keep temp files until this is resolved, then rm -r temp after it's done. For now, however, you might have to re-run the pipeline.

riyasj327 commented 1 week ago

Hmm, I actually had the temp folder in the directory while running, but it still could not pick it up. Anyway, I have re-run it now. Thanks.