EichlerLab / pav

Phased assembly variant caller

"Segmentation fault" and "core dump" at align_get_read_bed #4

Closed chycheng2015 closed 1 year ago

chycheng2015 commented 3 years ago

Hi, I tried to run the PAV pipeline, and it seemed to work through the Minimap2 mapping step, but it then failed at align_get_read_bed with "Segmentation fault (core dumped)". The messages are below. Do you have any idea what happened in my run?

Best, Cheng

count   jobs
1   align_get_read_bed
1

[M::worker_pipeline::15.952*1.42] mapped 1 sequences
[M::main] Version: 2.17-r974-dirty
[M::main] CMD: minimap2 -x asm20 -m 10000 -z 10000,50 -r 50000 --end-bonus=100 --secondary=no -a -t 12 --eqx -Y -O 5,56 -E 4,1 -B 5 data/ref/ref.fa.gz temp/hap1hap2_fasta/align/contigs_h1.fa.gz
[M::main] Real time: 16.106 sec; CPU: 22.832 sec; Peak RSS: 2.493 GB
[Fri Jun 18 10:34:30 2021]
Finished job 8.
6 of 211 steps (3%) done
Select jobs to execute...

[Fri Jun 18 10:34:30 2021]
rule align_get_read_bed:
    input: temp/hap1hap2_fasta/align/pre-cut/aligned_tig_h1.sam.gz, temp/hap1hap2_fasta/align/contigs_h1.fa.gz.fai
    output: results/hap1hap2_fasta/align/pre-cut/aligned_tig_h1.bed.gz, results/hap1hap2_fasta/align/pre-cut/aligned_tig_h1.headers.gz
    jobid: 7
    wildcards: asm_name=hap1hap2_fasta, hap=h1

Job counts:
    count   jobs
    1   align_get_read_bed
    1
/bin/sh: line 13: 205680 Segmentation fault (core dumped) /home/cheng/.conda/envs/snakemake/bin/python3.9 -m snakemake results/hap1hap2_fasta/align/pre-cut/aligned_tig_h1.bed.gz --snakefile /home/cheng/Documents/Temp_bioinfo_tools/PAV/pav/Snakefile --force -j20 --keep-target-files --keep-remote --attempt 1 --scheduler ilp --force-use-threads --wrapper-prefix https://github.com/snakemake/snakemake-wrappers/raw/ --max-inventory-time 0 --ignore-incomplete --latency-wait 5 --allowed-rules align_get_read_bed --notemp --quiet --no-hooks --nolock --mode 1
[Fri Jun 18 10:34:40 2021]
Finished job 23.
7 of 211 steps (3%) done
/bin/sh: line 13: 205612 Segmentation fault (core dumped) /home/cheng/.conda/envs/snakemake/bin/python3.9 -m snakemake results/hap1hap2_fasta/align/pre-cut/aligned_tig_h2.bed.gz --snakefile /home/cheng/Documents/Temp_bioinfo_tools/PAV/pav/Snakefile --force -j20 --keep-target-files --keep-remote --attempt 1 --scheduler ilp --force-use-threads --wrapper-prefix https://github.com/snakemake/snakemake-wrappers/raw/ --max-inventory-time 0 --ignore-incomplete --latency-wait 5 --allowed-rules align_get_read_bed --notemp --quiet --no-hooks --nolock --mode 1
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/cheng/Documents/Temp_bioinfo_tools/pav_practice/.snakemake/log/2021-06-18T103255.622958.snakemake.log

paudano commented 3 years ago

I have had problems with pysam 0.16 generating segmentation faults, and the step that is failing uses pysam. I am currently using pysam 0.15.3 and recently used 0.15.2 in another conda environment; both were stable.

What version of pysam are you running? You can find it with conda list pysam (after activating your conda environment) or with pysam.__version__ in a Python shell (after import pysam).
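The version check can also be scripted. The following is a minimal sketch, not part of PAV: the helper names `is_suspect_pysam` and `installed_pysam_version` are made up for illustration, and treating the whole 0.16 series as suspect is an assumption based only on the report above. It reads the version from package metadata, so it does not need to import pysam itself.

```python
from importlib import metadata


def is_suspect_pysam(version):
    """Return True if `version` belongs to the 0.16 series.

    Assumption: the whole 0.16 series is treated as suspect, based only on
    the segmentation faults reported in this thread.
    """
    return version.split(".")[:2] == ["0", "16"]


def installed_pysam_version():
    """Return the installed pysam version string, or None if not installed."""
    try:
        return metadata.version("pysam")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_pysam_version()
    if version is None:
        print("pysam is not installed in this environment")
    elif is_suspect_pysam(version):
        print(f"pysam {version}: 0.16 series, consider 0.15.2 or 0.15.3")
    else:
        print(f"pysam {version}")
```

Run it with the same Python interpreter that Snakemake uses, since the conda environment determines which pysam is found.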

chycheng2015 commented 3 years ago

Thank you very much for your reply!

After reinstalling pysam 0.15.2 as you suggested, the pipeline went a lot farther, but it stopped with another error, shown below.

The Python version and libraries I am using are: python 3.8, biopython 1.78, pysam 0.15.2, matplotlib 3.3.4, scipy 1.62, numpy 1.20.3, pandas 1.2.4

Thank you.

Cheng

Finished job 131.
122 of 211 steps (58%) done
Job counts:
    count   jobs
    1   call_inv_flag_insdel_cluster
    1
[Sat Jun 19 08:35:58 2021]
Error in rule call_inv_flag_insdel_cluster:
    jobid: 0
    output: temp/hap1hap2_fasta/inv_caller/flag/insdel_sv_h1.bed.gz

RuleException:
ValueError in line 590 of /home/cheng/Documents/Temp_bioinfo_tools/PAV/pav/rules/call_inv.snakefile:
No objects to concatenate
  File "/home/cheng/.conda/envs/snakemake/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 2357, in run_wrapper
  File "/home/cheng/Documents/Temp_bioinfo_tools/PAV/pav/rules/call_inv.snakefile", line 590, in rule_call_inv_flag_insdel_cluster
  File "/home/cheng/.conda/envs/snakemake/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 285, in concat
  File "/home/cheng/.conda/envs/snakemake/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 342, in __init__
  File "/home/cheng/.conda/envs/snakemake/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 574, in _callback
  File "/home/cheng/.conda/envs/snakemake/lib/python3.8/concurrent/futures/thread.py", line 57, in run
  File "/home/cheng/.conda/envs/snakemake/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 560, in cached_or_run
  File "/home/cheng/.conda/envs/snakemake/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 2390, in run_wrapper
Exiting because a job execution failed. Look above for error message
[Sat Jun 19 08:35:58 2021]
Error in rule call_inv_flag_insdel_cluster:
    jobid: 0
    output: temp/hap1hap2_fasta/inv_caller/flag/insdel_indel_h1.bed.gz

RuleException:
ValueError in line 590 of /home/cheng/Documents/Temp_bioinfo_tools/PAV/pav/rules/call_inv.snakefile:
No objects to concatenate
  File "/home/cheng/.conda/envs/snakemake/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 2357, in run_wrapper
  File "/home/cheng/Documents/Temp_bioinfo_tools/PAV/pav/rules/call_inv.snakefile", line 590, in rule_call_inv_flag_insdel_cluster
  File "/home/cheng/.conda/envs/snakemake/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 285, in concat
  File "/home/cheng/.conda/envs/snakemake/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 342, in __init__
  File "/home/cheng/.conda/envs/snakemake/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 574, in _callback
  File "/home/cheng/.conda/envs/snakemake/lib/python3.8/concurrent/futures/thread.py", line 57, in run
  File "/home/cheng/.conda/envs/snakemake/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 560, in cached_or_run
  File "/home/cheng/.conda/envs/snakemake/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 2390, in run_wrapper
Exiting because a job execution failed. Look above for error message

paudano commented 3 years ago

It's crashing on a step that uses indel clusters to search for evidence of inversions that are too small to break alignments. I should fix this so it doesn't crash the pipeline, but if it's not finding indels at this stage, something must be wrong. Are you running a whole genome assembly? If you can share your config JSON and results/hap1hap2_fasta/align/aligned_tig_h1.bed.gz, I might be able to see what's going on.
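For context, "No objects to concatenate" is the ValueError that pandas raises when `pd.concat` receives an empty list, which is what happens when no indel clusters are found. The following is a minimal sketch of one way to guard against it, not PAV's actual fix: the helper `concat_or_empty` and the column names are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical column names for illustration; PAV's real tables may differ.
COLUMNS = ["#CHROM", "POS", "END"]


def concat_or_empty(frames, columns=COLUMNS):
    """Concatenate DataFrames, returning an empty table when there are none."""
    frames = list(frames)
    if not frames:
        # pd.concat([]) would raise ValueError("No objects to concatenate"),
        # so return an empty table with the expected columns instead.
        return pd.DataFrame(columns=columns)
    return pd.concat(frames, ignore_index=True)


if __name__ == "__main__":
    # A tiny assembly with no indel clusters yields an empty list of tables.
    print(concat_or_empty([]).shape)

    one = pd.DataFrame({"#CHROM": ["chr1"], "POS": [100], "END": [150]})
    two = pd.DataFrame({"#CHROM": ["chr2"], "POS": [200], "END": [260]})
    print(concat_or_empty([one, two]))
```

Returning an empty table with the expected columns lets downstream steps write a valid, empty output file instead of aborting the whole pipeline.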

chycheng2015 commented 3 years ago

I also realized later that the crash happened because no indel clusters were generated, since I was using a mini sample dataset. After I switched to a whole assembly, the pipeline worked properly. Thanks a lot for your patience in answering my question. Cheng

paudano commented 3 years ago

I am going to leave this open for now until I fix the bug. PAV should run smaller assemblies without crashing. Thank you for letting me know!

paudano commented 1 year ago

I tested, and PAV runs smaller examples now. I included a small example for PAV, which is now part of 2.2.0.