chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
MIT License

Segfault when incorporating HiC data #606

Open jeramiahsmith opened 7 months ago

jeramiahsmith commented 7 months ago

I ran into a funny issue when assembling a 30+ GB genome using data from a trio-phased interspecific F1. The assembly of HiFi data works fine, but the run dies with a segfault when it reaches the point of integrating HiC reads. This happens whether I incorporate the HiC reads as part of the initial run or after completing the HiFi assembly, and even when I add only a small number of HiC reads. I am running this on a machine with 4TB RAM, and it does not seem to be using all of the RAM when it segfaults (something to do with indexing?). There are ways to resolve this with other programs, but I wanted to raise it as a potential issue. This was with version 0.19.6-r595, run through Singularity.

It dies like this:

/var/spool/slurm/d/job16554653/slurm_script: line 21: 1620105 Segmentation fault singularity run --app hifiasm0196 /share/singularity/images/ccs/conda/amd-conda13-rocky8.sinf hifiasm -o Asm2.asm -t128 -1 ../MATsr.yak -2 ../PATsr.yak --h1 HiC/test_R1.fastq.gz --h2 HiC/test_R2.fastq.gz m84053_230719_210305_s4.hifi_reads.bc2081.fastq.gz m84053_230815_195449_s3.hifi_reads.bc2093.fastq.gz 2> Asm2.hic.asm.trio.log

And here are some run stats generated by running this under the time command:

User time (seconds): 10113.41
System time (seconds): 581.51
Percent of CPU this job got: 112%
Elapsed (wall clock) time (h:mm:ss or m:ss): 2:38:59
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 702009572
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 102
Minor (reclaiming a frame) page faults: 177375771
Voluntary context switches: 40347
Involuntary context switches: 15028
Swaps: 0
File system inputs: 25220
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
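As a rough sanity check on these numbers, the following Python sketch converts the peak resident set size into GiB and recomputes the CPU percentage from the user, system, and elapsed times. It assumes that "kbytes" in the time output means units of 1024 bytes and treats the 4TB of RAM as 4 TiB; the variable names are illustrative only. Under those assumptions the peak comes out at roughly 670 GiB, well below the machine's total, which is consistent with the observation that the crash is not a simple out-of-memory condition.

```python
# Values copied from the `time` report above.
user_s = 10113.41
system_s = 581.51
elapsed_s = 2 * 3600 + 38 * 60 + 59  # 2:38:59 wall clock, in seconds
max_rss_kb = 702009572

# Assumption: "kbytes" are units of 1024 bytes.
max_rss_gib = max_rss_kb / 1024**2

# Assumption: the machine's 4TB of RAM is treated as 4 TiB.
total_ram_gib = 4 * 1024
ram_fraction = max_rss_gib / total_ram_gib

# Average CPU use = total CPU time / wall-clock time.
cpu_percent = 100 * (user_s + system_s) / elapsed_s

print(f"peak RSS: {max_rss_gib:.0f} GiB ({100 * ram_fraction:.0f}% of RAM)")
print(f"average CPU use: {cpu_percent:.0f}%")
```

The recomputed CPU figure matches the 112% that time reports. With -t128 requested, that may indicate that most of the wall-clock time was spent in phases using only one or two threads, although the report alone cannot show which phase was running when the segfault occurred.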

chhylp123 commented 7 months ago

@jeramiahsmith Sorry for the late reply; I have been very busy over the last few weeks. When running Hi-C phasing, hifiasm always needs to build indexes, no matter how many Hi-C reads it is given. These indexes are quite large and take a lot of RAM. It is hard for me to tell whether this is caused by a bug or not. If possible, would you mind sharing the HiFi bin files with me? I only need three bin files.