Segmentation fault occurred when outputting recorrected.fa

zengxiaofei commented 3 months ago

Hi Yichen,

I'm using DeChat to correct ONT ultra-long reads (R10.4.1). However, I encountered a segmentation fault while the program was generating the recorrected.fa file.

/opt/gridview/slurm/spool/slurmd/job15326495/slurm_script: line 12: 12367 Segmentation fault      dechat -i all_UL_50k.fa.gz -o reads -t 32

My collaborator also faced the same problem with another dataset.

Best, Xiaofei

LuoGroup2023 commented 3 months ago

Hello Xiaofei,

Could you provide the detailed output log so that I can resolve the issue more quickly? I have not encountered any segmentation faults in stage 1 during my previous testing, so I need the output log to identify the code section where the problem occurs.

Best, Yichen

zengxiaofei commented 3 months ago

slurm-15326495.out.gz

Hi Yichen, I have attached the logs, FYI.

LuoGroup2023 commented 3 months ago

Hello, Zeng Xiaofei,

Based on the output log you provided, I checked the corresponding code section where the error occurred. I found that the error happens during the process of reloading ONT data and writing the corresponding corrected data to recorrected.fa after constructing the dBG.

I tested a sample dataset using your file naming conventions and ruled out file naming issues.

At line 1829 of your output log, the memory usage is displayed, and at line 1845 of your output log, the size of your server's memory is shown.

Further investigation revealed that your system's memory is approximately 249.6 GB, as shown in the output log. The data you're testing occupies around 65071 MB during the graph construction process. This suggests that your fq file is around 200 GB or even larger, while your system has only 249.6 GB of memory and may also be running other processes at the same time.

In summary, the primary cause of the error seems to be insufficient system memory. Additionally, another possible cause could be a permission issue preventing data from being written to recorrected.fa. Therefore, I suggest extracting a smaller portion of your data for testing to determine if the error is due to insufficient memory.
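
One way to extract a smaller portion for such a test is sketched below in Python. This is only a minimal sketch and not part of DeChat: the function names, the output file name, and the number of reads are hypothetical, and it assumes the input is a plain or gzip-compressed FASTA file.

```python
import gzip
from typing import Iterator, TextIO, Tuple


def open_text(path: str, mode: str = "rt") -> TextIO:
    """Open a plain or gzip-compressed text file, chosen by file extension."""
    if path.endswith(".gz"):
        return gzip.open(path, mode)
    return open(path, mode)


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining multi-line sequences."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def take_first_reads(in_path: str, out_path: str, max_reads: int) -> int:
    """Copy at most max_reads records to out_path; return the number written."""
    written = 0
    with open_text(in_path) as src, open_text(out_path, "wt") as dst:
        for header, seq in read_fasta(src):
            if written >= max_reads:
                break
            dst.write(f">{header}\n{seq}\n")
            written += 1
    return written
```

For example, take_first_reads("all_UL_50k.fa.gz", "subset.fa", 10000) would write the first 10,000 reads to subset.fa (a hypothetical name and count), which could then be passed to dechat with -i in the same way as the full file. Taking the first reads is not a random sample, but it should be enough to check whether the crash depends on input size.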

I hope this helps.

Best regards.

zengxiaofei commented 3 months ago

Thank you for your help! I'll try it out!

zengxiaofei commented 3 months ago

Hi Yichen,

DeChat crashed again with the same error. I selected the ultra-long reads with a minimum length of 100 Kb (~140 Gb data) for correction. This task was performed alone on a server with more than 240 GB of RAM. The logs of the job scheduler show that the peak memory was less than 60 GB. Should I test it on a server with higher RAM capacity?

file               format  type  num_seqs          sum_len  min_len    avg_len  max_len
all_UL_100k.fa.gz  FASTA   DNA    838,600  139,461,799,783  100,000  166,303.1  989,459

[Screenshot attachment: 微信截图_20240519094011 (WeChat screenshot)]

slurm-15366402.out.gz

LuoGroup2023 commented 3 months ago

Hello, Zeng Xiaofei,

I apologize for the late reply; I have been occupied with other matters recently.

Regarding your issue, could you share the dataset with me by email, or send me more detailed information about it?

I tested the ERR7256374 ont.10.4 dataset and found that dechat reported the same error at the same position with the following messages: malloc(): unaligned tcache chunk detected and malloc(): invalid size (unsorted). However, after filtering out sequences shorter than 1000 bp, dechat was able to run the entire process successfully. I noticed that you mentioned your dataset has a minimum length of 100 Kb, so I'm not sure if this solution will work for you.
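
For reference, a length filter of the kind described above could be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the code used in DeChat or in my test: the function names are hypothetical, the input is assumed to be a plain or gzip-compressed FASTA file, and reads are kept when their length is at least min_len.

```python
import gzip
from typing import Iterator, TextIO, Tuple


def open_text(path: str, mode: str = "rt") -> TextIO:
    """Open a plain or gzip-compressed text file, chosen by file extension."""
    if path.endswith(".gz"):
        return gzip.open(path, mode)
    return open(path, mode)


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining multi-line sequences."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_min_length(in_path: str, out_path: str,
                         min_len: int = 1000) -> Tuple[int, int]:
    """Write reads with length >= min_len; return (kept, dropped) counts."""
    kept = dropped = 0
    with open_text(in_path) as src, open_text(out_path, "wt") as dst:
        for header, seq in read_fasta(src):
            if len(seq) >= min_len:
                dst.write(f">{header}\n{seq}\n")
                kept += 1
            else:
                dropped += 1
    return kept, dropped
```

The returned dropped count also shows whether a dataset contains any reads below the threshold at all; a value of 0 means the filter changes nothing.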

I am working hard to resolve this issue, which is caused by other functions being called.

Could you provide me with the following information to help determine if this is the same issue or a new one:

1. Does your dataset contain very short sequences (less than 1000 bp)?
2. After the error occurs, does the recorrected.fa file already contain some corrected sequences, i.e., is it non-empty?

My email is yichenli@hnu.edu.cn. I hope to help you resolve the issue you are facing!

Best regards.

LuoGroup2023 commented 2 months ago

You can now install the latest version, 1.0.1, via Conda. It fixes the issue that could occur when generating recorrected.fa.
