edawson / rkmh

Classify sequencing reads using MinHash.
MIT License

Memory error #13

Open george-githinji opened 5 years ago

george-githinji commented 5 years ago

Running into a memory error after compiling on Linux (CentOS), with params -t 4 -k 12 -s 2000:

double free or corruption (!prev): 0x00007f64a40008c0

edawson commented 5 years ago

That looks like a memory error, and I have a feeling I know where it got introduced. Which data did you test with? Also, which command did you run? The stream and hpv16 commands are currently the most robust. The call and classify commands are essentially deprecated, though I am planning to fix them in late fall.

I ran the following command on a fresh compile, which seems to work:

./rkmh stream -t 4 -k 12 -s 2000 -r data/zika.refs.fa -f data/zika.fa
gb|KU963574|    gi|226377833|ref|NC_012532.1(ZIKA)| 839 2000

Thanks so much for trying rkmh out and finding some bugs for me to fix!

george-githinji commented 5 years ago

Thank you very much for your response, and thanks for a useful tool. I tried this on a CHKV virus dataset that I am working on. I would like to remove the human read contamination, so I ran the filter command.

nmb85 commented 1 year ago

I got a similar memory error running the following command:

rkmh filter -f /datapool/basespace/skinmg/235258034/Data/Intensities/BaseCalls/sample0031_S339_L001_R1_001.fastq.gz -r ~/beta_hpv.fna -k 21 -s 1000 -t 1

Here is the error msg:

double free or corruption (!prev)
Aborted

Have there been any changes to this code? Note: rkmh stream does work fine on the same dataset and reference.

nmb85 commented 1 year ago

A sub-optimal workaround in the meantime: use rkmh stream to extract read headers, then grep out the entire read record.

rkmh stream -t 1 -k 12 -s 2000 -r reference.fna -f sample001_R1.fastq.gz | cut -f 2 | zgrep -A3 -f - sample001_R1.fastq.gz > extracted_reads.fastq
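One caveat with the grep approach: a read name used as a pattern can also match a quality line, and grep -A3 prints "--" separator lines between non-adjacent matches, so the output may not be valid FASTQ. Below is a minimal Python sketch of the same idea that walks the FASTQ four lines at a time and compares names exactly. It rests on assumptions not confirmed in this thread: that rkmh stream output is tab-separated, that the read name sits in the second field (mirroring `cut -f 2` above), and that each FASTQ record is exactly four lines. The script name `extract_reads.py` and the function names are hypothetical.

```python
import gzip
import sys
from itertools import islice


def read_names(lines, column=1):
    """Collect read names from tab-separated lines.

    column=1 (zero-based) mirrors `cut -f 2`; this is an assumption about
    the rkmh stream output layout, so adjust it to match your output.
    """
    names = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > column and fields[column]:
            names.add(fields[column])
    return names


def filter_fastq(lines, names):
    """Yield 4-line FASTQ records whose header matches a wanted name.

    A record matches if either the full header (without '@') or its first
    whitespace-delimited token is in `names`. Assumes 4 lines per record.
    """
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            break  # end of input (or a truncated final record)
        header = record[0].rstrip("\n")[1:]
        tokens = header.split()
        read_id = tokens[0] if tokens else ""
        if header in names or read_id in names:
            yield "".join(record)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage:
    #   rkmh stream ... | python extract_reads.py sample001_R1.fastq.gz > extracted_reads.fastq
    wanted = read_names(sys.stdin)
    with gzip.open(sys.argv[1], "rt") as fastq:
        for rec in filter_fastq(fastq, wanted):
            sys.stdout.write(rec)
```

Because names are looked up in a set rather than used as grep patterns, only the header line of each record is ever compared, and no separator lines are written.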