GaetanBenoitDev / metaMDBG

MetaMDBG: a lightweight assembler for long and accurate metagenomics reads.
MIT License

segmentation fault in computeN50 #15

Closed krcurtis closed 1 month ago

krcurtis commented 3 months ago

Hi,

I downloaded the code at commit f6755dd and compiled it, but I'm getting a segmentation fault. I'm running metaMDBG on some PacBio long reads, with a command line like this:

   metaMDBG asm --out-dir . --threads 4 --in-hifi sample.fastq.gz

Here's some output from GDB:

   ...
   Multi-k pass: 80/83
   Multi-k pass: 81/83
   Multi-k pass: 82/83
   Multi-k pass: 83/83
   Removing overlaps and duplication
   Polishing contigs
   Purging strain duplication

   Program received signal SIGSEGV, Segmentation fault.
   0x0000000000441c20 in Utils::computeN50(std::vector<unsigned int, std::allocator<unsigned int> >) ()
   (gdb) bt
   #0  0x0000000000441c20 in Utils::computeN50(std::vector<unsigned int, std::allocator<unsigned int> >) ()
   #1  0x000000000045634c in AssemblyPipeline::execute() ()
   #2  0x000000000041fa17 in Tool::run(int, char**) ()
   #3  0x000000000040a801 in main ()
   (gdb) info locals
   No symbol table info available.

Any ideas?

I'm not sure whether debug info was enabled for the C++ compilation stage. I can try that and report back. I'm not very familiar with CMake; how should I enable debug symbols?

Thanks!

GaetanBenoitDev commented 3 months ago

Hi,

Thanks for reporting the crash. I have checked my N50 method, and it can crash if the contig count is 0. First, could you attach the metaMDBG.log file so I can check for an error in the earlier steps?

Thanks
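To illustrate the zero-contig case mentioned above, here is a minimal sketch of an N50 computation that returns 0 for an empty contig list instead of reading past the end of the vector. The function name `computeN50Safe` is hypothetical and this is not the actual metaMDBG code; it only assumes, based on the backtrace, that the real method takes a vector of unsigned contig lengths.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical stand-in for Utils::computeN50 (NOT the metaMDBG implementation).
// Returns 0 when there are no contigs rather than indexing into an empty vector.
std::uint32_t computeN50Safe(std::vector<std::uint32_t> lengths) {
    if (lengths.empty()) return 0;  // zero-contig case: nothing to summarize

    // Longest contigs first.
    std::sort(lengths.begin(), lengths.end(), std::greater<std::uint32_t>());

    // Accumulate in 64 bits so large assemblies do not overflow.
    const std::uint64_t total =
        std::accumulate(lengths.begin(), lengths.end(), std::uint64_t{0});

    std::uint64_t running = 0;
    for (std::uint32_t len : lengths) {
        running += len;
        // N50 is the length of the contig at which the running sum
        // first reaches half of the total assembly length.
        if (2 * running >= total) return len;
    }
    return 0;  // not reached for non-empty input
}
```

With this guard, a pipeline that ends up with no contigs after polishing would report an N50 of 0 rather than crashing.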

krcurtis commented 3 months ago

Here is the log file: metaMDBG.log

GaetanBenoitDev commented 3 months ago

The logs are not helpful. Can you run "ls -lh ./tmp/" and paste the result here? Thanks

krcurtis commented 3 months ago

Here's the listing: tmp-contents.txt

krcurtis commented 3 months ago

Maybe I should run metaMDBG on a test data set used in your paper?

GaetanBenoitDev commented 3 months ago

It should be fixed now; you can pull the latest commit and recompile.

By default, the polisher does not output low-coverage contigs (because it can't correct them). Now you should get contigs with at least 2 reads mapped to them.

krcurtis commented 3 months ago

It ran, but I'm not getting any contigs in the final file:

   Run time:                   37min 8sec
   Peak memory:                4.03875 GB
   Assembly length:            0
   Contigs N50:                0
   Nb contigs:                 0
   Nb Contigs (>1Mb):          0
   Nb circular contigs (>1Mb): 0

I'm starting to wonder whether there's something about the FASTQ files I'm using that I'll have to look into.

Otherwise, this bug is fixed; I no longer get a segmentation fault.

GaetanBenoitDev commented 3 months ago

Note that you can use the uncorrected contigs (for HiFi, they are of quite good quality): ./tmp/contigs_uncorrected.fasta.gz

If you have HiFi reads, you can also try the previous metaMDBG release (v0.3) to check whether the problem comes from the new v1.0 or from the data.