dengzac / contig-extender

GNU General Public License v3.0

ExtenderQs #1

Closed DaanJansen94 closed 3 years ago

DaanJansen94 commented 3 years ago

Hi,

I'm running ContigExtender in our bioinformatics pipeline after assembling with metaSPAdes and applying a 1 kb length cutoff to the contigs. I have a few questions regarding the tool:

(1) Is it correct that the maximum length of a contig is limited to 100 kb? I have seen the following lines several times, but I could not find this limit mentioned in the paper.

Iteration 491 in 36.8048sec, length 99997
Iteration 492 in 36.3498sec, length 100085
Length limit exceeded

Is there a way to go beyond 100 kb?

(2) I am using FASTQ files with ~20 M PE reads and a FASTA file with a few hundred contigs (ranging from 1 kb to 150 kb). If I run "extender_wrapper" with the option "--enable-pair-constraint", the job always crashes, but without "--enable-pair-constraint" it does seem to work. I obtain the following error:

/var/spool/torque/mom_priv/jobs/50666468.tier2-p-moab-2.tier2.hpc.kuleuven.be.SC: line 26: 80356 Aborted (core dumped)

My guess is that this is a memory problem, since "--enable-pair-constraint" does run successfully when I give it only a few small contigs. Is it possible that it isn't scalable to this input, or can I tweak it so that it would work?

(3) The tool takes a long time to run, especially with this number of reads and contigs. Could I tweak it to speed it up while still retaining accuracy?

Thanks for the help!

Kind regards,

Daan

dengzac commented 3 years ago

Hi Daan,

(1) Yes, there is currently a hardcoded limit of 100kb because in our experiments, contigs that reached this length were likely to be inaccurate. I will make this value user-configurable.
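
As a rough illustration only, a user-configurable limit could look something like the sketch below. The `--max-length` option, `build_parser`, and `should_stop` are hypothetical names chosen for this example and are not confirmed ContigExtender options or functions; the default simply mirrors the 100 kb value described above.

```python
import argparse

# Default mirrors the 100 kb limit mentioned in this thread.
DEFAULT_MAX_LENGTH = 100_000


def build_parser():
    """Build a parser with a hypothetical --max-length option."""
    parser = argparse.ArgumentParser(
        description="Sketch of a configurable contig length limit"
    )
    parser.add_argument(
        "--max-length",
        type=int,
        default=DEFAULT_MAX_LENGTH,
        help="stop extending a contig once it grows past this many bases",
    )
    return parser


def should_stop(contig_length, max_length):
    """Return True once the contig has grown past the configured limit."""
    return contig_length > max_length


if __name__ == "__main__":
    args = build_parser().parse_args()
    # Lengths taken from the log lines quoted earlier in the thread.
    for length in (99997, 100085):
        if should_stop(length, args.max_length):
            print(f"length {length}: Length limit exceeded")
        else:
            print(f"length {length}: continue extending")
```

With the default, a contig of length 99997 would keep extending and one of length 100085 would stop, which matches the log quoted in the question; passing a larger value would raise the ceiling.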

(2) Currently, in paired mode, bowtie is used on the entire contig instead of just its edges, which causes memory to be proportional to contig length. I will look into using the --maxins parameter to reduce the amount of contig that is aligned.
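
The idea of aligning only the contig edges can be sketched as follows. This is a minimal illustration, not the ContigExtender implementation: `contig_edges` and `edge_length` are hypothetical names, and choosing `edge_length` from the maximum insert size is an assumption.

```python
def contig_edges(sequence, edge_length):
    """Return the (left, right) edges of a contig.

    Each edge is at most edge_length bases long, so the amount of sequence
    handed to the aligner is bounded by 2 * edge_length instead of growing
    with the full contig length.
    """
    if edge_length <= 0:
        raise ValueError("edge_length must be positive")
    # For contigs shorter than edge_length, both slices are the whole contig.
    left = sequence[:edge_length]
    right = sequence[-edge_length:]
    return left, right


if __name__ == "__main__":
    contig = "ACGTACGTAC"
    left, right = contig_edges(contig, 3)
    print(left, right)
```

Because reads whose mates fall further from the contig end than the maximum insert size cannot contribute to an extension, an `edge_length` on the order of that insert size would be a natural choice under this assumption.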

(3) Usually most of the time is spent waiting for bowtie alignment results, so I don't see any easy optimizations at the moment.

DaanJansen94 commented 3 years ago

Hey Zachary,

I'm not sure how feasible it is for ContigExtender, but if the alignments are the limiting step, perhaps BWA-MEM2 on a multi-core system could improve the speed considerably.

https://ieeexplore.ieee.org/document/8820962
https://www.biorxiv.org/content/10.1101/053686v1.full.pdf

Anyway, thanks for the information & updates.

Cheers,

Daan