atifrahman / SWALO

Scaffolding with assembly likelihood optimization
GNU General Public License v3.0
20 stars 3 forks source link

bowtie2convert segmentation fault #1

Open peterdfields opened 7 years ago

peterdfields commented 7 years ago

Not sure what's going wrong. make within the swalo directory executes without error. Tried bowtie2convert script from cgal which works fine on sam file but then again not entirely sure that they share the same functionality.

Could you provide a simple sample to execute the basic functionality of the scripts in order to test the installation vs. data problems? Or is there a way of generating more verbose output for diagnosing issues?

atifrahman commented 7 years ago

I'll upload a sample as soon as possible.

Btw... have you noticed the change in parameters from CGAL to SWALO? bowtie2convert for example needs to be called with 3 parameters in SWALO. bowtie2convert <mapFile_SAM> <contigFile_Fasta> <maxInsertSize>

peterdfields commented 7 years ago

Hi @atifrahman

Thanks for looking into a test dataset. I did notice the difference but I was more interested in if the problem might be related to my sam file vs. the bowtie2convert executable. Given the CGAL version worked I assumed that maybe the issue was due to the bowtie2convert I built from SWALO but that's what I'm not entirely sure about.

georgek commented 7 years ago

I am also seeing the segfault with bowtie2convert. On my ubuntu box this manifests as a "stack smashing" error, which I think is caused by a buffer overflow. I looked at the code quickly and there are lots of places where there could be a buffer overflow.

Note that a segfault is always a bug. It doesn't matter what the input is like, this shouldn't happen.

Is there any reason to do the SAM file parsing yourself and not use htslib (which also gives you BAM parsing for free)? I notice also that the parsing code is duplicated between bowtieconvert and bowtie2convert. This seems unnecessary.

atifrahman commented 7 years ago

Hi,

@peterdfields, sorry for taking so long!

@georgek, some parts were written long time ago and included with CGAL and bowtieconvert was later edited into bowtie2convert. Not great practices I know but hopefully things can be fixed.

I have increased some buffer lengths. Would you guys mind trying again with the updated versions and let me know how things go?

georgek commented 7 years ago

It works now, but there are still buffer overflows in the code. Why not actually check the input data and make sure you don't overflow? I guess in the name of speed you can just say "max read length is 200 otherwise program will crash", but it's not like most users of the program are going to know what it a buffer is, much less know how to update the code to make it work for them.

In my case my reads were 300bp long (from a MiSeq), so it overflowed the 200 char buffer.

peterdfields commented 7 years ago

The update also works for me as well. I'm using MiSeq reads that are 250bp. However, while bowtie2convert and align work (or at least without throwing an error), swalo itself is now throwing a segmentation fault. Specifically, I see a report of a large number, such as:

8353828

followed by:

Segmentation fault (core dumped)

Should I open another issue for this?

atifrahman commented 7 years ago

Sorry about the bugs! I need to test it on MiSeq data it seems. Is the data you're using accessible?

peterdfields commented 7 years ago

The data I'm using is not at present published. Are there any particular files that would be most useful for the troubleshooting? or any intermediate outputs?r

atifrahman commented 7 years ago

Could you send me the unmappedOut.sam file at atif.bd@gmail.com?