peterdfields opened 7 years ago
I'll upload a sample as soon as possible.
Btw... have you noticed the change in parameters from CGAL to SWALO? bowtie2convert for example needs to be called with 3 parameters in SWALO.
bowtie2convert <mapFile_SAM> <contigFile_Fasta> <maxInsertSize>
Hi @atifrahman
Thanks for looking into a test dataset. I did notice the difference, but I was more interested in whether the problem might be related to my SAM file or to the bowtie2convert executable. Given that the CGAL version worked, I assumed the issue was with the bowtie2convert I built from SWALO, but that's what I'm not entirely sure about.
I am also seeing the segfault with bowtie2convert. On my Ubuntu box it manifests as a "stack smashing" error, which I think is caused by a buffer overflow. I looked at the code quickly, and there are lots of places where a buffer overflow could occur.
Note that a segfault is always a bug. It doesn't matter what the input is like, this shouldn't happen.
Is there any reason to do the SAM file parsing yourself and not use htslib (which also gives you BAM parsing for free)? I notice also that the parsing code is duplicated between bowtieconvert and bowtie2convert. This seems unnecessary.
Hi,
@peterdfields, sorry for taking so long!
@georgek, some parts were written a long time ago and included with CGAL, and bowtieconvert was later edited into bowtie2convert. Not great practice, I know, but hopefully things can be fixed.
I have increased some buffer lengths. Would you guys mind trying again with the updated version and letting me know how things go?
It works now, but there are still buffer overflows in the code. Why not actually validate the input data and make sure you don't overflow? I suppose in the name of speed you can just say "max read length is 200, otherwise the program will crash", but most users of the program aren't going to know what a buffer is, much less how to update the code to make it work for them.
In my case my reads were 300 bp long (from a MiSeq), so they overflowed the 200-char buffer.
The update works for me as well. I'm using MiSeq reads that are 250 bp. However, while bowtie2convert and align work (or at least run without throwing an error), swalo itself is now throwing a segmentation fault. Specifically, I see a report of a large number, such as:
8353828
followed by:
Segmentation fault (core dumped)
Should I open another issue for this?
Sorry about the bugs! It seems I need to test on MiSeq data. Is the data you're using accessible?
The data I'm using is not published at present. Are there any particular files that would be most useful for troubleshooting? Or any intermediate outputs?
Could you send me the unmappedOut.sam file at atif.bd@gmail.com? Not sure what's going wrong.
make
within the swalo directory executes without error. I tried the bowtie2convert script from CGAL, which works fine on my SAM file, but then again I'm not entirely sure they share the same functionality. Could you provide a simple sample that exercises the basic functionality of the scripts, in order to distinguish installation problems from data problems? Or is there a way of generating more verbose output for diagnosing issues?