lumpy segfault in lumpyexpress & work around

I've been trying to implement my prokaryotic variants pipeline on galaxy, but I ran into a problem: the version of lumpy on galaxy (0.2.14a) errors out on some samples, complaining about missing mean & stdev values. According to galaxy, the literal strings "$MEAN" and "$STDEV" were being passed to lumpy.

Locally on my mac, however (with version 0.2.13 installed via homebrew), I did not have this issue and was getting results without error. I thought maybe lumpyexpress was handling things better than the lumpy galaxy wrapper, so I wrote my own lumpyexpress wrapper. It had its own issue, though: $HEXDUMP was not set, and I was getting errors about commands "-n" and "==" not being found, followed by a segfault. So I tried downgrading to 0.2.13 via conda on galaxy, but that didn't work (I don't remember why; regardless, it was still failing). Then I installed the latest version of lumpy on my mac using conda, and that segfaulted too.

I turned on $VERBOSE and discovered that all the samples that were failing in the original lumpy wrapper on galaxy were not being supplied their discordants files. When I commented out the conditional that was removing them, I got back to the same errors I was getting from the original lumpy galaxy wrapper: mean & stdev were missing. The VERBOSE output showed me that mean & stdev were being set to the string "NA", which is what the conditional was checking for. The old lumpyexpress script from 0.2.13 did not have that check and DID have real values for mean & stdev. So I ran the lumpy call made by the newest version of lumpyexpress, but supplied the mean & stdev values I got from the old lumpyexpress verbose output, and it worked! No error about missing mean & stdev values and no segfault - plus I got results.
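To make the failure mode above concrete, here is a minimal Python sketch of the kind of check described: samples whose mean or stdev comes back as the string "NA" are dropped, so their discordants never reach the lumpy call. This is not the lumpyexpress code; the function name `split_by_stats`, the sample names, and the input layout are all hypothetical and only illustrate the logic.

```python
def split_by_stats(samples):
    """Split samples into those with usable stats and those flagged "NA".

    `samples` maps a (hypothetical) sample name to a (mean, stdev) pair of
    strings, as they might appear in parsed verbose output.
    """
    kept, dropped = [], []
    for name, (mean, stdev) in samples.items():
        # A sample is dropped as soon as either value is the literal "NA".
        if "NA" in (mean, stdev):
            dropped.append(name)
        else:
            kept.append(name)
    return kept, dropped


# Hypothetical values, for illustration only.
example = {
    "sample_a": ("312.4", "41.7"),
    "sample_b": ("NA", "NA"),
}
kept, dropped = split_by_stats(example)
print("kept:", kept)
print("dropped:", dropped)
```

Listing the dropped samples this way is a quick check on whether the failing samples are exactly the ones whose stats were reported as "NA".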

There was also another difference between the old & new versions of lumpyexpress. The new version issues warnings for all 10 samples that fail (and are not supplied with -pe flags), complaining that there were fewer than 1000 "elements" during the calculation of the insert size. The warning comes from the script whose output is parsed to get the mean & stdev values.

So I think the work-around to avoid the segfault and the errors and get results is to revert that script to the old behavior: keep the warning about fewer than 1000 elements, but still output a mean & stdev value so that lumpy can finish. I assume that change was made intentionally, but the handling downstream didn't account for the possibility that those discordant files would be missing.
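As a rough sketch of the behavior proposed above (warn, but still report numbers), something like the following Python would do. It is not the actual lumpy script: the function name `insert_size_stats` is hypothetical, the 1000 threshold is taken from the warning text, and the plain mean / population stdev used here is an assumption about how the values might be computed.

```python
import statistics
import sys

MIN_ELEMENTS = 1000  # threshold quoted in the warning; assumed here


def insert_size_stats(insert_sizes, min_elements=MIN_ELEMENTS):
    """Return (mean, stdev) of the insert sizes.

    With fewer than `min_elements` values a warning goes to stderr, but the
    estimates are still returned instead of a placeholder such as "NA".
    """
    sizes = list(insert_sizes)
    if not sizes:
        raise ValueError("no insert sizes to summarise")
    if len(sizes) < min_elements:
        print(
            f"Warning: only {len(sizes)} elements (< {min_elements}); "
            "mean and stdev may be unreliable",
            file=sys.stderr,
        )
    mean = statistics.fmean(sizes)
    stdev = statistics.pstdev(sizes)  # population stdev; an assumption
    return mean, stdev


# Tiny made-up input: triggers the warning but still yields numbers.
mean, stdev = insert_size_stats([290, 300, 310])
print(f"mean:{mean}\tstdev:{stdev}")
```

The point of the design is that a small sample is reported as suspicious rather than treated as fatal, so whatever consumes the output always receives numeric values.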

I can provide small bam files that demonstrate this issue if necessary.

I submitted https://github.com/arq5x/lumpy-sv/pull/277, which is a work-around for the segfault issue: it allows the insert mean & stdev to be calculated despite the 1000 element threshold. It still issues the warning about the threshold, but calculates these numbers anyway. Without them, the lumpy call either errors out and produces no result (when the discordants are supplied without settings for these values) or segfaults (when the -pe options for the affected discordants files are omitted from the lumpy call). Note that this affects both lumpyexpress (which segfaults) and the lumpy galaxy wrapper (which errors out).

3 of the tests, which all appear to be on lumpy_smooth, fail. However, I could not get them to succeed on the repo before my changes either...
