broadinstitute / viral-ngs

Viral genomics analysis pipelines
Other
187 stars 67 forks source link

Toolify V-Phaser2 #87

Closed dpark01 closed 9 years ago

dpark01 commented 9 years ago

The V-Phaser2 iSNV caller (written by Xiao, Broad Viral Genomics) belongs to us, so we can just incorporate the relevant C code into this project, make sure the Makefile works more generally, and wrap it all up in a new Tool class.

This program takes a BAM file in (representing aligned reads) and spits out a custom-formatted text file that describes all the intra-host variation in this sample. It is one of the few such tools that calls indels as well.

The resulting output is pretty permissive--subsequent steps will need to filter the calls based on various criteria, and the coordinates will also need to be remapped in most cases.

iljungr commented 9 years ago

I am finally able to build V-Phaser 2 on both mac and linux, but there are some issues and we'll need to decide if we should a) distribute binaries, or b) download and build from source.

The big problem is that V-Phaser 2 requires bamtools libraries at run time. Option (a) would be much easier, particularly on mac, but it requires redistributing the bamtools libraries. We would only need to distribute the lib directory, which has .a and .dylib files on mac, and .a and .so files on linux. We don't need to redistribute the source code or the final bamtools executable. Bamtools is licensed under the MIT license: https://github.com/pezmaster31/bamtools/blob/master/LICENSE. It says that we are free to distribute provided we include the copyright and permission notice.

Downloading and building bamtools from source is doable, though it requires downloading cmake, which is also doable. The place things get hairy is that V-Phaser 2 can fail if the version of bamtools with which it was build it different from the version with which it is run. Worse, it can fail if the compiler used to build bamtools is different from the compiler used to build V-Phaser 2. (I suspect it has to do with incompatible name mangling.) That means that if we download and build bamtools then we need to download and build V-Phaser 2 also, rather than just distributing the executable. That requires downloading "boost" and editing the V-Phaser 2 make file, which are also doable.

All of that is doable on linux, though it will take a bit of doing. Things get more complicated on Mac. V-Phaser 2 does not build using the default compiler on my mac (OS X 10.9.5). I am able to build if I do "brew install gcc" and use that compiler (though I also need to add an include to one of the .cpp files). I don't think our scripts should be brew installing gcc into our user's /usr/local/bin, since the user might have some other version of the compiler there. I can probably download the compiler to some local area and use that, but I haven't tried it yet.

So, for Mac at least, I'd rather just distribute the V-Phaser 2 executable that I've built on my machine. Unfortunately, that means that we need to use bamtools libraries built with the same compiler, so we'd have to distribute those too.

For now, I'm going to go with option a, including the bamtools libraries. If we need to go with option b instead, it will be some work but is definitely doable on linux, and probably on Mac.

Irwin

On Feb 10, 2015, at 9:57 AM, Daniel Park notifications@github.com wrote:

Assigned #87 to @iljungr.

— Reply to this email directly or view it on GitHub.

dpark01 commented 9 years ago

I think distributing V-Phaser binaries sounds good to me (we can consider source compile as an add-on later). Can you statically link the bamtools library into it?

iljungr commented 9 years ago

Statically linking sounds like a good idea, but bamtools is built using CMake. Do you know how to force CMake to make static libraries? If not, I'll need to do some research.

On Feb 13, 2015, at 1:18 PM, Daniel Park notifications@github.com wrote:

I think distributing V-Phaser binaries sounds good to me (we can consider source compile as an add-on later). Can you statically link the bamtools library into it?

— Reply to this email directly or view it on GitHub.

dpark01 commented 9 years ago

So, yeah then just go with including the bamtools library for now.. Then maybe later we can figure out an auto download or whatever. I think in general my inclination is to make it work first, and if deemed necessary, improve it later.

iljungr commented 9 years ago

Turns out the dynamic build had a problem that prevented it from working on travis, namely that it depended dynamically on libraries referenced by bamtools that aren't present on travis, or at least not in the same place. Presumably this would have also prevented it from working on our end-users' computers.

After much hair-pulling, I was able to get a static link on both mac and linux, so it runs on travis and there is no longer a need to distribute anyone else's libraries. Details on how to build the mac and linux executables are in the comments of vphaser.py

I created a unit test using the test file that comes with vphaser and comparing some of the output files to my initial run. However, in some cases some of the p-values vary randomly, so the file comparison didn't work. I fixed that temporarily by only comparing files that don't contain p-values. I will file a separate issue on finding out why the p-values are unstable.

We might want to create a faster unit test -- this one takes almost a minute on my mac.

On Feb 14, 2015, at 2:24 PM, Daniel Park notifications@github.com wrote:

So, yeah then just go with including the bamtools library for now.. Then maybe later we can figure out an auto download or whatever. I think in general my inclination is to make it work first, and if deemed necessary, improve it later. — Reply to this email directly or view it on GitHub.