broadinstitute / viral-ngs

Viral genomics analysis pipelines
Other
187 stars 67 forks source link

Trinity needs to prepend PATH with conda bin path #267

Closed yesimon closed 8 years ago

yesimon commented 8 years ago

Otherwise it picks up the system java version which may be newer and fail to run

dpark01 commented 8 years ago

Wait, does the conda version of Trinity not include our fixes that allow it to work fine on Java 1.7 and 1.8?

On Apr 20, 2016, at 2:52 AM, Simon Ye notifications@github.com wrote:

Otherwise it picks up the system java version which may be newer and fail to run

— You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub

yesimon commented 8 years ago

Thanks for noticing! @tomkinsc Looks like the patch is not patching out the die part of the java version check

tomkinsc commented 8 years ago

I pulled the latest Trinity v1 package from anaconda, and to me it looks like the patch is being applied. The changes to lines 211-212 look identical to those created by our prior DownloadAndBuildTrinity.

tomkinsc commented 8 years ago

For conda-based installs to work in the first place, users will need to have conda on their PATH, which also means that the conda install of java will be picked up.

tomkinsc commented 8 years ago

So I think the issue may be that conda installs the openjdk flavor of java, rather than oracle java, and the version string being checked by Trinity (even after being patched) doesn't match when openjdk is being used. I'll change the patch to add openjdk as an alternate match

tomkinsc commented 8 years ago

PR pending with fix: https://github.com/bioconda/bioconda-recipes/pull/1352

dpark01 commented 8 years ago

Great, once that's in, go ahead and PR viral-ngs accordingly. Also, once that's done, is there any reason that Trinity needs to live in its own conda environment, or can it go back to living with everyone else ("default")?

yesimon commented 8 years ago

No the problem is the conflict between perl and perl-threaded, which is not as easily resolved.

On Wed, Apr 20, 2016 at 11:59 AM Daniel Park notifications@github.com wrote:

Great, once that's in, go ahead and PR viral-ngs accordingly. Also, once that's done, is there any reason that Trinity needs to live in its own conda environment, or can it go back to living with everyone else ("default")?

— You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub https://github.com/broadinstitute/viral-ngs/issues/267#issuecomment-212489641

tomkinsc commented 8 years ago

No PR should be needed, since viral-ngs will pull the latest build when installing Trinity. What's the specific issue with perl vs perl-threaded?

yesimon commented 8 years ago

Trinity requires perl-threaded while krona requires perl. Depending on the order they get installed, trinity might find a perl that won't work. That's why it was split into its own environment.

tomkinsc commented 8 years ago

Ok. Why can't krona use perl-threaded?

yesimon commented 8 years ago

Because it has a dep on perl. Apparently perl-threaded has a significant performance penalty due to synchronization overhead. Therefore packages prefer to dep on perl if possible.

tomkinsc commented 8 years ago

Krona currently has a dependency on perl. It can and should be changed to use perl-threaded, since it seems the performance argument may not hold water, and the official bioconda guidelines require the use of perl-threaded. Single-threaded perl may be a bit faster, but it's not worth it if it causes compatibility issues.

tomkinsc commented 8 years ago

I've put in a PR (https://github.com/bioconda/bioconda-recipes/pull/1357) to move krona to use perl-threaded. Trinity can join the other packages in the default conda env once it is accepted, right?

yesimon commented 8 years ago

Yes it can go in the default env after that's through. Although isn't that link pretty definitive that threads decrease performance? The question is whether anybody cares about 5%, and if you're using conda bin packages, you definitely aren't concerned about performance :wink:

dpark01 commented 8 years ago

Krona has got to be almost instantaneous to run, right? Please tell me average runtime is under 20sec...

Daniel J. Park, PhD Group Leader, Viral Computational Genomics Broad Institute of MIT and Harvard Tel: +1-617-714-8526 dpark@broadinstitute.org http://www.broadinstitute.org/bios/daniel-park

On Apr 20, 2016, at 3:58 PM, Simon Ye notifications@github.com wrote:

Yes it can go in the default env after that's through. Although isn't that link pretty definitive that threads decrease performance? The question is whether anybody cares about 5%, and if you're using conda bin packages, you definitely aren't concerned about performance

— You are receiving this because you commented. Reply to this email directly or view it on GitHub

yesimon commented 8 years ago

I meant it as a general comment about the conda+perl interaction. Krona has instant runtime once the db is created.

tomkinsc commented 8 years ago

OK, krona in bioconda now uses perl-threaded.

dpark01 commented 8 years ago

Wait, what db? metagenomics.krona(...) and tools.krona.Krona.import_taxonomy(...) make no mention of a db. As far as I can tell, the input files and output files are all small O(1) tiny things.

yesimon commented 8 years ago

It needs a taxonomy database, which it stores and detects in its own opt directory by default. I'll make it configurable along with the krona unit test.