TransDecoder / TransDecoder


Integration with ParaFly #1

Closed macmanes closed 9 years ago

macmanes commented 9 years ago

One of the nice things about early TransDecoder versions was its integration with ParaFly, which made running large numbers of hmmscan jobs on a workstation very easy and fast. The new TransD lacks that, which I think is unfortunate. HpcGridRunner is great for people who have that type of resource available, but ParaFly was perfect for those of us with multi-core workstations.

So, what I'm saying is maybe we need a 'prep_for_parafly' utility that would take a multi-FASTA file as input, along with a blastp or hmmscan command, and output a file containing the commands that ParaFly could then act upon.

Maybe something like this already exists.
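For illustration, here is a minimal sketch of what such a utility might look like. It is not part of TransDecoder or HPC GridRunner: the function names, the `{chunk}`/`{out}` placeholders, the chunk file naming, and the default chunk size are all hypothetical choices made for this example. It splits a multi-FASTA file into chunks and writes one single-threaded command per chunk to a `commands.txt` file.

```python
from pathlib import Path
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif line and header is not None:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def prep_for_parafly(fasta_path, out_dir, cmd_template, seqs_per_chunk=500) -> List[str]:
    """Split a multi-FASTA file into chunks and write one command per chunk.

    cmd_template is an ordinary format string using the hypothetical
    placeholders {chunk} (chunk FASTA path) and {out} (per-chunk output path).
    Returns the list of commands that was written to <out_dir>/commands.txt.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    with open(fasta_path) as fh:
        records = list(read_fasta(fh))

    commands = []
    for start in range(0, len(records), seqs_per_chunk):
        chunk_path = out_dir / f"chunk_{len(commands):05d}.fa"
        with open(chunk_path, "w") as out:
            for header, seq in records[start:start + seqs_per_chunk]:
                out.write(f">{header}\n{seq}\n")
        commands.append(cmd_template.format(chunk=chunk_path, out=f"{chunk_path}.out"))

    # One command per line, which is the kind of file ParaFly reads.
    with open(out_dir / "commands.txt", "w") as out:
        out.writelines(cmd + "\n" for cmd in commands)
    return commands
```

Assuming a local `Pfam-A.hmm` database, a call such as `prep_for_parafly("longest_orfs.pep", "chunks", "hmmscan --cpu 1 --domtblout {out} Pfam-A.hmm {chunk} > /dev/null")` would produce a command file that could then be handed to something like `ParaFly -c chunks/commands.txt -CPU 16`. The input file name and CPU count here are placeholders, and the per-chunk outputs would still need to be concatenated afterwards.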

brianjohnhaas commented 9 years ago

Hi Matt,

If you're not going to run it as distributed on a compute farm, I wonder if just running hmmscan with --cpu multithreading will be sufficient, not requiring parafly. Parafly is best used when the system isn't already multithreaded, as it's a cheap way to make it so.

macmanes commented 9 years ago

No way. I have an 'average'-sized dataset that has been running with multithreaded hmmscan for over 48 hours now. The whole TransD pipeline used to take only a fraction of that using ParaFly. hmmscan is only weakly threaded, which is to say that --cpu 1 is roughly equal to --cpu 8 or higher.

brianjohnhaas commented 9 years ago

gotcha. What I'll do is to add an option to HPC GridRunner to just run it using parafly. You can then write a little wrapper to make your TransDecoder installation run with just a single command (again).

(In reply to Matt MacManes's comment of Tue, Feb 3, 2015 at 7:59 AM.)

Brian J. Haas The Broad Institute http://broadinstitute.org/~bhaas http://broad.mit.edu/~bhaas

brianjohnhaas commented 9 years ago

I added a --parafly_only parameter to the HPC GridRunner BioIfx/hpc_FASTA_GridRunner.pl script, so now you can just rely on parafly / local parallel processing for both pfam and blast searches.

macmanes commented 9 years ago

awesome! HPC GridRunner is a really sweet utility, especially now with the added functionality!

brianjohnhaas commented 9 years ago

I still need to add it to the basic cmd runner, but it's in the fasta-based one for now. We'll advertise it in the next release.

best,

~brian

(In reply to Matt MacManes's comment of Tue, Feb 3, 2015 at 9:15 AM.)
