Closed macmanes closed 9 years ago
Hi Matt,
If you're not going to run it as distributed on a compute farm, I wonder if just running hmmscan with --cpu multithreading will be sufficient, not requiring parafly. Parafly is best used when the system isn't already multithreaded, as it's a cheap way to make it so.
No way. I have an 'average'-sized dataset that has been running with multithreaded `hmmscan` for over 48 hours now. The whole TransD pipeline used to take only a fraction of that using parafly. `hmmscan` is only weakly threaded, so to speak: `--cpu 1` is roughly equal to `--cpu 8` or higher.
gotcha. What I'll do is to add an option to HPC GridRunner to just run it using parafly. You can then write a little wrapper to make your TransDecoder installation run with just a single command (again).
Brian J. Haas The Broad Institute http://broadinstitute.org/~bhaas http://broad.mit.edu/~bhaas
I added a --parafly_only parameter to the HPC GridRunner BioIfx/hpc_FASTA_GridRunner.pl script, so now you can just rely on parafly / local parallel processing for both pfam and blast searches.
awesome! HPC GridRunner is a really sweet utility, especially now with the added functionality!
I still need to add it to the basic cmd runner, but it's in the fasta-based one for now. We'll advertise it in the next release.
best,
~brian
One of the nice things about early TransDecoder versions was its integration with ParaFly, which made running large numbers of `hmmscan` jobs on a workstation very easy and fast. The new TransD lacks that, which I think is unfortunate. HpcGridRunner is great for people who have that type of resource available, but ParaFly was perfect for those of us with multi-core workstations. So, what I'm saying is maybe we need a `prep_for_parafly` utility that would take a multifasta file as input, along with a blastp or hmmscan command, and output a file containing the commands that ParaFly could then act upon.
Maybe something like this already exists..
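For illustration, here is a rough sketch of what such a utility might look like. It is not part of TransDecoder or HPC GridRunner; the function names, the chunk file naming, and the `{chunk}` placeholder in the command template are all assumptions made up for this example.

```python
from pathlib import Path


def split_fasta(fasta_path, out_dir, seqs_per_chunk):
    """Split a multi-FASTA file into chunk files and return their paths.

    Hypothetical helper: the chunk_00000.fa naming scheme is an assumption.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    handle = None
    n_seqs = 0
    with open(fasta_path) as fasta:
        for line in fasta:
            if line.startswith(">"):
                # Start a new chunk file every seqs_per_chunk records.
                if n_seqs % seqs_per_chunk == 0:
                    if handle:
                        handle.close()
                    path = out_dir / f"chunk_{len(chunk_paths):05d}.fa"
                    chunk_paths.append(path)
                    handle = open(path, "w")
                n_seqs += 1
            # Lines before the first header (if any) are skipped.
            if handle:
                handle.write(line)
    if handle:
        handle.close()
    return chunk_paths


def write_commands(chunk_paths, template, commands_path):
    """Write one shell command per chunk; return the number of commands.

    `template` is any command string with a {chunk} placeholder
    (an assumed convention), e.g. an hmmscan or blastp invocation.
    """
    with open(commands_path, "w") as out:
        for path in chunk_paths:
            out.write(template.format(chunk=path) + "\n")
    return len(chunk_paths)
```

With a template such as `hmmscan --cpu 1 --domtblout {chunk}.domtbl Pfam-A.hmm {chunk} > {chunk}.log`, the resulting file should be usable along the lines of `ParaFly -c hmmscan.cmds -CPU 8`, and the per-chunk outputs would then need to be concatenated afterwards. The Pfam database path and the chunk size are placeholders to adjust.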