deprekate / PHACTS

A simple program to classify the lifestyle of phages.
GNU General Public License v3.0

How could I use PHACTS to predict the lifecycle of thousands of phages? #1

Status: Open. loukesio opened this issue 3 years ago.

loukesio commented 3 years ago

Thank you for the nice tool! I would like to use PHACTS to predict the life cycle of thousands of phages. Do you have any idea how I could do it? Could I use it on our local cluster?

Thank you for your time :)

deprekate commented 3 years ago

I haven't gotten around to converting the old Perl code into nice, user-friendly Python yet. But the old tarball at https://edwards.sdsu.edu/PHACTS/PHACTS-0.3.tar.gz should be self-contained. All you would need to do is download and compile FASTA36 (https://fasta.bioch.virginia.edu/fasta_www2/fasta_down.shtml)

and then edit line 37 of phacts.pl so that $fasta_path points to your fasta36 install:

#------------------------------------------------------------------------------
# This is the path to your FASTA36 install (the fasta36 binary)
my $fasta_path = "/home3/katelyn/opt/PHACTS/fasta-36.3.8e/bin/fasta36";
#------------------------------------------------------------------------------
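If you need to make this edit on several machines (for example, cluster nodes), a small shell helper can rewrite the assignment with sed instead of editing the file by hand. This is a hypothetical sketch: the function name set_fasta_path is made up, and it assumes the line begins with `my $fasta_path =` exactly as in the excerpt above. It matches on that text rather than on line 37, so it still works if the file shifts by a line, and it keeps the untouched original as phacts.pl.bak.

```shell
# Hypothetical helper: point the $fasta_path line of phacts.pl at your fasta36 binary.
# Assumes the assignment line starts with `my $fasta_path =` (as in PHACTS-0.3).
# Note: paths containing '|' or '&' would need escaping for sed.
set_fasta_path() {
  local script="$1"   # path to phacts.pl
  local fasta="$2"    # path to your compiled fasta36 binary
  if [ ! -f "$script" ]; then
    echo "no such file: $script" >&2
    return 1
  fi
  # '|' is the sed delimiter because the paths contain '/'.
  # -i.bak edits in place and keeps a backup (works with GNU and BSD sed).
  sed -i.bak "s|^my \\\$fasta_path = .*|my \$fasta_path = \"$fasta\";|" "$script"
}

# Example (placeholder paths):
# set_fasta_path PHACTS-0.3/phacts.pl "$HOME/opt/fasta36/bin/fasta36"
```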
deprekate commented 1 year ago

As for running PHACTS on hundreds of phages: normally I run my big jobs on clusters that have hundreds of cores.

However, you should be able to do the same on a regular computer. The easiest way is the Linux command xargs. The syntax to run the two test genomes with a single command would be:

$ pip install phacts
$ ls tests/ | xargs -I{} phacts.py tests/{} -o {}.txt

*Be careful with the above command: phacts doesn't check (yet) whether the file given to -o already exists, so it will overwrite any existing file with the same name.
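Until phacts checks its -o target itself, one way to avoid clobbering earlier results is to skip any genome whose output file already exists. The sketch below is a serial loop equivalent to the xargs command, under stated assumptions: the function name run_all_serial, the PHACTS variable (defaulting to phacts.py) and the separate output directory are hypothetical; only the `phacts.py <genome> -o <file>` call comes from the thread.

```shell
# Hypothetical sketch: run PHACTS on every file in a directory, one at a time,
# skipping genomes whose output file already exists (so nothing is overwritten).
PHACTS="${PHACTS:-phacts.py}"   # assumed command name; override if needed

run_all_serial() {
  local indir="$1" outdir="$2"
  mkdir -p "$outdir"
  local genome out
  for genome in "$indir"/*; do
    # Skip subdirectories and the literal pattern left by an empty directory.
    [ -f "$genome" ] || continue
    out="$outdir/$(basename "$genome").txt"
    if [ -e "$out" ]; then
      echo "skipping $genome: $out exists" >&2
      continue
    fi
    "$PHACTS" "$genome" -o "$out"
  done
}

# Usage: run_all_serial tests results
```

Writing results to their own directory also keeps the .txt outputs out of the input folder, so a rerun of `ls tests/` doesn't pick them up as genomes.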

The old version of PHACTS had a threaded implementation, but the new one doesn't (yet). So a single phacts job runs its 10 replicates serially instead of in parallel, which is why bumping -r up to 50 makes it take 5 times as long. The xargs command above also runs the jobs serially, one after the other. It takes about 3 minutes to run the two test genomes through PHACTS on my laptop, so you could potentially get a few hundred genomes through with that command in about a day.
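As a rough check on those numbers, assume every genome takes about as long as the test genomes and that runtime grows linearly with the number of replicates r (as the -r 50 example implies):

```latex
\begin{aligned}
t_{\text{genome}} &\approx \frac{3\ \text{min}}{2\ \text{genomes}} = 1.5\ \text{min per genome} \quad (r = 10) \\
N_{\text{day}} &\approx \frac{24 \times 60\ \text{min}}{1.5\ \text{min}} = 960\ \text{genomes per day} \\
t_{\text{genome}}(r) &\approx 1.5\ \text{min} \times \frac{r}{10}, \qquad t_{\text{genome}}(50) \approx 7.5\ \text{min} \;\Rightarrow\; N_{\text{day}} \approx 192
\end{aligned}
```

The serial figure is optimistic, since real genomes may be larger than the test ones and a laptop is rarely idle for a full day, which fits the more conservative "few hundred per day" estimate.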

If you have thousands of genomes (or want to bump up the -r replicates), you can use the parallel command instead of xargs. It uses multiple cores: if you have 8 cores, 8 jobs run at once, and when a job finishes, the next job is sent to that core. This lets you get thousands of genomes through in a reasonable amount of time.
The syntax for parallel is very similar to xargs:

$ ls tests/ | parallel -I{} phacts.py tests/{} -o {}.txt
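If GNU parallel isn't installed on a machine, xargs -P gives a similar "keep N jobs running" behaviour. The sketch below swaps in find plus xargs -P (available in GNU and BSD xargs, though not in POSIX) for parallel, and also skips genomes whose output already exists, since -o overwrites. The function name run_all_parallel, the PHACTS variable and the fallback of 4 jobs when nproc is missing are assumptions for illustration.

```shell
# Hypothetical sketch: run PHACTS on all genomes in a directory with up to
# $jobs processes at once, never overwriting an existing output file.
PHACTS="${PHACTS:-phacts.py}"   # assumed command name; override if needed

run_all_parallel() {
  local indir="$1" outdir="$2"
  local jobs="${3:-$(nproc 2>/dev/null || echo 4)}"   # default: one job per core
  mkdir -p "$outdir"
  # -print0 / -0 keep file names with spaces intact; -I{} runs one job per genome.
  find "$indir" -maxdepth 1 -type f -print0 |
    xargs -0 -P "$jobs" -I{} sh -c '
      phacts=$1 outdir=$2 genome=$3
      out="$outdir/$(basename "$genome").txt"
      [ -e "$out" ] && exit 0      # result already there: skip this genome
      exec "$phacts" "$genome" -o "$out"
    ' _ "$PHACTS" "$outdir" {}
}

# Usage: run_all_parallel tests results 8
```

With GNU parallel itself, the -j option sets the number of simultaneous jobs in the same way (by default it runs one job per CPU core).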