Rinoahu / SwiftOrtho

A high performance tool to identify orthologs and paralogs across genomes.
GNU General Public License v3.0

run_all.py taking forever #14

Open nitinra opened 1 year ago

nitinra commented 1 year ago

Hello,

I am running SwiftOrtho on 276 insect species. I used the following command:

python run_all.py -i allprotein.fa -a 20

I started the run on Jan 30th and it's still running the first step (all-vs-all homology search). Is there any way I can make it run faster?

Regards, Nitin

Rinoahu commented 1 year ago

How many protein sequences are in the file? Are they the same species?

nitinra commented 1 year ago

Hello,

The species span the entire insect clade, so there are 276 different species. The combined FASTA file contains ~6839337 sequences.

How do I make it run faster? Thank you!

Regards, Nitin

Rinoahu commented 1 year ago
  1. delete the old version

  2. git clone the latest one

  3. install according to the instructions

  4. Run the command: python run_all.py -i allprotein.fa -a 20 -s 11111111 -v 500

  5. If the protein sequences have a lot of redundancy, you can also try the new tool: python run_all_fast.py -i allprotein.fa -a 20 -s 11111111 -v 500

Generally, increasing the seed length or reducing the number of hits in the homology search can make it run faster.
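
Before choosing between the two scripts, it may help to check how much redundancy the combined FASTA file actually has. The following is only a rough sketch and is not part of SwiftOrtho: the helper names `read_fasta` and `redundancy_summary` are hypothetical, and "redundancy" here is assumed to mean exact duplicate sequences (case-insensitive), which is a much narrower definition than whatever run_all_fast.py uses internally.

```python
import hashlib
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def redundancy_summary(handle):
    """Return (total sequences, distinct sequences) for a FASTA handle.

    Only fixed-size digests are stored, so memory use stays modest
    even for files with millions of records.
    """
    seen = set()
    total = 0
    for _, seq in read_fasta(handle):
        total += 1
        seen.add(hashlib.sha256(seq.upper().encode()).digest())
    return total, len(seen)


if __name__ == "__main__":
    # Tiny in-memory example: records "a" and "b" hold the same sequence.
    demo = io.StringIO(">a\nMKT\nLLV\n>b\nMKTLLV\n>c\nMSS\n")
    total, distinct = redundancy_summary(demo)
    print(total, distinct)
```

To run it on the real input, open the file and pass the handle, for example `with open("allprotein.fa") as fh: print(redundancy_summary(fh))`. If the distinct count is far below the total, the file contains many exact duplicates, which suggests that run_all_fast.py is worth trying.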