carolzhou / multiPhATE2

multiPhATE with comparative genomics

Parallelism problem #5

Closed vihoikka closed 3 years ago

vihoikka commented 3 years ago

Hi,

First off, thank you for the really nice pipeline 😊.

I'm having some trouble with multithreading. Setting phate_threads='ALL' results in the following error:

Traceback (most recent call last):
  File "multiPhate.py", line 2083, in <module>
    if int(phateThreads) == 0:
ValueError: invalid literal for int() with base 10: 'ALL'

It seems to be expecting an integer instead of a string. When I insert an integer, say 4 (on a six-core MacBook Pro 2019 16"), it still runs on a single thread:

multiPhate says, Using 1 threads

Not a huge issue, since a single thread still finishes in a reasonable time on a typical phage genome.

I'm running a .config with phanotate, ncbi_virus_genome_blast, blastp, hmmscan and jackhmmer enabled (no CGP)

cheers, Ville Hoikkala

jeffkimbrel commented 3 years ago

Hi Ville,

Thanks for pointing that out, it is indeed a bug that needs fixing.

To get around it for now, the code should accept an actual number of threads, e.g. phate_threads='8' instead of phate_threads='ALL'. Hopefully that will work for you until we get this code updated.

Jeff
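For reference, the ValueError above comes from passing the string 'ALL' straight to int(). A minimal sketch of a parser that accepts both forms follows. It is not the multiPhATE2 code or the fix from any PR: the function name parse_thread_setting, the cpu_count parameter, and the choice to clamp values below 1 to a single thread are all assumptions made for illustration.

```python
import os


def parse_thread_setting(value, cpu_count=None):
    """Hypothetical helper: convert a phate_threads config value to an int.

    Accepts 'ALL' (case-insensitive) or an integer string such as '4'.
    """
    if cpu_count is None:
        # os.cpu_count() can return None, so fall back to 1.
        cpu_count = os.cpu_count() or 1

    # Config values may arrive still wrapped in quotes, e.g. "'ALL'".
    text = str(value).strip().strip("'\"")

    if text.upper() == "ALL":
        return cpu_count

    try:
        requested = int(text)
    except ValueError:
        raise ValueError(
            f"phate_threads must be 'ALL' or an integer, got {value!r}"
        ) from None

    # Assumption: treat zero or negative requests as a single thread,
    # and never ask for more threads than there are CPUs.
    return max(1, min(requested, cpu_count))
```

With this sketch, parse_thread_setting('ALL', cpu_count=6) gives 6 and parse_thread_setting('4', cpu_count=6) gives 4, while anything else that is not an integer raises a readable error instead of the bare int() message.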

vihoikka commented 3 years ago

Hi Jeff,

Thanks for the response! Unfortunately, the program apparently runs with only one thread regardless of the number specified in phate_threads. With phate_threads='4', the output is still: multiPhate says, Using 1 threads

Ville

jeffkimbrel commented 3 years ago

Hi Ville,

Hmm, perhaps the problem is bigger than I thought. Thanks for reporting, we will look into it. Can you list your environment to help us, specifically your OS and Python version?

Thanks,

Jeff

linsalrob commented 3 years ago

With the fixes implemented in PR #6, multiPhATE2 runs in parallel if the config is defined as phate_threads='ALL':

multiPhate says, Using 3 threads

But I only have three genomes in my test set

carolzhou commented 3 years ago

Hi Vihoikka, how many genomes are you running through multiPhATE? The maximum number of phate threads equals the number of genomes or the number of your system's CPUs, whichever is smaller.
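The rule described here can be sketched in a couple of lines. This is only an illustration of the stated rule, not the actual multiPhATE2 implementation; max_phate_threads and its parameters are hypothetical names.

```python
def max_phate_threads(n_genomes, n_cpus):
    """Hypothetical helper: upper bound on PhATE threads.

    Follows the rule stated above: the smaller of the number of
    genomes and the number of CPUs (and never fewer than 1).
    """
    return max(1, min(n_genomes, n_cpus))


# One genome on a machine with 6 CPUs: only one thread is usable.
single_genome = max_phate_threads(n_genomes=1, n_cpus=6)

# Three genomes on the same machine: three threads.
three_genomes = max_phate_threads(n_genomes=3, n_cpus=6)
```

Under this rule a single-genome run reports one thread no matter what phate_threads is set to, which matches the "Using 1 threads" message reported earlier in the thread.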

vihoikka commented 3 years ago

Ah, I'm only running one genome, so that is the reason. I mistakenly assumed multithreading even in the case of only one genome.

I see that phate_threads='ALL' works now. Thank you for the swift responses!

carolzhou commented 3 years ago

That's right. One processor is assigned per PhATE pipeline run. The same goes for CGP: if comparing 4 genomes, there will be at most 4×(4-1)/2 = 6 processors (the number of genome pairs). Thank you for using multiPhATE2. Please let us know if you encounter any other issues.
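The CGP figure is the number of unordered genome pairs, n(n-1)/2. A small sketch of that arithmetic, assuming one processor per pairwise comparison as the formula suggests; cgp_pair_count and the genome labels are hypothetical and not taken from the multiPhATE2 code.

```python
from itertools import combinations


def cgp_pair_count(n_genomes):
    """Hypothetical helper: number of unordered genome pairs, n(n-1)/2."""
    return n_genomes * (n_genomes - 1) // 2


# Enumerate the pairs explicitly for four placeholder genome names.
genomes = ["genome1", "genome2", "genome3", "genome4"]
pairs = list(combinations(genomes, 2))

# Both routes agree: four genomes give six pairwise comparisons.
assert cgp_pair_count(len(genomes)) == len(pairs) == 6
```

As with the PhATE threads, the number actually used would presumably also be capped by the CPUs available, so six is a maximum rather than a guarantee.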