castcollab / tesserae2

Tesserae2: Fast recombination-aware global and local alignment.
Other
3 stars 0 forks source link

Bugfix: Tesserae parallelization with ThreadPool on lower python versions than 3.8 #15

Closed karljohanw closed 4 years ago

karljohanw commented 4 years ago

Fixes #14.

It turns out, ThreadPool causes tesserae to hang on python versions earlier than 3.8. With this fix, tesseare will skip threading/multiprocessing on lower versions than that.

Also, with fewer target sequences (like two) it seems the pre-/post-processing of the threading could be considered a (minor) bottleneck. It might still be useful for more targets though...

karljohanw commented 4 years ago

I do not understand why "Test with tox" failed in the automated checks of this pull request. "1 file would be reformatted" does not tell me much. @winni2k could you please help out with that one? I cannot see what's wrong with it. I did run through flake8 before commiting.

winni2k commented 4 years ago

Yup, sorry, my bad. I have pushed a commit that outputs a diff of what black would like to have changed. After running black on the code manually (black --diff src) it appears that the code is missing a trailing comma:

$ black --diff --target-version py36 src
--- src/tesserae.py 2020-04-01 11:20:40.788620 +0000
+++ src/tesserae.py 2020-04-01 11:22:00.540081 +0000
@@ -1,7 +1,8 @@
 import numpy as np
 import sys
+
 if sys.version_info >= (3, 8):
     from multiprocessing.pool import ThreadPool as Pool

 # constants
 SMALL = -1e32
@@ -105,11 +106,11 @@

         if sys.version_info < (3, 8) and self.threads > 1:
             print(
                 "WARNING! Lower python version than 3.8 run! "
                 "Will not use multiprocessing!",
-                file=sys.stderr
+                file=sys.stderr,
             )

         if self.mem_limit:
             qlen_sqrt = np.sqrt(self.qlen)
             qsq_down = int(np.floor(qlen_sqrt))
winni2k commented 4 years ago

Given that I'm looking at this section of code anyway, I can't help but notice the print statement. Please don't use print statements because this is library code. Instead use the logging library. Something like this should work.

import logging
logger = logging.getLogger(__name__)

logger.warning(...)

See https://docs.python.org/3.7/library/logging.html

However, I think it might be better to raise an exception if someone asks for more than one thread and the tool can't comply. As a user, I'd want to know about that.