dkoslicki / CMash

Fast and accurate set similarity estimation via containment min hash
BSD 3-Clause "New" or "Revised" License

Python3 hanging #12

Closed: dkoslicki closed this issue 4 years ago

dkoslicki commented 4 years ago

See this issue with Metalign. In short, I figured out the probable source of the program hanging: running CMash under python3 versus python2. After setting up the data and creating a Python 3 environment:

python3 -m venv VE3
source VE3/bin/activate
./setup_libraries.sh

This hangs:

python3 metalign.py test/RL_S001insert_270_1M_subset.fq data/ --output test/RL_S001insert_270_1M_subset_results.tsv

Specifically, this CMash call is what causes the hang:

python3 StreamingQueryDNADatabase.py ../../data/r7aqo9zw/60mers_intersection_dump.fa ../../data/cmash_db_n1000_k60.h5 ../../test/CMash_out.csv 30-60-10 -c 0 -r 10000 -v -f ../../data/cmash_filter_n1000_k60_30-60-10.bf --sensitive
So instead, I tried python2, and it does not hang. Setting up a Python 2 environment:

virtualenv VE2
source VE2/bin/activate
cd CMash
pip install -r requirements.txt

This then runs fine and does not hang:

python StreamingQueryDNADatabase.py ../../data/r7aqo9zw/60mers_intersection_dump.fa ../../data/cmash_db_n1000_k60.h5 ../../test/CMash_out.csv 30-60-10 -c 0 -r 10000 -v -f ../../data/cmash_filter_n1000_k60_30-60-10.bf --sensitive

This works too. Oddly enough, metalign.py is being called with python3 here, so it appears that only CMash needs to be installed with python2:

python3 metalign.py test/RL_S001insert_270_1M_subset.fq data/ --output test/RL_S001insert_270_1M_subset_results.tsv

This also works, as metalign.py appears to be compatible with both python2 and python3:

python metalign.py test/RL_S001insert_270_1M_subset.fq data/ --output test/RL_S001insert_270_1M_subset_results.tsv
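The comparisons above come down to running each command and seeing whether it finishes. To make that check repeatable without waiting indefinitely on the hanging case, each invocation could be wrapped in a timeout. The following is a minimal sketch using the standard library's `subprocess.run(..., timeout=...)` (available since Python 3.3). The helper name `finishes_within` and whatever timeout you pick are illustrative assumptions, not part of CMash or Metalign.

```python
import subprocess


def finishes_within(cmd, timeout_s):
    """Return True if `cmd` (a list of arguments) exits within `timeout_s` seconds.

    If the deadline passes, subprocess.run kills the child process, waits for
    it, and raises TimeoutExpired, so a hung process cannot block the caller.
    Note: this only reports whether the process exited in time; a command
    that fails quickly with a non-zero exit code still counts as finished.
    """
    try:
        subprocess.run(
            cmd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,  # discard output (e.g. from -v)
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return False
    return True
```

For example, passing the `python3 StreamingQueryDNADatabase.py ...` command above as an argument list, with a limit well above its normal run time, should return `False` in the hanging case and `True` for the python2 run. If success matters too, capture the `CompletedProcess` and check its `returncode` separately.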
So possible solutions (with my assessment of ease of implementation):

- Make setup_libraries.sh use python2 when installing CMash, probably via a virtualenv (easy)
- See why marisa-trie isn't working with python3, since their repo says it's python3 compatible (medium)
- Refactor CMash so it's python3 compliant (hard)

dkoslicki commented 4 years ago

This appeared to be a problem with chunksize not being an integer, as well as file names being str in python2 but bytes in python3.
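The comment above does not show the actual fix. As a hedged illustration of the two Python 2/3 differences it names: one common way a chunk size silently becomes a float under Python 3 is true division, since `/` on two ints returns a float in Python 3 but an int in Python 2. Integer-only consumers such as `itertools.islice`, and presumably the `chunksize` argument of `multiprocessing.Pool.map`/`imap`, then misbehave. Separately, names that come back as `bytes` in Python 3 (for example when read from a file or HDF5 dataset; exactly where CMash reads them is an assumption here) no longer compare equal to `str`. The helpers `compute_chunksize`, `batches`, and `to_str` are hypothetical names, not CMash functions.

```python
import itertools


def compute_chunksize(num_items, num_workers):
    # In Python 3, `/` is true division: 10 / 4 == 2.5 (a float), whereas
    # Python 2 returned 2 for two int operands. `//` floors in both versions,
    # and max(1, ...) keeps the chunk size positive.
    return max(1, num_items // num_workers)


def batches(iterable, size):
    """Yield lists of up to `size` items from `iterable`; `size` must be an int."""
    it = iter(iterable)
    while True:
        # itertools.islice raises ValueError if `size` is a float such as 2.5.
        batch = list(itertools.islice(it, size))
        if not batch:
            return
        yield batch


def to_str(name, encoding="utf-8"):
    """Return `name` as text, decoding it first if it is bytes.

    Under Python 2, `bytes` is an alias of `str`, so a plain str is decoded
    to unicode there; under Python 3, str values pass through unchanged.
    """
    if isinstance(name, bytes):
        return name.decode(encoding)
    return name
```

In Python 3, `b"genome_1.fa" == "genome_1.fa"` is `False`, so a file name that comes back as bytes silently fails to match its str counterpart (for example as a dict key) unless it is normalized with something like `to_str` first.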

dkoslicki commented 4 years ago

This appears to be fixed now.