bacpop / PopPUNK

PopPUNK 👨‍🎤 (POPulation Partitioning Using Nucleotide Kmers)
https://www.bacpop.org/poppunk
Apache License 2.0

Existing database cannot be in nested directories #170

Closed mgalardini closed 3 years ago

mgalardini commented 3 years ago

Versions

poppunk 2.3.0
poppunk_sketch 1.7.3

Command used and output returned

$ poppunk --create-db --output database --r-files out/poppunk_input.txt --threads 24 --output out/poppunk --qc-filter continue
PopPUNK (POPulation Partitioning Using Nucleotide Kmers)
        (with backend: sketchlib v1.7.3
         sketchlib: /fast-storage/miniconda3/envs/poppunk/lib/python3.8/site-packages/pp_sketchlib.cpython-38-x86_64-linux-gnu.so)

Graph-tools OpenMP parallelisation enabled: with 24 threads
Mode: Building new database from input sequences
Sketching 392 genomes using 24 thread(s)
Progress (CPU): 392 / 392
Writing sketches to file
Calculating random match chances using Monte Carlo
Calculating distances using 24 thread(s)
Progress (CPU): 100.0%

Done
$ poppunk --fit-model dbscan --ref-db out/poppunk --threads 24
PopPUNK (POPulation Partitioning Using Nucleotide Kmers)
        (with backend: sketchlib v1.7.3
         sketchlib: /fast-storage/miniconda3/envs/poppunk/lib/python3.8/site-packages/pp_sketchlib.cpython-38-x86_64-linux-gnu.so)

Graph-tools OpenMP parallelisation enabled: with 24 threads
Mode: Fitting dbscan model to reference database

Traceback (most recent call last):
  File "/fast-storage/miniconda3/envs/poppunk/bin/poppunk", line 10, in <module>
    sys.exit(main())
  File "/fast-storage/miniconda3/envs/poppunk/lib/python3.8/site-packages/PopPUNK/__main__.py", line 338, in main
    refList, queryList, self, distMat = readPickle(distances, enforce_self=True)
  File "/fast-storage/miniconda3/envs/poppunk/lib/python3.8/site-packages/PopPUNK/utils.py", line 129, in readPickle
    with open(pklName + ".pkl", 'rb') as pickle_file:
FileNotFoundError: [Errno 2] No such file or directory: 'poppunk/out/poppunk.dists.pkl'

Describe the bug

When creating the sketches, the output database can be placed in a directory nested inside an existing one, but fitting the model against that same path then fails. I believe the problem is here, where the code looks up the base directory. I am fine with just placing the db in the directory from which I am running poppunk, but you may be interested in fixing this.
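To make the mismatch concrete, here is a minimal sketch of one path construction that would reproduce the file name in the traceback. It is a guess that fits the error message and has not been checked against the 2.3.0 source; both helper names are hypothetical and are not PopPUNK functions.

```python
import posixpath  # POSIX-style joins, matching the Linux paths in the report


def dists_prefix_observed(ref_db: str) -> str:
    # Hypothetical reconstruction of the failing behaviour:
    # the base name is used as the directory and the full path as the file prefix.
    return posixpath.join(posixpath.basename(ref_db), ref_db + ".dists")


def dists_prefix_expected(ref_db: str) -> str:
    # Layout consistent with what --create-db wrote: <ref_db>/<basename>.dists
    return posixpath.join(ref_db, posixpath.basename(ref_db) + ".dists")


for ref_db in ("out/poppunk", "poppunk"):
    print(ref_db)
    print("  observed:", dists_prefix_observed(ref_db) + ".pkl")
    print("  expected:", dists_prefix_expected(ref_db) + ".pkl")
```

Under this assumption the two constructions agree for a flat database name such as `poppunk` and diverge only once the name contains a directory component, which would explain why placing the database in the working directory avoids the error.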

johnlees commented 3 years ago

Hi Marco,

Thanks for the detailed report. This really should work with the options you've used, so we will try to fix it. What is in out/poppunk after running the first command?

mgalardini commented 3 years ago

$ ls out/poppunk/
poppunk_distanceDistribution.png  poppunk.dists.npy  poppunk.dists.pkl  poppunk.h5  poppunk_qcreport.txt

nickjcroucher commented 3 years ago

Hi Marco - I'm looking at a fix on our most recent branch (refine_fix), but I can't replicate in the test directory with:

../poppunk-runner.py --create-db --r-files references.txt --output out/poppunk --qc-filter continue
../poppunk-runner.py --fit-model dbscan --ref-db out/poppunk

Does this work for you? Also, there are two --output flags in your first command, in case that's causing any problems.

mgalardini commented 3 years ago

I updated poppunk to version 2.4.0 to fix #171, and the issue no longer occurs. However, with version 2.3.0 the issue reappears, even after removing the extra --output argument from the first command.

nickjcroucher commented 3 years ago

I changed some of the file parsing behaviour in the upgrade to v2.4, which has hopefully fixed this. Please let us know if the issue reappears.
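For scripts that rely on a nested database path, a small guard on the version string may be useful. This is only a sketch: the 2.4.0 threshold comes from the report above (2.3.0 failed, 2.4.0 worked), the helper names are hypothetical, and it assumes plain dotted version strings like those listed under Versions.

```python
def parse_version(version: str) -> tuple:
    # Turn "2.3.0" (or "v2.4") into a three-part integer tuple, padding with zeros.
    parts = [int(part) for part in version.lstrip("v").split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts[:3])


def handles_nested_db(version: str) -> bool:
    # Assumption taken from this thread: nested paths worked from 2.4.0 onwards.
    return parse_version(version) >= (2, 4, 0)


for version in ("2.3.0", "2.4.0"):
    print(version, handles_nested_db(version))
```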

mgalardini commented 3 years ago

Fantastic, will certainly do. Thanks for the great software!