bacpop / PopPUNK

PopPUNK 👨‍🎤 (POPulation Partitioning Using Nucleotide Kmers)
https://www.bacpop.org/poppunk
Apache License 2.0

FileNotFoundError when writing plot fits #158

Closed flass closed 3 years ago

flass commented 3 years ago

Hi John, I've got a minor bug to report here:

Versions: I am using PopPUNK v2.3.0 with pp-sketchlib 1.6.2, as provided by a conda environment built with:

conda create -n poppunk230 -c defaults -c conda-forge -c bioconda poppunk==2.3.0 pp-sketchlib==1.6.2

Command used and output returned: I ran the command:

poppunk --create-db --r-files /lustre/scratch118/infgen/team216/fl4/poppunk_7kVc/phylocoregenome_clade_genome_fasta_list.tab \
--output /lustre/scratch118/infgen/team216/fl4/poppunk_7kVc/6kVcPGc --threads 8 \
--min-k 15 --max-k 35 --plot-fit 5  --qc-filter prune --length-range 3500000 4500000 --max-a-dist 1

I get this output (combined stdout and stderr streams):

Sketching 7628 genomes using 8 thread(s)
Writing sketches to file
Calculating random match chances using Monte Carlo
Calculating distances using 8 thread(s)
PopPUNK (POPulation Partitioning Using Nucleotide Kmers)
        (with backend: sketchlib v1.6.2
         sketchlib: /lustre/scratch118/infgen/team216/fl4/miniconda3/envs/poppunk230/lib/python3.8/site-packages/pp_sketchlib.cpython-38-x86_64-linux-gnu.so)

Graph-tools OpenMP parallelisation enabled: with 8 threads
Mode: Building new database from input sequences
0092_L008_R1.contigs_velvet failed QC
...
19787_7#60.contigs_spades failed QC
Traceback (most recent call last):
  File "/lustre/scratch118/infgen/team216/fl4/miniconda3/envs/poppunk230/bin/poppunk", line 10, in <module>
    sys.exit(main())
  File "/lustre/scratch118/infgen/team216/fl4/miniconda3/envs/poppunk230/lib/python3.8/site-packages/PopPUNK/__main__.py", line 278, in main
    refList, queryList, distMat = queryDatabase(rNames = rNames,
  File "/lustre/scratch118/infgen/team216/fl4/miniconda3/envs/poppunk230/lib/python3.8/site-packages/PopPUNK/sketchlib.py", line 521, in queryDatabase
    plot_fit(klist,
  File "/lustre/scratch118/infgen/team216/fl4/miniconda3/envs/poppunk230/lib/python3.8/site-packages/PopPUNK/plot.py", line 127, in plot_fit
    plt.savefig(out_prefix + ".pdf", bbox_inches='tight')
  File "/lustre/scratch118/infgen/team216/fl4/miniconda3/envs/poppunk230/lib/python3.8/site-packages/matplotlib/pyplot.py", line 859, in savefig
    res = fig.savefig(*args, **kwargs)
  File "/lustre/scratch118/infgen/team216/fl4/miniconda3/envs/poppunk230/lib/python3.8/site-packages/matplotlib/figure.py", line 2311, in savefig
    self.canvas.print_figure(fname, **kwargs)
  File "/lustre/scratch118/infgen/team216/fl4/miniconda3/envs/poppunk230/lib/python3.8/site-packages/matplotlib/backend_bases.py", line 2210, in print_figure
    result = print_method(
  File "/lustre/scratch118/infgen/team216/fl4/miniconda3/envs/poppunk230/lib/python3.8/site-packages/matplotlib/backend_bases.py", line 1639, in wrapper
    return func(*args, **kwargs)
  File "/lustre/scratch118/infgen/team216/fl4/miniconda3/envs/poppunk230/lib/python3.8/site-packages/matplotlib/backends/backend_pdf.py", line 2586, in print_pdf
    file = PdfFile(filename, metadata=metadata)
  File "/lustre/scratch118/infgen/team216/fl4/miniconda3/envs/poppunk230/lib/python3.8/site-packages/matplotlib/backends/backend_pdf.py", line 562, in __init__
    fh, opened = cbook.to_filehandle(filename, "wb", return_opened=True)
  File "/lustre/scratch118/infgen/team216/fl4/miniconda3/envs/poppunk230/lib/python3.8/site-packages/matplotlib/cbook/__init__.py", line 408, in to_filehandle
    fh = open(fname, flag, encoding=encoding)
FileNotFoundError: [Errno 2] No such file or directory: '/lustre/scratch118/infgen/team216/fl4/poppunk_7kVc/6kVcPGc//lustre/scratch118/infgen/team216/fl4/poppunk_7kVc/6kVcPGc_fit_example_1.pdf'

In the specified output folder /lustre/scratch118/infgen/team216/fl4/poppunk_7kVc/6kVcPGc I can find the database, but not the plots:

ls -ltr 6kVcPGc/
total 790700
-rw-r--r-- 1 me mygroup      6184 Mar  5 18:50 6kVcPGc_qcreport.txt
-rw-r--r-- 1 me mygroup 809662250 Mar  5 18:53 6kVcPGc.h5

Describe the bug: It seems that the plot function builds the output file path by pasting together the specified output directory name and the full path name. This used to work when I ran the command from the parent folder and gave a single-level relative path as the value for --output. The rest of the --create-db command runs fine, though.
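The failure mode described above can be reproduced in isolation. This is a minimal sketch, not PopPUNK's actual code: `pasted_prefix` and `joined_prefix` are hypothetical helpers, and the assumption is that output files are named `<output>/<basename of output>_...`, as the `6kVcPGc/6kVcPGc.h5` listing suggests.

```python
import os

def pasted_prefix(output):
    """Mimic the failure: the output directory is prepended to a
    prefix that already contains the full --output value."""
    return output + "/" + output

def joined_prefix(output):
    """Build <output>/<basename(output)>, which holds for both
    relative and absolute paths (hypothetical helper)."""
    output = output.rstrip("/")
    return os.path.join(output, os.path.basename(output))

relative = "6kVcPGc"
absolute = "/lustre/scratch118/infgen/team216/fl4/poppunk_7kVc/6kVcPGc"

# A single-level relative path gives the same result either way,
# which is why the problem stays hidden in that case.
print(pasted_prefix(relative) + "_fit_example_1.pdf")
print(joined_prefix(relative) + "_fit_example_1.pdf")

# With an absolute path, the pasted version repeats the whole path.
print(pasted_prefix(absolute) + "_fit_example_1.pdf")
print(joined_prefix(absolute) + "_fit_example_1.pdf")
```

With the absolute path, the pasted version gives the doubled path shown in the FileNotFoundError, while the joined version stays inside the output folder.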

Cheers,

Florent

flass commented 3 years ago

I'll add that subsequently running the --fit-model command:

--fit-model bgmm --ref-db /lustre/scratch118/infgen/team216/fl4/poppunk_7kVc/6kVcPGc \
--output /lustre/scratch118/infgen/team216/fl4/poppunk_7kVc/6kVcPGc --threads 8 \
--qc-filter prune --length-range 3500000 4500000 --max-a-dist 1 --K 3

leads to the same issue, as it returns:

Graph-tools OpenMP parallelisation enabled: with 8 threads
PopPUNK: visualise
Traceback (most recent call last):
  File "/lustre/scratch118/infgen/team216/fl4/miniconda3/envs/poppunk230/bin/poppunk_visualise", line 10, in <module>
    sys.exit(main())
  File "/lustre/scratch118/infgen/team216/fl4/miniconda3/envs/poppunk230/lib/python3.8/site-packages/PopPUNK/visualise.py", line 340, in main
    generate_visualisations(args.query_db,
  File "/lustre/scratch118/infgen/team216/fl4/miniconda3/envs/poppunk230/lib/python3.8/site-packages/PopPUNK/visualise.py", line 193, in generate_visualisations
    rlist, qlist, self, complete_distMat = readPickle(distances)
  File "/lustre/scratch118/infgen/team216/fl4/miniconda3/envs/poppunk230/lib/python3.8/site-packages/PopPUNK/utils.py", line 129, in readPickle
    with open(pklName + ".pkl", 'rb') as pickle_file:
FileNotFoundError: [Errno 2] No such file or directory: '6kVcPGc//lustre/scratch118/infgen/team216/fl4/poppunk_7kVc/6kVcPGc.dists.pkl'

nickjcroucher commented 3 years ago

Thanks for pointing this out @flass - I think this should be fixed on the mst_dev branch we are currently working on (see c1f4c5a23fecc5d099529c5948642dfa8569ed4c), but this hasn't been merged in yet because I'm being slow, apologies!

flass commented 3 years ago

Hi Nick,

thanks for the tip.

I have hand-edited the __main__.py and assign.py modules in my conda environment to replicate the fix in c1f4c5a; to be safe, I also removed the corresponding *.pyc files in __pycache__. I also had to make a similar fix in visualise.py for when poppunk_visualise is called:

187:            distances = os.path.basename(ref_db) + "/" + ref_db + ".dists"
189:            distances = os.path.basename(query_db) + "/" + query_db + ".dists"

It seems to have done the trick.
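For anyone patching by hand in the same way: evaluated with an absolute ref_db, the expression on the two lines quoted above gives the '6kVcPGc//lustre/...' path from the second traceback. Below is a minimal sketch of the swapped order, assuming the distance files sit inside the database folder and are named after it. `dists_prefix` is a hypothetical helper for illustration, not the code on mst_dev.

```python
import os
import pickle
import tempfile

def dists_prefix(db):
    """Return <db>/<basename(db)>.dists for a database directory
    given as a relative or absolute path (hypothetical helper)."""
    db = os.path.normpath(db)  # drops any trailing slash
    return os.path.join(db, os.path.basename(db) + ".dists")

# Round trip with an absolute path, the case that failed in this report.
with tempfile.TemporaryDirectory() as tmp:
    db = os.path.join(tmp, "6kVcPGc")
    os.makedirs(db)
    # Stand-in payload; the real pickle holds PopPUNK's own objects.
    with open(dists_prefix(db) + ".pkl", "wb") as fh:
        pickle.dump(["sample_a", "sample_b"], fh)
    with open(dists_prefix(db) + ".pkl", "rb") as fh:
        names = pickle.load(fh)

print(names)
```

Normalising the path first means a trailing slash on the database path does not leave an empty basename.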

Cheers, Florent

nickjcroucher commented 3 years ago

Thanks @flass - this other change has also been made on the mst_dev branch; I forgot to mention it, apologies.