bacpop / PopPUNK

PopPUNK 👨‍🎤 (POPulation Partitioning Using Nucleotide Kmers)
https://www.bacpop.org/poppunk
Apache License 2.0

Issues with refine - RuntimeError: Optimisation failed: produced a boundary outside of allowed range #182

Closed bananabenana closed 2 years ago

bananabenana commented 2 years ago

Versions

Command used and output returned

I have successfully run popPUNK from database building to visualisation. Great tool - big fan.

However, I am having an issue with the model refinement step. I am trying to refine my dbscan model fit. It already had a fairly good score profile from my understanding, and this makes sense biologically:

When I run:

poppunk \
    --fit-model refine \
    --ref-db database \
    --model-dir database/dbscan_model_fit \
    --output database/refine_dbscan_model_fit \
    --graph-weights \
    --threads 48

on my dataset of ~7k Klebsiella genomes:

Describe the bug

I was expecting to see a further refined model and improvement of the scores. Instead, I get an error message: Optimisation failed: produced a boundary outside of allowed range

Log:

Loading DBSCAN model
Completed model loading
Loaded previous model of type: dbscan
Selected type isolate for distance QC is 1
Initial model-based network construction based on DBSCAN fit
Initial boundary based network construction
Decision boundary starts at (0.33,0.87)
Trying to optimise score globally

[progress bar: 0/1 to 1/1, then iterations 2/? to 40/?]

Traceback (most recent call last):
  File "/projects/js66/individuals/benV/Software/miniconda/conda/envs/popPUNK_v2.4.0_py3.9/bin/poppunk", line 11, in <module>
    sys.exit(main())
  File "/projects/js66/individuals/benV/Software/miniconda/conda/envs/popPUNK_v2.4.0_py3.9/lib/python3.9/site-packages/PopPUNK/main.py", line 411, in main
    assignments = new_model.fit(distMat, refList, model,
  File "/projects/js66/individuals/benV/Software/miniconda/conda/envs/popPUNK_v2.4.0_py3.9/lib/python3.9/site-packages/PopPUNK/models.py", line 781, in fit
    refineFit(X/self.scale,
  File "/projects/js66/individuals/benV/Software/miniconda/conda/envs/popPUNK_v2.4.0_py3.9/lib/python3.9/site-packages/PopPUNK/refine.py", line 203, in refineFit
    raise RuntimeError("Optimisation failed: produced a boundary outside of allowed range\n")
RuntimeError: Optimisation failed: produced a boundary outside of allowed range

Is there a parameter I can change to avoid this occurring?

Thanks!

wanyuac commented 2 years ago

I encountered the same problem when I was refining a BGMM fit using popPUNK version 2.4.0. I worked around it by adding the option --unconstrained.

johnlees commented 2 years ago

This can happen when the automatically chosen optimisation range doesn't suit the data.

As @wanyuac notes, you can try the unconstrained mode (though this is a little slower). Alternatively, you can add --pos-shift and --neg-shift to change the range. If you look at your output from DBSCAN, the automatically chosen range runs between the centroid of the component nearest the origin and the centroid of the second nearest. So you may want to start with small values for these flags to get the fit to work, look at the output from a successful refine run, and then increase the range a bit further.

Finally, you can always manually specify the entire search range with --manual-start. See https://poppunk.readthedocs.io/en/latest/model_fitting.html#using-fit-refinement-when-mixture-model-totally-fails
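To make the geometry of that automatic range concrete, here is a minimal Python sketch. It is not PopPUNK's code: the function name search_range, the centroid values, and the way the two shift arguments lengthen the line are assumptions for illustration only. The real --pos-shift and --neg-shift flags may use different units or sign conventions, so check poppunk --help before choosing values.

```python
import math

def search_range(centroids, pos_shift=0.0, neg_shift=0.0):
    """Return (start, end) of a search line between the two centroids
    nearest the origin, optionally extended beyond each end.

    Hypothetical helper: a geometric illustration, not PopPUNK's API.
    """
    # Order the component centroids by distance from the origin
    ordered = sorted(centroids, key=lambda c: math.hypot(c[0], c[1]))
    (x0, y0), (x1, y1) = ordered[0], ordered[1]

    # Unit vector pointing from the nearest to the second-nearest centroid
    length = math.hypot(x1 - x0, y1 - y0)
    ux, uy = (x1 - x0) / length, (y1 - y0) / length

    # Assumed behaviour: neg_shift extends the line back past the nearest
    # centroid, pos_shift extends it forward past the second nearest
    start = (x0 - neg_shift * ux, y0 - neg_shift * uy)
    end = (x1 + pos_shift * ux, y1 + pos_shift * uy)
    return start, end

# Made-up (core, accessory) centroids for a three-component fit
example_centroids = [(0.020, 0.30), (0.002, 0.05), (0.035, 0.45)]

default_start, default_end = search_range(example_centroids)
wide_start, wide_end = search_range(example_centroids, pos_shift=0.01, neg_shift=0.01)
print("default range:", default_start, "to", default_end)
print("widened range:", wide_start, "to", wide_end)
```

With both shifts at zero, the line runs exactly from one centroid to the other. Non-zero shifts lengthen it at either end, which matches the advice to begin with small values and widen the range gradually.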

bananabenana commented 2 years ago

Hi, @wanyuac and @johnlees, both of these flags worked on separate runs and improved my fits. Thanks for your responses.