Closed egenomics closed 2 years ago
This looks like a duplicate of: https://github.com/cov-lineages/pangolin/issues/427
I've encountered this error before, too. Try to reinstall your environment.
This warning gives a hint about a possible reason: you may not be using the sklearn version expected. Setting up a fresh environment with pangolin should fix this.
Let me know if reinstalling doesn't solve the problem and then share information on the exact packages and their versions installed in your environment.
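A minimal sketch for collecting that information from inside the affected environment, using only the Python standard library. The list of distribution names is an assumption about what matters here; adjust it to your setup. Running `conda list` in the activated environment gives the same information.

```python
from importlib import metadata

# Distribution names are assumptions; edit to match your environment.
PACKAGES = ("pangolin", "pangoLEARN", "scikit-learn", "joblib")


def installed_versions(names=PACKAGES):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```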
@aineniamh @corneliusroemer I think this issue deserves reopening.
This error arises because the most recent pangoLEARN models have apparently been built with a more recent version of scikit-learn. The bioconda recipe for pangolin 3.1.20 pins its scikit-learn dependency to 0.23.1 (https://github.com/bioconda/bioconda-recipes/blob/a574d43146db09006d462746aa1d8716c77404b4/recipes/pangolin/meta.yaml#L25), and due to internal changes in scikit-learn, models dumped with versions > 1.0 will not work with that older version (compare https://scikit-learn.org/stable/modules/model_persistence.html#security-maintainability-limitations). Conversely, when you try to load a model that was dumped with a pre-1.0 version of scikit-learn using a version > 1.0, you will see a warning like this one:
UserWarning: Trying to unpickle estimator DecisionTreeClassifier from version 0.24.2 when using version 1.0.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/modules/model_persistence.html#security-maintainability-limitations
and though I have no idea whether the model would really be compromised, that doesn't sound encouraging.
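For pipelines that would rather stop than continue with a mismatched model, here is a hedged sketch that turns that warning into a hard failure at load time. The `load_strict` helper is hypothetical, and `loader` stands for whatever callable loads the model (for example `joblib.load`, which is assumed and not imported here so the sketch stays standard-library only). It matches on the warning text quoted in this thread.

```python
import warnings


def load_strict(loader, path):
    """Call loader(path); raise if a version-mismatch warning was emitted.

    `loader` is any callable that loads a model from `path`
    (e.g. joblib.load; an assumption, not imported here).
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure nothing is suppressed
        model = loader(path)

    for w in caught:
        is_mismatch = (
            issubclass(w.category, UserWarning)
            and "unpickle estimator" in str(w.message)
        )
        if is_mismatch:
            raise RuntimeError(f"refusing to use model from {path}: {w.message}")
        # Re-emit unrelated warnings so they are not silently swallowed.
        warnings.warn_explicit(w.message, w.category, w.filename, w.lineno)
    return model
```

With a check like this, the failure happens when the model is loaded and names the two versions involved, instead of surfacing later as an AttributeError during prediction.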
Since dumped scikit-learn models are generally not guaranteed to be reloadable with different versions, I think the bioconda approach of pinning a given pangolin release to a specific version of scikit-learn is the right thing to do, but it requires that:
For 3.1.20 I'm not sure what should be done now. The fact is that models since 2022-04-09 won't work with fresh conda installs of pangolin 3.1.20, but there's no simple fix I can see. The question is whether you'd want to switch back to building future models with scikit-learn 0.24 again, as you did previously?
More importantly, however, the same logic holds for pangolin v4 and its pangoLEARN part of pangolin-data, too. Again, it would be good to have the scikit-learn version clearly stated, and most importantly not changing unnecessarily.
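One way a clearly stated version could be put to use, sketched under assumptions: suppose each model release shipped a small metadata record naming the scikit-learn version it was trained with. Nothing in this thread indicates that such a record exists in pangoLEARN or pangolin-data; the `trained_with` key and both function names are hypothetical. The sketch also assumes plain numeric release versions such as `1.0.1`.

```python
def major_minor(version):
    """'1.0.2' -> (1, 0); only the first two numeric fields are compared."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def check_sklearn_version(metadata, installed):
    """Raise if the installed scikit-learn differs in major.minor from the
    version recorded in the (hypothetical) model metadata."""
    trained = metadata["trained_with"]  # hypothetical key
    if major_minor(trained) != major_minor(installed):
        raise RuntimeError(
            f"model trained with scikit-learn {trained}, "
            f"but {installed} is installed"
        )


# Versions taken from the error log in this thread.
try:
    check_sklearn_version({"trained_with": "1.0.1"}, "0.23.1")
except RuntimeError as err:
    print(err)
```

Comparing only major.minor is a choice made for this sketch; scikit-learn itself warns on any version difference, so a stricter exact-match check may be preferable.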
@egenomics a solution to fix your issue (without updating to pangolin 4) is to:
0.23.1 to >=0.23.1, then
This will enable you to run recent models of pangoLEARN with your pangolin. However, you'll see the UserWarning above when trying to run with older models.
I'd like to just give a warning that when we released pangolin 4.0, I intended to maintain pangoLEARN for a couple of months before phasing it out. This was just to give a buffer zone of time for people to update to pangolin 4.0. It's been about 5 weeks, so bear in mind that this repository won't be maintained much longer!
I think this is a good point about scikit-learn versions though, as this is relevant to the random forest model too (you don't see the warnings in 4.0, but the same issue exists: people's local version of scikit-learn may differ from what we've trained on). We can specify a particular version of scikit-learn if this might be an issue, but I've never noticed the version of scikit-learn affecting the inference from the model.
Thanks @wm75 for investigating and giving such a detailed description of what's behind the error here and in https://github.com/cov-lineages/pangolin/issues/427
The happy path is to use up-to-date pangolin models with up-to-date pangoLEARN models.
If for reproducibility one needs to use an old pangoLEARN model, one should use the corresponding pangolin version that was around at the time the model was trained.
@aineniamh Do I understand you correctly that pangoLEARN as a whole will be phased out?
Yeah as it's no longer needed in pangolin 4.0, I'll archive the repo at some point in the not too distant future.
Hi, we have been using pangolin (through conda) for a while now. With the last pangolearn update our pipeline broke. We are using:
pangolin: 3.1.20
pangolearn: 2022-04-09
constellations: v0.1.7
scorpio: 0.3.16
pango-designation used by pangoLEARN/Usher: v1.3
pango-designation aliases: 1.6
We get the following error:
All dependencies satisfied.
The query file is:/datos/MiSeq/MICRO/COVID/analysis/2022_04_19_R2247/consensus/consensus.R2247.fna
Running sequence QC
Number of sequences detected: 48
Total passing QC: 44
Data files found:
Trained model: /root/miniconda3/envs/pangolin_test/lib/python3.8/site-packages/pangoLEARN/data/decisionTree_v1.joblib
Header file: /root/miniconda3/envs/pangolin_test/lib/python3.8/site-packages/pangoLEARN/data/decisionTreeHeaders_v1.joblib
Designated hash: /root/miniconda3/envs/pangolin_test/lib/python3.8/site-packages/pangoLEARN/data/lineages.hash.csv
Job stats:
job                     count    min threads    max threads
add_failed_seqs             1              1              1
align_to_reference          1              1              1
all                         1              1              1
generate_report             1              1              1
get_constellations          1              1              1
hash_sequence_assign        1              1              1
pangolearn                  1              1              1
scorpio                     1              1              1
total                       8              1              1
loading model 04/19/2022, 14:24:50
/root/miniconda3/envs/pangolin_test/lib/python3.8/site-packages/sklearn/base.py:329: UserWarning: Trying to unpickle estimator DecisionTreeClassifier from version 1.0.1 when using version 0.23.1. This might lead to breaking code or invalid results. Use at your own risk.
warnings.warn(
processing block of 44 sequences 04/19/2022, 14:24:51
[Tue Apr 19 14:24:52 2022]
Error in rule pangolearn:
jobid: 0
output: /tmp/tmpz2w2ggj4/lineage_report.pass_qc.csv
RuleException:
AttributeError in line 112 of /root/miniconda3/envs/pangolin_test/lib/python3.8/site-packages/pangolin/scripts/pangolearn.smk:
'DecisionTreeClassifier' object has no attribute 'n_features_'
File "/root/miniconda3/envs/pangolin_test/lib/python3.8/site-packages/pangolin/scripts/pangolearn.smk", line 112, in __rule_pangolearn
File "/root/miniconda3/envs/pangolin_test/lib/python3.8/site-packages/pangolin/pangolearn/pangolearn.py", line 170, in assign_lineage
File "/root/miniconda3/envs/pangolin_test/lib/python3.8/site-packages/sklearn/tree/_classes.py", line 922, in predict_proba
File "/root/miniconda3/envs/pangolin_test/lib/python3.8/site-packages/sklearn/tree/_classes.py", line 395, in _validate_X_predict
File "/root/miniconda3/envs/pangolin_test/lib/python3.8/concurrent/futures/thread.py", line 57, in run
Exiting because a job execution failed. Look above for error message
Exiting because a job execution failed. Look above for error message