cov-lineages / pangolin

Software package for assigning SARS-CoV-2 genome sequences to global lineages.
GNU General Public License v3.0

Different lineage assignment probabilities in online implementation of Pangolin vs local? #102

charlesfoster closed this issue 3 years ago

charlesfoster commented 3 years ago

Hi,

Firstly, thanks for the great tool.

I've been running a local installation of pangolin via the command line. Today I thought I'd compare the result to that given on https://pangolin.cog-uk.io/. The lineage assignment was the same (B.1.1), but the assignment probability was very different: 0.46 from my local installation versus 1 from https://pangolin.cog-uk.io/.

I followed the instructions to update pangolin (https://github.com/cov-lineages/pangolin#updating-pangolin) today, so my installation of pangolin should be up to date. Details:

I'm not sure if it's relevant, but I get the following warning even though I followed the installation instructions:

/Users/cfos/miniconda3/envs/pangolin/lib/python3.6/site-packages/scikit_learn-0.23.2-py3.6-macosx-10.7-x86_64.egg/sklearn/base.py:334: UserWarning: Trying to unpickle estimator DecisionTreeClassifier from version 0.23.1 when using version 0.23.2. This might lead to breaking code or invalid results. Use at your own risk.
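For anyone wanting to check which versions are involved in a warning like this, here is a minimal sketch (not part of pangolin) that pulls the estimator name, the version the model was pickled with, and the installed version out of the warning text. The function name `parse_unpickle_warning` is hypothetical, and the pattern assumes the message wording quoted in this thread.

```python
import re

# Pattern matching the scikit-learn unpickling warning as quoted in this thread.
# Assumption: other scikit-learn releases may word the message differently.
_WARNING_RE = re.compile(
    r"Trying to unpickle estimator (?P<estimator>\w+) "
    r"from version (?P<pickled>[\w.]+) "
    r"when using version (?P<installed>[\w.]+)\."
)


def parse_unpickle_warning(message):
    """Return (estimator, pickled_version, installed_version), or None if no match."""
    match = _WARNING_RE.search(message)
    if match is None:
        return None
    return match.group("estimator"), match.group("pickled"), match.group("installed")


if __name__ == "__main__":
    text = (
        "UserWarning: Trying to unpickle estimator DecisionTreeClassifier "
        "from version 0.23.1 when using version 0.23.2. This might lead to "
        "breaking code or invalid results. Use at your own risk."
    )
    print(parse_unpickle_warning(text))
```

This only reports the two versions; whether a given mismatch actually changes results is a separate question that the sketch does not answer.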

What might be causing the discrepancy in assignment probability? Is the web-app version more up to date? I get the same results with or without the --panGUIlin flag.

Thanks!

zhemingfan commented 3 years ago

Hi Charles,

I'm seeing the same issue. I noticed a discrepancy with NC_045512.2 (Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome), which gave a probability of 0.56 locally versus 1 on the web application.

Regarding the warning message, I dug around and found that the package was being pulled from https://pypi.org/simple/sklearn/. From my understanding, this installs the 0.0 version. Changing this line to scikit-learn>=0.23.1 resolves the warning message.
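To illustrate what a minimum-version requirement such as scikit-learn>=0.23.1 expresses, here is a small standard-library sketch. The helper names `parse_version` and `satisfies_minimum` are hypothetical, and the sketch only handles purely numeric dotted versions; for real installs the `packaging` library covers the general case.

```python
def parse_version(text):
    """Turn a purely numeric dotted version such as '0.23.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def satisfies_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum`.

    Shorter versions are padded with zeros so that '0.23' compares equal to '0.23.0'.
    """
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    # Version strings taken from this thread.
    for version in ("0.0", "0.23.1", "0.23.2"):
        print(version, satisfies_minimum(version, "0.23.1"))
```

With the version strings from this thread, 0.0 does not meet the >=0.23.1 requirement, while 0.23.1 and 0.23.2 both do.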

I tried another genome that I took from GISAID (EPI_ISL_504185.fasta) and got 0.46 locally but 1 on the web application, with the lineage classified as B.

antunderwood commented 3 years ago

@charlesfoster and @zhemingfan the web application (https://pangolin.cog-uk.io/) was lagging a bit behind in the version of pangoLEARN deployed. It has now been updated to version 2020-10-30. Please can you rerun the sequences you mention in your comments above and report if you still see the same discrepancy.

Thanks
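If it helps with re-checking, here is a rough sketch for comparing a local result against a web result. It assumes both have been saved as CSV with `taxon`, `lineage` and `probability` columns; those column names are an assumption about the output format, and the function names are hypothetical, so adjust them to match the files you actually have.

```python
import csv
import io
import math


def load_assignments(handle):
    """Map taxon -> (lineage, probability) from a CSV file-like object.

    Assumes columns named 'taxon', 'lineage' and 'probability'.
    """
    return {
        row["taxon"]: (row["lineage"], float(row["probability"]))
        for row in csv.DictReader(handle)
    }


def find_discrepancies(local, web, tol=1e-6):
    """List taxa present in both results whose lineage or probability differ."""
    differences = []
    for taxon in sorted(local.keys() & web.keys()):
        local_lineage, local_prob = local[taxon]
        web_lineage, web_prob = web[taxon]
        same_prob = math.isclose(local_prob, web_prob, abs_tol=tol)
        if local_lineage != web_lineage or not same_prob:
            differences.append((taxon, local_lineage, local_prob, web_lineage, web_prob))
    return differences


if __name__ == "__main__":
    # Illustrative rows only: 'seq1' and 'seq2' are made-up taxon names.
    local_csv = io.StringIO("taxon,lineage,probability\nseq1,B.1.1,0.46\nseq2,B,1.0\n")
    web_csv = io.StringIO("taxon,lineage,probability\nseq1,B.1.1,1.0\nseq2,B,1.0\n")
    for row in find_discrepancies(load_assignments(local_csv), load_assignments(web_csv)):
        print(row)
```

Taxa that appear in only one of the two files are ignored here; report those separately if that matters for your comparison.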

charlesfoster commented 3 years ago

@aunderwo thanks for the update. I ran the sequence through the web application and this time got the same assignment probability as through the command-line version. All fixed!

aineniamh commented 3 years ago

Ah I'm glad! Thanks @aunderwo for doing the update!