cov-lineages / pangolin-data

Repository for storing latest model, protobuf, designation hash and alias files for pangolin assignments
GNU General Public License v3.0

pangolin-data from conda: version 1.13 installed according to conda but packages say 1.12 #22

Closed KatSteinke closed 1 year ago

KatSteinke commented 1 year ago

When I tried to test the new release of pangolin-data (1.13) in a fresh conda environment (mainly containing pangolin and nextclade), pangolin seemed to use pangolin-data 1.12 instead - the output of pangolin --all-versions is

pangolin: 4.1.2
pangolin-data: 1.12
constellations: v0.1.10
scorpio: 0.3.17
usher 0.5.6
gofasta 1.1.0
minimap2 2.24-r1122
faToVcf: 426

and the version column in the output was "PUSHER-v1.12". However, conda list gives

...
pangolin                  4.1.2              pyhdfd78af_0    bioconda
pangolin-data             1.13               pyh5e36f6f_0    bioconda
...

which would suggest to me that the correct version was installed. I dug around in the pangolin-data .json file in conda-meta in the environment in question to see if there was something off with the version in there. That part seemed fine:

 "version": "1.13"

However, looking further up at the files, I stumbled across several references to 1.12 instead:

"files": [
    "bin/pangolin_data",
    "lib/python3.8/site-packages/pangolin_data-1.12.dist-info/INSTALLER",
    "lib/python3.8/site-packages/pangolin_data-1.12.dist-info/LICENSE",
    "lib/python3.8/site-packages/pangolin_data-1.12.dist-info/METADATA",
    "lib/python3.8/site-packages/pangolin_data-1.12.dist-info/RECORD",
    "lib/python3.8/site-packages/pangolin_data-1.12.dist-info/REQUESTED",
    "lib/python3.8/site-packages/pangolin_data-1.12.dist-info/WHEEL",
    "lib/python3.8/site-packages/pangolin_data-1.12.dist-info/direct_url.json",
    "lib/python3.8/site-packages/pangolin_data-1.12.dist-info/entry_points.txt",
    "lib/python3.8/site-packages/pangolin_data-1.12.dist-info/top_level.txt",
    "lib/python3.8/site-packages/pangolin_data/__init__.py",
    "lib/python3.8/site-packages/pangolin_data/data/alias_key.json",
    "lib/python3.8/site-packages/pangolin_data/data/lineageTree.pb",
    "lib/python3.8/site-packages/pangolin_data/data/lineages.hash.csv",
    "lib/python3.8/site-packages/pangolin_data/data/randomForestHeaders_v1.joblib",
    "lib/python3.8/site-packages/pangolin_data/data/randomForest_v1.joblib",
    "lib/python3.8/site-packages/pangolin_data/__pycache__/__init__.cpython-38.pyc"
  ]

And indeed, when I look for the version in conda/lib/python3.8/site-packages/pangolin_data/__init__.py, what I get is

_program = "pangolin_data"
__version__ = "1.12"
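
The manual check above (comparing the "version" field of the conda-meta record against the version embedded in the dist-info paths under "files") can be automated. This is a minimal sketch, assuming the record has the "version" and "files" keys shown in the excerpts; the function names are made up for illustration and are not part of conda or pangolin.

```python
import json
import re
from pathlib import Path

# Matches e.g. ".../pangolin_data-1.12.dist-info/METADATA" and captures "1.12".
DIST_INFO = re.compile(r"/([A-Za-z0-9_.]+)-([^/-]+)\.dist-info/")


def dist_info_versions(record: dict) -> set:
    """Collect the versions embedded in *.dist-info paths of a conda-meta record."""
    versions = set()
    for path in record.get("files", []):
        match = DIST_INFO.search(path)
        if match:
            versions.add(match.group(2))
    return versions


def version_mismatches(record: dict) -> set:
    """Return dist-info versions that differ from the record's declared version."""
    return dist_info_versions(record) - {record["version"]}


def check_file(meta_json: Path) -> set:
    """Load one conda-meta JSON file (hypothetical helper) and report mismatches."""
    return version_mismatches(json.loads(meta_json.read_text()))
```

For the record quoted in this issue, version_mismatches would report 1.12 as a mismatch against the declared 1.13; for a consistent package it returns an empty set.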

To try and exclude the additional dependencies, I've also tried this with mamba, in a completely clean environment. The result is the same:

$ mamba create -n test_pango_update -c conda-forge -c bioconda -c defaults pangolin=4.1.2 pangolin-data=1.13
$ conda activate test_pango_update
$ pangolin --all-versions
pangolin: 4.1.2
pangolin-data: 1.12
constellations: v0.1.10
scorpio: 0.3.17
usher 0.5.6
gofasta 1.1.0
minimap2 2.24-r1122
faToVcf: 426

So... it seems like the wrong version of pangolin-data is being installed with conda under the right label.

matthuska commented 1 year ago

It looks like pre-release versions of the pangolin-data package are being published as full releases via bioconda. I'm not sure if this is best handled on the bioconda side (a general solution to ignore pre-releases?) or by the pangolin team (stop doing pre-releases?).

aineniamh commented 1 year ago

We have an agreement with gisaid that we do pre-releases as a heads up before we do a data release, so unless something changes we have to keep doing pre-releases. Perhaps the bioconda side could ignore them?

matthuska commented 1 year ago

Having the bioconda side ignore them would be ideal. I'm not too familiar with how to make that work so you'd probably have to talk to them either on gitter or by creating an issue. There is some discussion here: https://github.com/bioconda/bioconda-recipes/issues/18659 but it's not quite what we're talking about.

It also might be sufficient if you give the pre-release an appropriate version number (e.g. 1.13a1 for the first pre-release of 1.13 instead of just 1.13). I think this is supported by conda: https://docs.conda.io/projects/conda/en/stable/user-guide/concepts/pkg-specs.html#supported-version-strings
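
To illustrate why a pre-release suffix would help: under PEP 440-style ordering, 1.13a1 sorts before 1.13, so the final release is always seen as a newer version than its pre-release. The sketch below is a simplified, standard-library-only illustration of that ordering. It is not conda's actual version-comparison code, it only handles the a/b/rc suffixes, and the function names are hypothetical.

```python
import re

# Simplified, PEP 440-style parser for strings like "1.13", "1.13a1", "1.13rc".
# Illustration only: this is not conda's actual version-ordering implementation.
_VERSION = re.compile(r"^(\d+(?:\.\d+)*)(?:(a|b|rc)(\d*))?$")
_STAGE_RANK = {"a": 0, "b": 1, "rc": 2}
_FINAL_RANK = 3  # a final release sorts after every pre-release of the same number


def sort_key(version: str) -> tuple:
    """Build a tuple that orders pre-releases before the matching final release."""
    match = _VERSION.match(version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    release = tuple(int(part) for part in match.group(1).split("."))
    stage, number = match.group(2), match.group(3)
    if stage is None:
        return (release, _FINAL_RANK, 0)
    return (release, _STAGE_RANK[stage], int(number or "0"))


def is_prerelease(version: str) -> bool:
    """True if the version carries an a/b/rc suffix."""
    return sort_key(version)[1] != _FINAL_RANK
```

With this key, sorting 1.13, 1.12, 1.13a1 and 1.13rc1 puts 1.13 last, so a packaging system that picked up 1.13a1 would still register 1.13 as a distinct, newer version.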

mg14 commented 1 year ago

Hi all - I took the liberty of opening a bioconda PR to trigger a rebuild of pangolin-data 1.13.

https://github.com/bioconda/bioconda-recipes/pull/36535

This appears to have fixed the current issue:

$ conda create -n test_pango_update -c conda-forge -c bioconda -c defaults pangolin=4.1.2 pangolin-data=1.13
$ conda activate test_pango_update
(test_pango_update)$ pangolin --all-versions
pangolin: 4.1.2
pangolin-data: 1.13
constellations: v0.1.10
scorpio: 0.3.17
usher 0.5.6
gofasta 1.1.0
minimap2 2.24-r1122
faToVcf: 426

Keep up the good stuff!

corneliusroemer commented 1 year ago

Feel free to escalate this to me next time - I know some of the bioconda folks a little bit, so I can triage this once I'm aware of the issue.

I'll close this issue as fixed now - thanks @KatSteinke for reporting!

corneliusroemer commented 1 year ago

@aineniamh why does the prerelease not have v1.13 if it says it's a prerelease of v1.13? That's a bit confusing.

What's the purpose of it if it doesn't actually contain the new version? If this is just meant to be an announcement that v1.13 will happen, there's maybe a way for it to not contain any code at all so that bioconda doesn't pull it?

mg14 commented 1 year ago

Perhaps Matt's suggestion of using a prerelease suffix could avoid this in the future? https://github.com/cov-lineages/pangolin-data/issues/22#issuecomment-1214759528

Even if the release candidates were automatically pulled into bioconda with an rc version tag (1.13rc), the proper release tag (1.13) would then trigger a new bioconda build and avoid it getting stuck with the prerelease as seems to have happened here.

AngieHinrichs commented 1 year ago

> What's the purpose of it if it doesn't actually contain the new version?

Most of the time it does -- but this time, by mistake, the prerelease was created before the v1.13 changes were merged into pangolin-data's master branch (the changes were still on a prerelease_v1.13 branch where Áine and I merged our respective changes to the pangoLEARN model and usher lineageTree.pb). The prerelease_v1.13 branch was merged into master before the pre-release checkbox was unchecked and the pre-release became the official latest release, but apparently that was too late for bioconda's speedy ingest of the prerelease.

Does bioconda use the GitHub API to learn about new releases? There is a flag in the JSON that marks pre-release status, which could be used to prevent an update. Or does bioconda have people watching their inboxes for (pre)release emails, or some other mechanism?
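
For reference, the flag in question is the boolean "prerelease" field on each release object returned by the GitHub REST API's list-releases endpoint. The sketch below shows how a consumer could skip flagged releases. Whether bioconda's update mechanism works this way is not known from this thread; the function names are hypothetical, and the code assumes the API returns releases newest first.

```python
import json
import urllib.request


def fetch_releases(owner: str, repo: str) -> list:
    """Fetch release objects from the GitHub REST API (unauthenticated request)."""
    url = f"https://api.github.com/repos/{owner}/{repo}/releases"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def latest_full_release(releases: list):
    """Return the tag of the first release not flagged as draft or pre-release.

    Assumes `releases` is ordered newest first; returns None if every entry
    is a draft or pre-release.
    """
    for release in releases:
        if release.get("prerelease") or release.get("draft"):
            continue
        return release["tag_name"]
    return None
```

Called as latest_full_release(fetch_releases("cov-lineages", "pangolin-data")), this would ignore a release for as long as its pre-release checkbox stays ticked.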