BioContainers / containers

Bioinformatics containers
http://biocontainers.pro
Apache License 2.0

Old AnnData container causing downstream version incompatibilities #557

Closed swbioinf closed 7 months ago

swbioinf commented 7 months ago

There is a biocontainer for an outdated AnnData

There are a number of old containers for AnnData (a Python package for a data format) listed in the biocontainers repository: https://quay.io/repository/biocontainers/anndata?tab=tags They are also in the singularity builds: https://depot.galaxyproject.org/singularity/

However, these are 5 years old, at AnnData version 0.6.22.

Unfortunately, AnnData objects saved with version 0.8.0 (Mar 2022) or later, including the current 0.10.5, cannot be read by older AnnData versions such as the 0.6.22 in these containers: https://anndata.readthedocs.io/en/latest/release-notes/index.html#version-0-8
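To make the failure case concrete, here is a minimal sketch of a version guard. It assumes the cutoff is 0.8.0, as described above, and it only encodes the one failing combination reported in this issue (old reader, new writer). The names `parse_version`, `can_read` and `FORMAT_CHANGE` are hypothetical helpers, not part of AnnData.

```python
import re

# Assumption taken from this issue: files written by AnnData 0.8.0 or later
# cannot be read by older releases.
FORMAT_CHANGE = (0, 8, 0)


def parse_version(text):
    """Turn '0.10.5' or '0.6.22.post1' into a tuple of its leading integers."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at suffixes such as 'post1'
        parts.append(int(match.group()))
    return tuple(parts)


def can_read(reader_version, writer_version):
    """Return False only when an old reader meets a file from a post-change writer."""
    reader_is_old = parse_version(reader_version) < FORMAT_CHANGE
    writer_is_new = parse_version(writer_version) >= FORMAT_CHANGE
    return not (reader_is_old and writer_is_new)


if __name__ == "__main__":
    # Container version vs. current release, as named in this issue.
    print(can_read("0.6.22", "0.10.5"))
```

In a real tool the reader version could be taken from `anndata.__version__`; the writer version would have to come from wherever the producing tool records it.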

Is there supposed to be an anndata biocontainer?

Are these containers supposed to be here? I'm unable to find anything about AnnData in the biocontainers repository at all?

The fact that the versions stop at 0.6 makes me think that maybe it was removed?

Implications

I have been debugging an issue with incompatible AnnData objects between two 'galaxy' tools that both use the AnnData package. One of them has a minimum version number, the other does not.

Requirement chain: 'ebi-gxa anndata_opts' tool (the galaxy tool) -> requires scanpy-scripts -> requires scanpy -> requires AnnData (unversioned)

It is my understanding that galaxy tool building preferentially grabs the biocontainers-generated singularity containers when building its tools. My best guess is that it finds the ancient version and uses that.

That means that when it sees an anndata object generated with an up-to-date anndata version, it fails to read it.

The problem is that in isolation, at every step of the chain, the old version works just fine. Understandably then, devs are reluctant to impose versioning (see tickets). And the scanpy toolkit is much more broadly used than the galaxy tool level where the issue emerges, so I don't like my chances there.

The issue is really only one of interoperability of different tools on different containers passing around data in their shared AnnData format - which is expected in a heavily containerized platform like galaxy!

Relevant tickets (we've been chasing this one up and down the requirements!)

Solutions?

1) Does AnnData need to be in biocontainers at all? If it really is some sort of vestigial build without any associated code, could it be removed entirely? I assume tools would then be free to find an up-to-date version from wherever else they look, as they do for the scanpy package.

2) If AnnData is here for a reason, and I've missed where AnnData is getting defined and built (I'm not a singularity person), please someone point me at it and I'll try to understand it, and maybe add a pull request to get it into the current decade :)

Thanks!

mboudet commented 7 months ago

Alright, I looked into it.

From what I can see, until 0.6.22, AnnData had a 'bioconda' conda package (https://anaconda.org/bioconda/anndata). All bioconda packages have a docker and a singularity image built automatically (on quay.io and https://depot.galaxyproject.org).

However, it seems that afterwards it moved to the conda-forge conda channel (https://anaconda.org/conda-forge/anndata) (not sure about the reasons for the move). So there has been no automatic docker build for the new releases, as I don't believe conda-forge does it.
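As a rough illustration of why the image versions stop where they do, the sketch below picks the newest version out of a list of image tags, assuming the usual `<version>--<build>` tag layout of the bioconda-derived images. The tag names are illustrative only and were not copied from quay.io; `version_key` and `newest_image_version` are hypothetical helpers.

```python
import re


def version_key(version):
    """Leading integer components, e.g. '0.6.22.post1' -> (0, 6, 22)."""
    key = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at suffixes such as 'post1'
        key.append(int(match.group()))
    return tuple(key)


def newest_image_version(tags):
    """Return the highest version among '<version>--<build>' tags, or None."""
    versions = [tag.split("--", 1)[0] for tag in tags if "--" in tag]
    if not versions:
        return None
    return max(versions, key=version_key)


# Illustrative tag names only; not copied from quay.io.
example_tags = ["0.6.18--py_0", "0.6.22.post1--py_0", "0.6.9--py_0"]
newest = newest_image_version(example_tags)
print(newest)
# True here means the newest image predates the 0.8 release discussed above.
print(version_key(newest) < (0, 8, 0))
```

With a real tag list in place of `example_tags`, the same check would show whether any image is new enough for files written by recent AnnData releases.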

(Bioconda packages are defined here https://github.com/bioconda/bioconda-recipes/tree/master)

swbioinf commented 7 months ago

Ah, thanks @mboudet. That would explain it. I'll keep looking for a workaround then!