conda-incubator / conda-store

Data science environments, for collaboration. ✨
https://conda.store
BSD 3-Clause "New" or "Revised" License

API modification: add ability to fetch uniquely named packages; return list of versions available for each #221

Closed peytondmurray closed 1 year ago

peytondmurray commented 2 years ago

Current Behavior

Right now, the /api/v1/package/ endpoint returns the packages that are available for the user to install. I can control the list of packages returned by using the appropriate query parameters. For example, if I send a GET to /api/v1/package/?page=1&size=10&distinct_on=name&distinct_on=version&sort_by=name, the response is a list of results with distinct name and version combinations:

{
    "count": 109441,
    "data": [
        {
            "build": "py38h9a4a7a8_1",
            "channel_id": 2,
            "id": 101824,
            "license": "MIT",
            "name": "21cmfast",
            "sha256": "383a4dde58ca57811108d44cde454e04d6ac861e77e5f200e8dad803f863d914",
            "summary": "A semi-numerical cosmological simulation code for the 21cm signal",
            "version": "3.0.2"
        },
        {
            "build": "py39h20ed36d_1",
            "channel_id": 2,
            "id": 101833,
            "license": "MIT",
            "name": "21cmfast",
            "sha256": "545806eb2664ead4becc0ab147d54c7d8dbc523cd0b3ce7d7481c99430506c2e",
            "summary": "A semi-numerical cosmological simulation code for the 21cm signal",
            "version": "3.0.3"
        },
        {
            "build": "py36h29bcdb0_0",
            "channel_id": 2,
            "id": 101835,
            "license": "MIT",
            "name": "21cmfast",
            "sha256": "33b624be3076a788e12853df26c19058a37b832dc69839a73378488c0a208788",
            "summary": "A semi-numerical cosmological simulation code for the 21cm signal",
            "version": "3.1.1"
        },
        {
            "build": "py36h29bcdb0_0",
            "channel_id": 2,
            "id": 101839,
            "license": "MIT",
            "name": "21cmfast",
            "sha256": "d8c9436f006009a7bd64b3fde5253a4818f33510a32a036159198dcce7880ddb",
            "summary": "A semi-numerical cosmological simulation code for the 21cm signal",
            "version": "3.1.2"
        },
        {
            "build": "py38ha5b31ff_0",
            "channel_id": 2,
            "id": 324514,
            "license": "MIT",
            "name": "21cmfast",
            "sha256": "4412925b69a7aa627374446eaa4e942c5626732e018e92a2574ea6a9e7d14234",
            "summary": "A semi-numerical cosmological simulation code for the 21cm signal",
            "version": "3.1.3"
        },
        {
            "build": "hbb7d975_1",
            "channel_id": 2,
            "id": 101847,
            "license": "Public Domain",
            "name": "2dfatmic",
            "sha256": "3b80e7812c33f825a20ccec3b8d16655de1b396a3b8dc71844a80c3b2823a9b7",
            "summary": "Two-Dimensional Subsurface Flow, Fate and Transport of Microbes and Chemicals Model",
            "version": "1.0"
        },
        {
            "build": "h618b193_0",
            "channel_id": 2,
            "id": 101850,
            "license": "GPLv2+",
            "name": "4ti2",
            "sha256": "d9f122bbb25d291391f1b4438e556ccee350e2487bde1fd3942d3577dcee8f42",
            "summary": "A software package for algebraic, geometric and combinatorial problems on linear spaces",
            "version": "1.6.9"
        },
        {
            "build": "pyh9f0ad1d_0",
            "channel_id": 2,
            "id": 27962,
            "license": "GPL-3.0-or-later",
            "name": "aadict",
            "sha256": "43e3e090dde8469e2514e1526ea446b9338a2204ffe94e51b51ac404d376447e",
            "summary": "An auto-attribute dict (and a couple of other useful dict functions)",
            "version": "0.2.3"
        },
        {
            "build": "pyhd8ed1ab_0",
            "channel_id": 2,
            "id": 27963,
            "license": "Apache-2.0",
            "name": "aalto-boss",
            "sha256": "86640eb12bba8927475c1356bf6a19211331437c21589da314253c8ffc662098",
            "summary": "Bayesian optimization structure search",
            "version": "1.1"
        },
        {
            "build": "pyhd8ed1ab_0",
            "channel_id": 2,
            "id": 27964,
            "license": "Apache-2.0",
            "name": "aalto-boss",
            "sha256": "cf886fdd2605e679c68a9209e565ca8457d3d17d71a93ce8956cda4d5827d5a9",
            "summary": "Bayesian optimization structure search",
            "version": "1.2"
        }
    ],
    "page": 1,
    "size": 10,
    "status": "ok"
}

For conda-store integration in Gator this is not optimal, because packages can have many versions. Each fetch request returns only a few distinct packages, and most of the results are just different versions of the same package. In the response above, the 10 results cover only 5 distinct packages.
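For reference, the request above can be built as follows. This is a minimal sketch using only the Python standard library; the helper name `package_query_url` and the base URL `http://localhost:5000` are hypothetical placeholders, not part of conda-store. The query parameters are the ones quoted in this issue, with `distinct_on` repeated once per field.

```python
from urllib.parse import urlencode


def package_query_url(base_url, page=1, size=10,
                      distinct_on=("name", "version"), sort_by="name"):
    """Build the GET URL for the package listing described in this issue.

    `base_url` is a placeholder for wherever the conda-store server runs.
    """
    # A list of pairs (rather than a dict) lets `distinct_on` appear twice.
    params = [("page", page), ("size", size)]
    params += [("distinct_on", field) for field in distinct_on]
    params.append(("sort_by", sort_by))
    return f"{base_url.rstrip('/')}/api/v1/package/?{urlencode(params)}"


# Hypothetical local server address, used only for illustration.
print(package_query_url("http://localhost:5000"))
# prints http://localhost:5000/api/v1/package/?page=1&size=10&distinct_on=name&distinct_on=version&sort_by=name
```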

Proposed behavior

Instead of treating distinctly versioned packages as separate, treat them as a single package and return a list of installable versions for each uniquely named package:

{
    "count": 109441,
    "data": [
        {
            "channel_id": 2,
            "license": "MIT",
            "name": "21cmfast",
            "summary": "A semi-numerical cosmological simulation code for the 21cm signal",
            "versions": ["3.0.2", "3.0.3", "3.1.1", "3.1.2", "3.1.3"]
        },
        {
            "channel_id": 2,
            "license": "Public Domain",
            "name": "2dfatmic",
            "summary": "Two-Dimensional Subsurface Flow, Fate and Transport of Microbes and Chemicals Model",
            "versions": ["1.0"]
        },
        {
            "channel_id": 2,
            "license": "GPLv2+",
            "name": "4ti2",
            "summary": "A software package for algebraic, geometric and combinatorial problems on linear spaces",
            "versions": ["1.6.9"]
        },
        {
            "channel_id": 2,
            "license": "GPL-3.0-or-later",
            "name": "aadict",
            "summary": "An auto-attribute dict (and a couple of other useful dict functions)",
            "versions": ["0.2.3"]
        },
        {
            "channel_id": 2,
            "license": "Apache-2.0",
            "name": "aalto-boss",
            "summary": "Bayesian optimization structure search",
            "versions": ["1.1", "1.2"]
        }
    ],
    "page": 1,
    "size": 10,
    "status": "ok"
}

I haven't yet worked out the implications for cases where different versions of a package are provided on different channels, or handling different builds, but I just wanted to get the discussion going. I think if the user wants more information about a specific package (individual version information, for example), they can use the search query parameter on the /api/v1/package/ endpoint. This change would greatly improve the user experience for browsing packages with the Gator JupyterLab extension.
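To make the proposed shape concrete, here is a rough client-side sketch of the same grouping applied to the flat records the endpoint returns today. It is not how conda-store would implement this on the server, and the function name `group_versions` is hypothetical. It assumes records are grouped by `(channel_id, name)`, which sidesteps rather than answers the cross-channel question, and it ignores builds.

```python
def group_versions(records):
    """Collapse flat package records into one entry per (channel_id, name).

    Versions are kept in the order they first appear; no version-aware
    sorting is attempted.
    """
    grouped = {}
    for rec in records:
        key = (rec["channel_id"], rec["name"])
        if key not in grouped:
            # License and summary are taken from the first record seen.
            grouped[key] = {
                "channel_id": rec["channel_id"],
                "license": rec["license"],
                "name": rec["name"],
                "summary": rec["summary"],
                "versions": [],
            }
        versions = grouped[key]["versions"]
        if rec["version"] not in versions:
            versions.append(rec["version"])
    return list(grouped.values())


# Trimmed-down records taken from the example response in this issue.
sample = [
    {"channel_id": 2, "license": "Apache-2.0", "name": "aalto-boss",
     "summary": "Bayesian optimization structure search", "version": "1.1"},
    {"channel_id": 2, "license": "Apache-2.0", "name": "aalto-boss",
     "summary": "Bayesian optimization structure search", "version": "1.2"},
]
print(group_versions(sample))
```

Grouping on the client still requires fetching every version row, so it does not fix the paging problem described above; it only illustrates the target structure.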

costrouc commented 2 years ago

Thanks for this issue, @peytondmurray. Totally agree that this is an important thing for the conda-store API to return. I like your proposed behavior for how the API should return results, and I'm thinking about how this can be done efficiently within a database.

peytondmurray commented 2 years ago

I don't want to forget about this, and I'm interested in trying to implement this. I'll give it a shot this weekend and report back here.

costrouc commented 2 years ago

I'm going to give some pushback on this feature. I've been working on implementing it. Suppose the endpoint returns:

{
   license: "...",
   name: "...",
   channel_id: "...",
   summary: "...",
   versions: [
         ["package_id", "version_str"],
         "..."
   ]
}

A few of my concerns:

I agree that the API is currently not fast enough, which makes this take a while. I need to speed up these queries.

costrouc commented 2 years ago

Got more color in our meeting today. We will limit the scope to individual packages.

E.g. /api/v1/package/<channel-identifier>/<package-name>

kcpevey commented 1 year ago

@costrouc is this still valid?

pierrotsmnrd commented 1 year ago

The API has changed since this issue was opened, and its performance has greatly improved since then.

Do we still want this feature of listing the versions of a given package?

trallard commented 1 year ago

Perhaps not. We can close this and, if needed, reopen it or open a better-scoped issue.

costrouc commented 1 year ago

Yeah, I agree on closing this issue. Now that @pierrotsmnrd has significantly improved the performance of the package API, I don't think there is as much of a use case, since you can just query the API normally.