dandi / dandi-archive

DANDI API server and Web app
https://dandiarchive.org

Repeatedly paginating over 000876 assets returns different results #1915

Closed jwodder closed 3 months ago

jwodder commented 3 months ago

As of 2024-04-01 09:13 -05:00, running the following code:

import requests

DANDISET = "000876"

with requests.Session() as s:
    url = (
        f"https://api.dandiarchive.org/api/dandisets/{DANDISET}/versions/draft/assets/"
    )
    while url is not None:
        data = s.get(url).json()
        for asset in data["results"]:
            print(asset["path"])
        url = data["next"]

produced https://gist.github.com/jwodder/e68a10a9a93ed4e44ace26df671cfc6e as output. Running the script again produced https://gist.github.com/jwodder/bc8cdb2bac2288305472f7ba339c05cf as output, which is markedly different: 1861 paths appear in the first output but not the second, and 1567 paths appear in the second but not the first.
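For anyone wanting to reproduce the comparison, here is a minimal sketch of how two saved listings could be diffed. The helper name `compare_listings` and the sample paths are illustrative assumptions, not something taken from the gists; the function also reports paths repeated within a single run, which is another possible symptom of inconsistent pagination.

```python
from collections import Counter


def compare_listings(first, second):
    """Compare two lists of asset paths collected from separate pagination runs.

    Hypothetical helper: reports paths unique to each run and paths that
    appear more than once within a single run.
    """
    first_set, second_set = set(first), set(second)
    return {
        "only_in_first": sorted(first_set - second_set),
        "only_in_second": sorted(second_set - first_set),
        "duplicates_in_first": sorted(
            path for path, count in Counter(first).items() if count > 1
        ),
        "duplicates_in_second": sorted(
            path for path, count in Counter(second).items() if count > 1
        ),
    }


# Made-up sample data standing in for the two script outputs.
report = compare_listings(["a.nwb", "b.nwb", "b.nwb"], ["b.nwb", "c.nwb"])
for name, paths in report.items():
    print(f"{name}: {len(paths)}")
```

In practice, each list would be read from a file holding one run's output, one path per line.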

yarikoptic commented 3 months ago

While resolving this issue, I would like to see a unit test added that populates a dandiset with a good number of assets (100?) and then verifies that paginating through them lists every asset correctly (the page size could be reduced, e.g. to 1).
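A rough sketch of the shape such a test could take, written against an in-memory stand-in rather than the real API server. Everything here is an assumption for illustration: `make_fake_backend`, `list_all`, and `check_pagination` are hypothetical names, and the fake backend simply slices a sorted list. The actual test would create assets in a test dandiset and go through the project's own endpoint and fixtures.

```python
def make_fake_backend(paths):
    """Hypothetical stand-in for the assets endpoint, serving a fixed sorted order."""
    ordered = sorted(paths)

    def fetch_page(page, page_size):
        # Pages are 1-indexed; a page past the end comes back empty.
        start = (page - 1) * page_size
        return ordered[start:start + page_size]

    return fetch_page


def list_all(fetch_page, page_size):
    """Collect every item by requesting consecutive pages until one is empty."""
    collected = []
    page = 1
    while True:
        results = fetch_page(page, page_size)
        if not results:
            return collected
        collected.extend(results)
        page += 1


def check_pagination(paths, page_size):
    """Walk the listing twice and check that it is complete, unique, and repeatable."""
    fetch_page = make_fake_backend(paths)
    first = list_all(fetch_page, page_size)
    second = list_all(fetch_page, page_size)
    assert first == second, "two passes returned different listings"
    assert len(first) == len(set(first)), "an asset was listed more than once"
    assert set(first) == set(paths), "an asset was missing from the listing"
    return first


# 100 assets with a page size of 1, as suggested in the comment above.
paths = [f"sub-{i:03d}/file.nwb" for i in range(100)]
listed = check_pagination(paths, page_size=1)
```

A page size of 1 maximizes the number of page boundaries, so any asset skipped or repeated between pages shows up as a failed assertion.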

jjnesbitt commented 3 months ago

Fixed via #1910