MIT-LCP / physionet-build

The new PhysioNet platform.
https://physionet.org/
BSD 3-Clause "New" or "Revised" License

API to determine latest version of a published project #1381

Open bemoody opened 3 years ago

bemoody commented 3 years ago

The wfdb-python package wants to be able to read data from a published project, specifying only the slug and requesting the "latest version".

This is... kinda messy, since the latest version might not be what you expect. But it is a convenience for programmers.

The way this is currently implemented in wfdb-python, though, is very fragile:

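import requests

# url is the project URL such as 'https://physionet.org/content/mitdb/'
response = requests.get(url)
# Scrape the rendered HTML page line by line, looking for the text 'Version:'
contents = [line.decode('utf-8').strip() for line in response.content.splitlines()]
version_number = [v for v in contents if 'Version:' in v]
# Keep what follows the last ':' and drop anything from the first '<' onward
version_number = version_number[0].split(':')[-1].strip().split('<')[0]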

Various alternatives come to mind even without making any changes to the server:

But whatever we do needs to be robust, efficient, and supported by the server in the long term (which is why I'm raising this issue here and not in wfdb-python).

Ideally the programmer should be required to supply a major version number along with the project slug so that incompatible future versions won't be used automatically. (The version number might default to 1; note that all currently published waveform DBs are at version 1 except for nch-sleep/0.1.0.)
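
As a rough illustration of the major-version constraint described above, the sketch below picks the newest release within one major series from a list of version strings. The names `parse_version` and `latest_compatible` are hypothetical, and the code assumes versions are purely numeric dotted strings such as 1.0.0; it is not how wfdb-python or the server currently resolves versions.

```python
def parse_version(version):
    """Split a dotted version string such as '1.0.0' into a tuple of ints.

    Assumption: every component is numeric (no suffixes like '-beta').
    """
    return tuple(int(part) for part in version.split('.'))


def latest_compatible(versions, major=1):
    """Return the highest version whose major component equals `major`.

    Returns None when no version in that major series exists.
    """
    candidates = [v for v in versions if parse_version(v)[0] == major]
    if not candidates:
        return None
    # Compare as integer tuples so that 1.10.0 sorts after 1.2.0.
    return max(candidates, key=parse_version)
```

With the default `major=1`, a project that only has a 0.x release (as with nch-sleep/0.1.0) yields `None`, so the caller would have to ask for major version 0 explicitly.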

So one idea would be to add a URL such as /rest/content/mitdb/1/ that would return data similar to /rest/database-list/, but only for the latest version of mitdb 1.x.
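
To make the REST idea concrete, here is a minimal client-side sketch. The endpoint does not exist yet, and the response shape (a JSON object with a `version` key) is purely an assumption for illustration, as are the helper names `latest_version_url` and `version_from_payload`.

```python
import json

BASE_URL = 'https://physionet.org'


def latest_version_url(slug, major=1):
    """Build the URL of the proposed (hypothetical) endpoint."""
    return f'{BASE_URL}/rest/content/{slug}/{major}/'


def version_from_payload(payload_text):
    """Read the version from a JSON response body.

    Assumption: the body is a JSON object containing a 'version' key.
    """
    data = json.loads(payload_text)
    return data['version']
```

Compared with scraping the HTML page for 'Version:', the client would only depend on a documented key rather than on page layout.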

Another idea I've tossed around at times would be to add a URL such as /content/mitdb/1.*/ that would redirect to /content/mitdb/1.0.0/. Parsing the Location header would be simpler for the client than parsing a JSON string, but it would reduce long-term flexibility for the server.
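
For the redirect variant, the client-side parsing could look roughly like the sketch below. The helper name `version_from_location` is hypothetical, and it assumes the redirect target keeps the /content/<slug>/<version>/ layout shown above.

```python
from urllib.parse import urlsplit


def version_from_location(location, slug):
    """Extract the version from a redirect target such as
    '/content/mitdb/1.0.0/' (absolute or relative URL).

    Raises ValueError if the path does not match the assumed layout.
    """
    # Drop empty segments produced by leading and trailing slashes.
    parts = [p for p in urlsplit(location).path.split('/') if p]
    if len(parts) >= 3 and parts[0] == 'content' and parts[1] == slug:
        return parts[2]
    raise ValueError(f'unexpected redirect target: {location!r}')
```

With the requests library, the header itself could be obtained with something like `requests.head(url, allow_redirects=False).headers.get('Location')` and then passed to this function.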

alistairewj commented 3 years ago

Looks like this relates to #821. Briefly, a user thought there were some browser usability advantages to having "latest" or "current" as a slug that redirects to the latest version. You raised some good points about exactly resolving versions in that issue.

(Side note: I actually think #821 is already dealt with to some extent...? In that if I go to https://physionet.org/content/mimiciv/ I am automatically redirected to the latest version.)

Feels like wfdb-python could maintain a local copy of the data in /rest/database-list, and only refresh it if it doesn't find the exact version in its local copy? That might save pinging the server every time to confirm that v1.0.0 exists. It wouldn't save you anything if they ask for the latest copy of v1.*, though. I'm also not sure whether that list contains all historical versions; is there documentation of the database-list API somewhere?
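
A minimal sketch of that caching idea, assuming the list can be reduced to (slug, version) pairs. The class name `VersionCache` and the injected `fetch` callable are hypothetical; how `fetch` would actually talk to /rest/database-list, and what fields that response contains, is left open here.

```python
class VersionCache:
    """Remember known (slug, version) pairs; refresh only on a miss."""

    def __init__(self, fetch):
        # `fetch` is any callable returning an iterable of
        # (slug, version) pairs, e.g. a wrapper around an HTTP request.
        self._fetch = fetch
        self._known = set()

    def has_version(self, slug, version):
        """Return True if the exact version is known to exist."""
        if (slug, version) in self._known:
            return True
        # Cache miss: refresh once from the server and look again.
        self._known = set(self._fetch())
        return (slug, version) in self._known
```

A request for an exact version that is already cached never touches the server; a request for a version that does not exist triggers one refresh per call, so a real implementation might also want to rate-limit refreshes.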