dandi / dandi-archive

DANDI API server and Web app
https://dandiarchive.org
14 stars 10 forks source link

Must not silently ignore incorrect version of a dandiset in URL #1020

Closed yarikoptic closed 1 year ago

yarikoptic commented 2 years ago

Was not sure if I should add to the WiP design doc PR https://github.com/dandi/dandi-archive/pull/1017 (thus attn @AlmightyYakob) but just noted that behavior, which is also present with current main instance still using # in URLs.

If version in the URL is not correct, app just ignores it so that https://gui.dandiarchive.org/#/dandiset/000167/0.220329.0721 silently leads to https://gui.dandiarchive.org/#/dandiset/000167/0.220329.0720 version of the dataset.

In comparison, the redirector is rightfully not that forgiving: going to https://dandiarchive.org/dandiset/000167/0.220329.0721 leads to dandi:000167/0.220329.0721 not found.

$> wget https://dandiarchive.org/dandiset/000167/0.220329.0721
--2022-04-08 18:13:45--  https://dandiarchive.org/dandiset/000167/0.220329.0721
Resolving dandiarchive.org (dandiarchive.org)... 3.12.21.28
Connecting to dandiarchive.org (dandiarchive.org)|3.12.21.28|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2022-04-08 18:13:45 ERROR 404: Not Found.

and web app should behave similarly.

https://dandiarchive.org/dandiset/000167/0.220329.0720

jjnesbitt commented 2 years ago

@yarikoptic thanks for filing this, I think you were right to keep this separate from that design doc, IMO they're two separate issues. Here's one of the lines responsible for the behavior:

https://github.com/dandi/dandi-archive/blob/b47f730878d6531eeb287564964f8e6fabe9911e/web/src/views/DandisetLandingView/DandisetLandingView.vue#L131-L132

Since the version that's included isn't valid, it just defaults to the most recent version. I agree this is potentially confusing, and some sort of not found page should be implemented instead. With the pending change of hosting the GUI at dandiarchive.org directly, I think the main tasks here would be the following:

Does that sound like reasonable behavior?

satra commented 2 years ago

the redirector provides the ability to wget/curl this invalid state. the app will not be able to do so i think. we will need to come up with a way where a browser isn't required to get such a response. to help with search optimization netlify does have an option for server side rendering, which could help with this.

jjnesbitt commented 2 years ago

the redirector provides the ability to wget/curl this invalid state. the app will not be able to do so i think. we will need to come up with a way where a browser isn't required to get such a response. to help with search optimization netlify does have an option for server side rendering, which could help with this.

Why wouldn't the API simply be used for this?

satra commented 2 years ago

Why wouldn't the API simply be used for this?

how would the curl query get to the api from the url that starts with https://<instance>/dandiset/...?

jjnesbitt commented 2 years ago

Why wouldn't the API simply be used for this?

how would the curl query get to the api from the url that starts with https://<instance>/dandiset/...?

My thought would be that this is a two step processes, if the API server of an instance truly can't be known beforehand. The first request would be to https://<instance>/server-info, to discover the api server, and then a following request to https://api.<instance>/dandiset/... (or whatever the api server for that instance is).

While server side rendering could provide us with statically generated 404 pages, I think that's a separate discussion, as there's other compelling reasons to move to SSR (SEO being likely the main one). I'm not sure what Netlify supports for SSR, I couldn't find much on it, but it wouldn't be a drop in replacement. At the very least it'd require a huge refactor of the GUI, and more likely a total shift in how we deploy the website, so it's probably best left to its own discussion.

IMO if trying to access dandi in a programmatic way, the API should be used, and we shouldn't attempt to overload functionality onto the GUI. Although I don't fully know what the motivation for wanting to access the GUI instead of the API server directly is, so I'd be happy to have a discussion to understand that more.

satra commented 2 years ago

for any general user there are only a few ways they interact with the archive:

most people would open a browser and things would work (for identifier they would need a resolver: identifiers.org). now some users know curl and they would want to get the same information through curl (still interactive), and this would neither tell them about server-info or give them appropriate redirection/exception. the redirector currently handles this use case.

the server-info endpoint is something that only programmers who wish to program against the api would know.

given the curl use space is small, i would be ok with not worrying about it at the moment. but it would be good to improve the html response. it's just that as @yarikoptic noted, the behavior would be different.

re: netlify and ssr, it's just a checkbox on their config that can do this. we have discussed in the past to turn it on and try, but never actually clicked on the checkbox.

jjnesbitt commented 2 years ago

given the curl use space is small, i would be ok with not worrying about it at the moment. but it would be good to improve the html response. it's just that as @yarikoptic noted, the behavior would be different.

This would be my preference, as we really don't have a good way to either include this behavior in the Netlify redirector, or improve the statically returned HTML (see below) at the moment. The Netlify redirection is done through definitions in a toml file, and so doesn't have the ability to use extra logic or make rest requests as the current redirector does.

re: netlify and ssr, it's just a checkbox on their config that can do this. we have discussed in the past to turn it on and try, but never actually clicked on the checkbox.

Are you referring to sitemap generation? I don't think it's possible for Netlify to automatically provide SSR from a Vue app, as the point of SSR is to have your own logic that is implemented on the server for partial rendering of html before it's returned to the browser.

yarikoptic commented 2 years ago

yet to assess what would be ramifications, but as I have mentioned during our monday standup, it is not just curl use space -- it is rather the space of anything non-human which wants or needs to talk to an archive. Conforming to established standard exit error codes is one of the main vehicles. A fresh finding is that dandi-cli test would not fail where we expected a 404 for a non-existing dandiset -- https://github.com/dandi/dandi-cli/issues/967 .

Unfortunately the solution is not that simple actually to cater to both tools and humans/browsers. Even behavior of github is not that clear and depends on (I guess) the client Agent specified since even behavior of curl and wget differ:

$> wget https://github.com/mumba/dumbra
--2022-04-20 16:23:03--  https://github.com/mumba/dumbra
Resolving github.com (github.com)... 140.82.112.3
Connecting to github.com (github.com)|140.82.112.3|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2022-04-20 16:23:04 ERROR 404: Not Found.

lena:/tmp
$> curl https://github.com/mumba/dumbra | head
...
<!DOCTYPE html>
<html lang="en" data-color-mode="auto" data-light-theme="light" data-dark-theme="dark" >
  <head>
    <meta charset="utf-8">
100  8211    0  8211    0     0  51992      0 --:--:-- --:--:-- --:--:-- 52299
curl: (23) Failure writing output to destination

returning plain 404 sounds like the best standard way for now. But could it be achieved?

jjnesbitt commented 2 years ago

returning plain 404 sounds like the best standard way for now. But could it be achieved?

Until we thoroughly discuss server side rendering, I don't think we can really act on this. In order for us to do this we'd need to make an ajax request to the API to confirm that the requested dataset exists, and "return" the correct page, or return a 404 if the ajax request returns a 404. Afaik there's no way for us to do this with Netlify redirects, they can't do things conditionally like that.

The discussion of SSR is probably one we should have, but I think we should punt on this until that's taken place. I also don't think this in and of itself is a reason to move to SSR, but rather a beneficial side effect of it.

waxlamp commented 2 years ago

yet to assess what would be ramifications, but as I have mentioned during our monday standup, it is not just curl use space -- it is rather the space of anything non-human which wants or needs to talk to an archive.

But this is precisely what APIs are for.

The DANDI archive was perhaps better suited for a pure SSR solution from the start, so let's imagine we magically converted it to be SSR. curl usage would instantly improve, since it would get the same 404 response for an invalid dandiset ID as a human viewer of the website would. But I think that's about all the curl user would gain--in case of a 200 response, they would still need to parse out the DLP HTML to recover data about the dandiset.

So I guess I would say, folks wanting to automate interaction with the DANDI archive would need to learn how the API works and use that directly--and then they get to have JSON-formatted data, as well as proper 404 responses, etc.

To summarize this a different way: the web portal is truly meant for humans, not machines, and IMO we should lean into that idea.

In the case of SSR, I suppose we could send a different response depending on the user agent string, but even in that case I might simply argue for machine clients to make use of the API, since that's really what it's there for.

returning plain 404 sounds like the best standard way for now. But could it be achieved?

Echoing Jake above, with our current architecture, I don't see how (but I'm open to ideas). My proposal for unasking this question is still for curl-users to hit the API instead of the frontend.

yarikoptic commented 2 years ago

So ok, issue should indeed be solved "one step at a time" and let's just get back to the original issue that there is no indication to a human user whatsoever that entered URL is not leading to some valid dandiset version. E.g. go to https://dandiarchive.org/dandiset/000167/0.220329.completenonsense

jjnesbitt commented 2 years ago

So ok, issue should indeed be solved "one step at a time" and let's just get back to the original issue that there is no indication to a human user whatsoever that entered URL is not leading to some valid dandiset version. E.g. go to dandiarchive.org/dandiset/000167/0.220329.completenonsense

Yes, this is something we should definitely address.

dandibot commented 1 year ago

:rocket: Issue was released in v0.3.17 :rocket: