Closed csjx closed 4 years ago
This is something we've wanted to do for a while, but it will break all current view/ URLs that don't have the identifier encoded. Also, all of the DOI registrations will need to be updated, yeah?
Possibly, but I wonder if maybe it wouldn't be so bad, @laurenwalker. Could you check my thinking to see if it makes sense:
De-encoding strings via decodeURIComponent
has a helpful property in that it stops producing a different string once the string is decoded,
"doi:10.123/ABC" === decodeURIComponent("doi:10.123/ABC") // true
So I think we can just blindly decode whatever comes in for some routes, like /view
. As an side, a string can be encoded multiple times and you can just keep decoding it until the result stops changing:
"doi:10.123/ABC" === decodeURIComponent(decodeURIComponent(encodeURIComponent(encodeURIComponent("doi:10.123/ABC")))) // true
So I think we can use the above to effectively detect if we have a legacy URL or not. For /view/{ID}
this is even simpler since we can just take what's after /view/
, decode it, and work with it. Of course, we then have to be careful if we wanna add a another path part to /view
like /view/{ID}/{eml-id}
.
Ah, thanks Bryce, I didn't think of it that way. Right on!
I've made an initial pass and like where this is going. In order to simplify things and avoid bugs, my approach is to separate identifier handling into two sides:
this.get("id")
, it's the literal identifier and not an encoded version.Due to properties of URL encoding/decoding we do induce any breakage in URLs to our content because existing URLs (bookmarks, DOI registrations) using non-encoded PIDs still work. This is because unencodedPID === decodeURLComponent(unencodedPID)
. We probably should update our registrations or at least change our DOI registration code going forward.
So far I've tested the search view, metadata view, user view, and editor. I found about three of our patterns we use in our codebase that I should be able to search for and refactor.
Making good but slow progress. I've found about seven or so patterns for how we reference identifiers when building URLs and I'm still finding more here and there as I go. I think this is still quite feasible and testing so far is going very well.
@laurenwalker, @csjx what do you think about maybe asking the data team if we can deploy this on test.arcticdata.io for maybe a week or two before we merge to release just to give us more confidence this isn't breaking things?
We could deploy to my VM fancy-vulture, pointing to test.arcticdata, since the data team doesn't use that
Well, my thinking was that the data team would help us find remaining bugs.
Oh, I see what you're saying. Yeah let's run it by Jeanette
I tested this a bit today and I didn't run into any major issues! I found one bbug and have two thoughts:
In provenance charts: when I click the "View" button in a provenance node popover, the window jumps down to the inner-page anchor for that object (expected), but then it redirects to a /view
route for that pid. It should only jump to the anchor.
Example: Dataset https://test.arcticdata.io/view/urn%3Auuid%3A66ad360a-30c6-4daa-984b-61910f245fde
Click on the derivation node for Data4.csv
and click the "View" button for Datat5.csv
Should we encode ORCIDs and DNs in /profile routes? I would think we should.
I think it would be a good idea to URL-encode the identifier in the browser location when you go to a /view URL with an unencoded pid. That way, it looks like a redirect. And if the user copies the URL from their browser, they will have the new encoded URL.
Hey @laurenwalker , thanks for testing. That bug may have been introduced when I fixed an unrelated bug in the prov chart view in 1d88b3703c5058af6192f596050936a974b74ecf. It must have worked fine before because the link was broken and fixing it broke the behavior. Looks like the click handler needs a preventdefault or we need to convert this to using HTML anchors properly. I'll look into that.
Re: Thoughts, I stuck with just encoding object identifiers to keep on task here and made notes as I went of places in the application where I wanted to discuss other encoding changes. I totally think we should follow this encoding/decoding pattern application wide. Places that popped out were:
"MetacatUI.appModel.get("accountsUrl")
or similarsuiteId
and also follow an old URL structure.
MetacatUI.uiRouter.navigate("quality/s=" + suiteId + "/" + encodeURIComponent(this.pid), {trigger: false});
label
and portalSection
. Is there a chance either or both could need encoding?
MetacatUI.appModel.get("portalTermPlural") + "/" + label + "/" + portalSection);
What do you think about adding that into this work? I think it'd be feasible.
FYI: I just pushed up a bug fix for the ProvChartView bug I discovered/introduced: 4450b0331d247bf34f02d48c05b617862a291367. Now I get nice scrolling to other entities.
Hey Bryce, I was reviewing this branch today and it looks good, and since we've tested on test.arcticdata.io for a while, I think it's ready to merge into the dev branch, so I'll do that today.
For these items:
References to the accounts service. Should these be properly encoded? e.g., anywhere we call "MetacatUI.appModel.get("accountsUrl") or similar e.g., https://github.com/NCEAS/metacatui/blob/dev-2.11-encoding/src/js/views/GroupListView.js#L65:L65 MDQ portions of the codebase doesn't encoding/decode things like suiteId and also follow an old URL structure. e.g., MetacatUI.uiRouter.navigate("quality/s=" + suiteId + "/" + encodeURIComponent(this.pid), {trigger: false}); Portal code doesn't encode things like label and portalSection. Is there a chance either or both could need encoding? e.g., MetacatUI.appModel.get("portalTermPlural") + "/" + label + "/" + portalSection);
Yes, I think we should do all of those. But they don't need to be completed at the same time as this work, so I will create a new ticket for each and we can work them in after we finish up the EML 2.2.0 editing.
I just wanted to drop a note in here in case we come back to encoding/decoding issues again. Above I commented that it seemed like you can detect whether part of a URL is encoded or not by comparing its once-decoded and twice-decoded values. After a good discussion today and a more careful re-read of IETF RFC 3986, I saw this:
Implementations must not percent-encode or decode the same string more than once, as decoding an already decoded string might lead to misinterpreting a percent data octet as the beginning of a percent-encoding, or vice versa in the case of percent-encoding an already percent-encoded string. -- https://tools.ietf.org/html/rfc3986#section-2.1
This is relevant to us because our identifiers can include what appears to a percent-encoded character. Earlier on, the RFC talks about a URI being "produced", at which point percent-encoding of reserved characters is assumed. And that clients handling URIs should always be decoding. This is all to say that I now think better of my idea.
Jeanette pointed out an issue rendering a particular package that has curly braces in the identifier:
https://search.dataone.org/#view/{6A218D8B-05CD-49EA-AC13-FD0372B3B1D4}
When the pid is properly encoded, the following does not render:
https://search.dataone.org/#view/%7B6A218D8B-05CD-49EA-AC13-FD0372B3B1D4%7D
Dave has pointed out that we should be handling identifiers uniformly across all IRIs, and that the internal routing within MetacatUI should do so as well. When interacting with the DataONE API, we do in fact call
encodeURIComponent()
, but when a route like#view
receives an identifier, we don't calldecodeURIComponent()
, but rather we assume the string is the exact identifier.Change the routing within MetacatUI to always assume identifiers are encoded in URL contexts by first calling
decodeURIComponent()
for all routes being passed identifiers before processing the call. When we move to using HTML5 pushState, assume identifiers are encoded as well.