laurenwalker closed this 3 years ago
I've counted many variants in ways to represent DOIs. They include with and without the doi: prefix, using http and https URIs, including or not including the dx subdomain, all caps versus lower case versus mixed case, with and without query strings and fragment identifiers, and all of the permutations of these. Here's an incomplete list of some of those permutations for the same DOI:
Two helpful things are shared among these representations:
10\.\/?\d+
We talked about this during our Arctic Data Center team meeting today. Could we bump this up in priority to the next patch release, @rushirajnenuji? I think it should be pretty straightforward to add a regex check for the DOIs and reformat or reject invalid DOIs.
I've worked on a regex that covers all the DOI syntax that Matt has pointed out above. If the DOI format is invalid, we display an error message, highlight the text box, and disable the register button. @laurenwalker could you please review the warning/error message and its placement in the modal? Please let me know your thoughts. Thank you!
@rushirajnenuji could you please post the regex you came up with here for us to test with regex tools?
Hey Matt - here is the regex:
^(http:\/\/|https:\/\/)?(doi.org\/|dx.doi.org\/)?(doi:|DOI:)?(10[.][0-9]{4,}(?:[.][0-9]+)*(?:(?!["&\'<>])\S)+)$
This covers all of the above cases but two (those with a whitespace character after doi:):
But I'm stripping all whitespace before testing the identifier string.
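For anyone who wants to try the pattern outside the app, here is a minimal sketch in Python (not the MetacatUI code itself). The pattern is copied verbatim from the comment above; the helper name `is_valid_doi` and the whitespace-stripping step are assumptions that only mirror the "strip all whitespace, then test" approach described here.

```python
import re

# Pattern copied verbatim from the comment above. Note that the dots in
# "doi.org" and "dx.doi.org" are unescaped, so they match any character.
DOI_PATTERN = re.compile(
    r"""^(http:\/\/|https:\/\/)?(doi.org\/|dx.doi.org\/)?(doi:|DOI:)?"""
    r"""(10[.][0-9]{4,}(?:[.][0-9]+)*(?:(?!["&\'<>])\S)+)$"""
)


def is_valid_doi(identifier: str) -> bool:
    """Hypothetical helper: remove all whitespace, then test the pattern."""
    stripped = re.sub(r"\s+", "", identifier)
    return DOI_PATTERN.match(stripped) is not None


if __name__ == "__main__":
    samples = [
        "10.5063/F17P8WNT",
        "doi: 10.5063/F17P8WNT",  # the space is removed before matching
        "https://dx.doi.org/10.5063/F17P8WNT",
        "not a doi",
    ]
    for sample in samples:
        print(repr(sample), is_valid_doi(sample))
```

One thing this makes visible: because the pattern excludes the `&` character, an identifier with a multi-parameter query string such as `?ver=1&id=3` is rejected rather than trimmed.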
Thanks. Here's a variant on yours:
^\s*(http:\/\/|https:\/\/)?(doi.org\/|dx.doi.org\/)?(doi: ?|DOI: ?)?(10\.\d{4,}(\.\d)*)\/(\w+).*$
Changes include: 1) covers the whitespace after doi:, 2) allows leading and trailing whitespace, and 3) sets up capture groups to explicitly capture the authority (e.g., 10.5063) and the localname (e.g., F17P8WNT) while excluding URI fragment identifiers and query strings (e.g., ?ver=1&id=3). So calling applications should be able to get the DOI number and localname out easily for processing with back references.
I tested it in this playground which shows a bunch of good matches and strings that shouldn't match and don't: https://regex101.com/r/6kooII/3
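To illustrate how a calling application might use those capture groups, here is a rough Python sketch. The pattern is copied verbatim from the variant above; the function names `parse_doi` and `normalize_doi`, and the choice of `doi:authority/localname` as the normalized form, are assumptions for illustration and not something the thread settles on.

```python
import re
from typing import Optional, Tuple

# Variant pattern copied verbatim from the comment above.
# Group 4 is the authority (e.g. 10.5063), group 6 is the localname.
DOI_VARIANT = re.compile(
    r"^\s*(http:\/\/|https:\/\/)?(doi.org\/|dx.doi.org\/)?(doi: ?|DOI: ?)?"
    r"(10\.\d{4,}(\.\d)*)\/(\w+).*$"
)


def parse_doi(text: str) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: return (authority, localname), or None if no match."""
    match = DOI_VARIANT.match(text)
    if match is None:
        return None
    return match.group(4), match.group(6)


def normalize_doi(text: str) -> Optional[str]:
    """Hypothetical helper: rewrite any accepted variant as doi:authority/localname."""
    parts = parse_doi(text)
    if parts is None:
        return None  # the caller can reject the input here
    authority, localname = parts
    return f"doi:{authority}/{localname}"


if __name__ == "__main__":
    print(normalize_doi("  https://dx.doi.org/10.5063/F17P8WNT?ver=1&id=3  "))
    print(normalize_doi("doi: 10.5063/F17P8WNT"))
    print(normalize_doi("not a doi"))
```

One caveat worth checking against real identifiers: the localname group is `\w+`, so it stops at the first character that is not a letter, digit, or underscore. A local name containing a hyphen or dot would be captured only up to that character.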
Thanks, Matt. I've updated the regex in this commit.
@rushirajnenuji - I've reviewed your changes for this issue and have fixed a few issues already, but there are a couple more things I'd like you to address:
Once these issues are resolved, please merge this feature branch to develop. Thanks!
Thanks for the feedback @laurenwalker. I've fixed the CSS styling and added documentation for the methods defined for this ticket. I'll also file a ticket to go through all the code and make sure it is up to our coding style standards. This issue is now complete.
Thanks Rushiraj. The only issue I see with the docs is your @screenshot tag for MetricView: https://github.com/NCEAS/metacatui/blob/183b807e69aee2f139ed95ab21f10486c8efaf48/src/js/views/MetricView.js#L10
I don't see this image in the repository. The @screenshot tag should point to the relative path of a screenshot image of that view, relative to the docs/screenshots directory.
Here's what the view doc page looks like, with the 404 for the screenshot image:
For an example, see the QueryRuleView: https://github.com/NCEAS/metacatui/blob/a328f00edf75636839c752fbbbcf1f770689ae11/src/js/views/queryBuilder/QueryRuleView.js#L29
Ah, I see. I thought this would be some sort of automated process. I have uploaded the screenshot and fixed the path to it. Thank you for testing this out.
I noticed I was able to submit non-DOI strings, and I remember seeing a Slack convo with Jasmine and Bryce about submitting DOIs in various formats that don't always work.