NCEAS / metacatui

MetacatUI: A client-side web interface for DataONE data repositories
https://nceas.github.io/metacatui
Apache License 2.0
42 stars 27 forks source link

Better input validation for RegisterCitationView #1519

Closed laurenwalker closed 3 years ago

laurenwalker commented 4 years ago

I noticed I was able to submit non-doi strings, and I remember seeing a Slack convo with Jasmine and Bryce about submitting dois of various formats and they don’t always work

mbjones commented 4 years ago

I've counted many variants in ways to represent DOIs. They include with and without the doi: prefix, using http and https URIs, including or not including the dx subdomain, all caps versus lower case versus mixed case, with and without query strings and fragment identifiers, and all of the permutations of these. Here's an incomplete list of some of those permutations for the same DOI:

Two helpful things are shared among these representations:

laurenwalker commented 4 years ago

We talked about this during our Arctic Data Center team meeting today. Could we bump this up in priority to the next patch release, @rushirajnenuji? I think it should be pretty straightforward to add a regex check for the DOIs and reformat or reject invalid DOIs.

rushirajnenuji commented 3 years ago

I've worked on a regex that covers all the DOI syntax that Matt has pointed out above. If invalid DOI format, we display an error message, highlight the text box, and disable the register button. @laurenwalker could you please review the warning/error message and its placement in the modal. Please let me know your thoughts. Thank you!

Screen Shot 2020-12-09 at 9.51.23 PM.png

mbjones commented 3 years ago

@rushirajnenuji could you please post the regex you came up with here for us to test with regex tools?

rushirajnenuji commented 3 years ago

Hey Matt -

^(http:\/\/|https:\/\/)?(doi.org\/|dx.doi.org\/)?(doi:|DOI:)?(10[.][0-9]{4,}(?:[.][0-9]+)*(?:(?!["&\'<>])\S)+)$. This covers all the above cases but two (white space character after doi:):

But I'm stripping all the white spaces before testing the identifier string.

mbjones commented 3 years ago

Thanks. Here's a variant on yours:

^\s*(http:\/\/|https:\/\/)?(doi.org\/|dx.doi.org\/)?(doi: ?|DOI: ?)?(10\.\d{4,}(\.\d)*)\/(\w+).*$

Changes include: 1) covers the whitespace after doi: 2) enables leading and trailing whitespace and 3) sets up capture groups to explicitly capture the authority (e.g., 10.5063) and the localname (e.g., F17P8WNT) while excluding URI fragment identifiers and query strings (e.g, ?ver=1&id=3). So calling applications should be able to get the DOI number and localname out easily for processing with back references.

I tested it in this playground which shows a bunch of good matches and strings that shouldn't match and don't: https://regex101.com/r/6kooII/3

rushirajnenuji commented 3 years ago

Thanks, Matt. I've updated the regex in this commit.

laurenwalker commented 3 years ago

@rushirajnenuji - I've reviewed your changes for this issue and have fixed a few issues already, but there are a couple more things I'd like you to take a address:

  1. When you add the validation styling to the elements, you're setting CSS properties on the elements directly. We should very rarely do this. Add a class to the elements instead, and set the CSS in the stylesheets. This keeps the CSS centralized to one location so when we want to make a simple style change in the future, we don't have to hunt down every place in the code where colors and borders are set. https://github.com/NCEAS/metacatui/blob/61bb194c5f29b9b08f6e6bcaa691f35234745b88/src/js/views/RegisterCitationView.js#L175 (It also prevents CSS from overriding the styles you set via JS, without an "!important" directive, which is messy). We already have a CSS class to do this validation styling - look at the other editors in the UI (I think it's called 'error'?)
  2. Can you please ensure all your code from now on is documented with JSDocs. Much of the metrics code doesn't have thorough JSDoc documentation, including the function you added for this validation: https://github.com/NCEAS/metacatui/blob/61bb194c5f29b9b08f6e6bcaa691f35234745b88/src/js/views/RegisterCitationView.js#L159-L161 I updated the MetacatUI doc guidelines the other day to help with this: https://github.com/NCEAS/metacatui/blob/master/CONTRIBUTING.md#code-style

Once these issues are resolved, please merge this feature branch to develop. Thanks!

rushirajnenuji commented 3 years ago

Thanks for the feedback @laurenwalker. I've fixed the CSS styling and added documentation for the methods defined for this ticket. I'll also file a ticket to go through all the code and make sure it is up to our coding style standards. This issue is now complete.

laurenwalker commented 3 years ago

Thanks Rushiraj. The only issue I see with the docs is your @screenshot tag for MetricView: https://github.com/NCEAS/metacatui/blob/183b807e69aee2f139ed95ab21f10486c8efaf48/src/js/views/MetricView.js#L10

I don't see this image in the repository. The @screenshot tag should point to the relative path of a screenshot image of that view, relative to the docs/screenshots directory.

Here's what the view doc page looks like, with the 404 for the screenshot image: Screen Shot 2020-12-22 at 1 27 34 PM

For an example, see the QueryRuleView: https://github.com/NCEAS/metacatui/blob/a328f00edf75636839c752fbbbcf1f770689ae11/src/js/views/queryBuilder/QueryRuleView.js#L29

Screen Shot 2020-12-22 at 1 29 04 PM

rushirajnenuji commented 3 years ago

Ah, I see. I thought this would be some sort of automated process. I have uploaded the screenshot and fixed the path to it. Thank you for testing this out.