MassBank / MassBank-web

The web server application and directly connected components for a MassBank web server

incomplete regexp #377

Closed egonw closed 1 year ago

egonw commented 1 year ago

The current regular expression is ^MSBNK-[A-Z0-9_]{1,32}-[A-Z0-9_]{1,64}$ (source) but this does not match MSBNK-Fac_Eng_Univ_Tokyo-JP001576 which has a hyphen in the second block.

I propose updating the regexp to ^MSBNK-[A-Z0-9_]{1,32}-[A-Z0-9_-]{1,64}$, adding the hyphen as allowed in the second block.

schymane commented 1 year ago

The hyphen divides the blocks, with three in total ... it looks like it matches to me?

**^MSBNK**-[A-Z0-9_]{1,32}-**[A-Z0-9_]{1,64}$**

and

**MSBNK**-Fac_Eng_Univ_Tokyo-**JP001576**

...with bold indicating first and third blocks respectively? The char count seems OK in the second block ...

meier-rene commented 1 year ago

Hi Egon, I can't really follow here. We have exactly two hyphens in every ACCESSION. The hyphens split it into three blocks: MSBNK, the contributor, and an ID given by the contributor. No hyphen is allowed in the third block, and I'm pretty sure we don't have any in our data.
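The three-block structure described here can be illustrated with a minimal sketch. The helper name `split_accession` is hypothetical and not part of MassBank-web; it only assumes that an accession contains exactly two hyphens.

```python
def split_accession(accession: str) -> tuple[str, str, str]:
    """Split an accession into (prefix, contributor, contributor-given ID).

    Hypothetical helper for illustration only; assumes exactly two hyphens.
    """
    parts = accession.split("-")
    if len(parts) != 3:
        raise ValueError(
            f"expected exactly two hyphens, found {len(parts) - 1}: {accession!r}"
        )
    prefix, contributor, record_id = parts
    return prefix, contributor, record_id


print(split_accession("MSBNK-Fac_Eng_Univ_Tokyo-JP001576"))
# ('MSBNK', 'Fac_Eng_Univ_Tokyo', 'JP001576')
```

The underscores in `Fac_Eng_Univ_Tokyo` stay inside the second block, so the hyphen count alone does not explain a failed match.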

egonw commented 1 year ago

Right. Sorry. I drew the wrong conclusion indeed. The problem is that the regexp fails in Wikidata: https://www.wikidata.org/wiki/Wikidata:Database_reports/Constraint_violations/P6689#%22Format%22_violations

Violations count: 48908

[ethanol (Q153)](https://www.wikidata.org/wiki/Q153): [MSBNK-Fac_Eng_Univ_Tokyo-JP006778](https://massbank.eu/MassBank/RecordDisplay?id=MSBNK-Fac_Eng_Univ_Tokyo-JP006778)
[carbon dioxide (Q1997)](https://www.wikidata.org/wiki/Q1997): [MSBNK-Fac_Eng_Univ_Tokyo-JP001576](https://massbank.eu/MassBank/RecordDisplay?id=MSBNK-Fac_Eng_Univ_Tokyo-JP001576)
[benzene (Q2270)](https://www.wikidata.org/wiki/Q2270): [MSBNK-Fac_Eng_Univ_Tokyo-JP002103](https://massbank.eu/MassBank/RecordDisplay?id=MSBNK-Fac_Eng_Univ_Tokyo-JP002103)
[benzene (Q2270)](https://www.wikidata.org/wiki/Q2270): [MSBNK-Fac_Eng_Univ_Tokyo-JP002347](https://massbank.eu/MassBank/RecordDisplay?id=MSBNK-Fac_Eng_Univ_Tokyo-JP002347)

Right. It's the upper/lower case mismatch then. Agreed?

meier-rene commented 1 year ago

But we do have a mistake in the regex in our documentation at a different place. The following command matches: echo MSBNK-Fac_Eng_Univ_Tokyo-JP001576 | grep -E "^MSBNK-[A-Za-z0-9_]{1,32}-[A-Z0-9_]{1,64}$"

So the regex should be "^MSBNK-[A-Za-z0-9_]{1,32}-[A-Z0-9_]{1,64}$" and not "^MSBNK-[A-Z0-9_]{1,32}-[A-Z0-9_]{1,64}$". I will fix that. Thanks for reporting.
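For readers who want to reproduce the difference outside the shell, here is a small sketch using Python's `re` module. The two patterns are copied from this thread; the variable names are only illustrative.

```python
import re

# Pattern as previously documented: uppercase only in the contributor block.
OLD_PATTERN = re.compile(r"^MSBNK-[A-Z0-9_]{1,32}-[A-Z0-9_]{1,64}$")
# Corrected pattern: lowercase letters are also allowed in the contributor block.
NEW_PATTERN = re.compile(r"^MSBNK-[A-Za-z0-9_]{1,32}-[A-Z0-9_]{1,64}$")

accession = "MSBNK-Fac_Eng_Univ_Tokyo-JP001576"

print(bool(OLD_PATTERN.match(accession)))  # False: "ac", "ng", ... are lowercase
print(bool(NEW_PATTERN.match(accession)))  # True
```

The only change is `a-z` in the second block, which is what the Wikidata format constraint would also need in order to accept these accessions.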

meier-rene commented 1 year ago

Thanks for reporting. It's fixed in dev and will go online soon.