biopragmatics / bioregistry

📮 An integrative registry of biological databases, ontologies, and nomenclatures.
https://bioregistry.io
MIT License

Update regular expression pattern for panther.pthcmp to match example identifier #242

Closed (dhimmel closed this issue 2 years ago)

dhimmel commented 2 years ago

Prefix

panther.pthcmp

New Regular Expression Pattern

^(G|P|U|C|S)\d{5}$

Explanation

The current pattern for panther.pthcmp (^G|P|U|C|S\d{5}$, stored as ^G|P|U|C|S\\d{5}$ in the JSON) doesn't match the example identifier P00266:

https://github.com/biopragmatics/bioregistry/blob/e9b76c42dbc5cd0e5552093427b2f0a7bbe3b046/exports/registry/registry.json#L14169-L14182

Side note: I am surprised that patterns that don't match their example are allowed. I would think CI should flag this and suppress the pattern/example fields until it is resolved?

Contributor ORCID

0000-0002-3012-7446

dhimmel commented 2 years ago

http://bioregistry.io/registry/panther.pthcmp

The pattern is also bad in identifiers.org:

https://github.com/biopragmatics/bioregistry/blob/e9b76c42dbc5cd0e5552093427b2f0a7bbe3b046/src/bioregistry/data/external/miriam/processed.json#L7167

The pattern is also bad in n2t:

https://github.com/biopragmatics/bioregistry/blob/e9b76c42dbc5cd0e5552093427b2f0a7bbe3b046/src/bioregistry/data/external/n2t/processed.json#L4329

I also see that I reported this issue to identifiers.org over a year and a half ago at https://github.com/identifiers-org/identifiers-org.github.io/issues/99#issuecomment-627006826 (I had totally forgotten). It still hasn't been fixed and probably won't be, because the issue was prematurely closed.

cthoyt commented 2 years ago

Is the issue that the parentheses were missing?

dhimmel commented 2 years ago

> Is the issue that the parentheses were missing?

Yes, exactly. Without the parentheses, S00266 would match but P00266 would not.
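To make the precedence problem concrete, here is a minimal Python sketch comparing the two patterns from this issue. It assumes whole-string matching via `re.fullmatch`; the helper name `fully_matches` and the `results` dictionary are illustrative only and are not part of the Bioregistry code base.

```python
import re

# Without parentheses, "|" splits the WHOLE expression into five alternatives:
#   ^G   or   P   or   U   or   C   or   S\d{5}$
OLD_PATTERN = r"^G|P|U|C|S\d{5}$"
# With parentheses, the alternation only covers the leading letter.
NEW_PATTERN = r"^(G|P|U|C|S)\d{5}$"


def fully_matches(pattern: str, identifier: str) -> bool:
    """Return True only if the entire identifier matches the pattern."""
    return re.fullmatch(pattern, identifier) is not None


# Check both identifiers mentioned in the thread against both patterns.
results = {
    (name, identifier): fully_matches(pattern, identifier)
    for name, pattern in [("old", OLD_PATTERN), ("new", NEW_PATTERN)]
    for identifier in ["P00266", "S00266"]
}

for key, matched in results.items():
    print(key, matched)
```

Under whole-string matching, only the `S\d{5}$` alternative of the old pattern can consume a full six-character identifier, which is why S00266 passes and P00266 fails. A prefix-style check such as `re.match` would behave differently, because the bare `P` alternative matches the first character of P00266.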

cthoyt commented 2 years ago

So the good news is that there is a CI workflow for checking that all of the examples conform to their patterns. The bad news is that there was an obvious error in the way I implemented it. A fix is in #243 and, luckily, it shows that the example you point out here is the only one where this was an issue.
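For readers curious what such a check could look like, here is a rough sketch. It is not the actual Bioregistry test and not the fix from #243. It assumes a hypothetical in-memory registry that maps each prefix to a dictionary with optional `pattern` and `example` keys; the function name `find_mismatched_examples` and the toy entries are made up for illustration.

```python
import re


def find_mismatched_examples(registry: dict[str, dict]) -> list[str]:
    """Return the prefixes whose example does not fully match their pattern.

    Entries missing either a pattern or an example are skipped.
    """
    mismatched = []
    for prefix, entry in sorted(registry.items()):
        pattern = entry.get("pattern")
        example = entry.get("example")
        if pattern is None or example is None:
            continue
        # fullmatch requires the whole example to match, not just a prefix of it
        if re.fullmatch(pattern, example) is None:
            mismatched.append(prefix)
    return mismatched


# Hypothetical toy data: the first entry uses the old pattern from this issue,
# the second uses the proposed replacement, the third has no pattern at all.
toy_registry = {
    "panther.pthcmp": {"pattern": r"^G|P|U|C|S\d{5}$", "example": "P00266"},
    "panther.pthcmp.fixed": {"pattern": r"^(G|P|U|C|S)\d{5}$", "example": "P00266"},
    "no.pattern": {"example": "123"},
}

print(find_mismatched_examples(toy_registry))
```

Using `re.fullmatch` rather than `re.match` matters here: with `re.match`, the old pattern would accept P00266 through its bare `P` alternative, and the mismatch would go unnoticed.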