Closed jeet-vora closed 7 months ago
Hi @jeet-vora, thanks for the PR. I've reorganized a bit to denote that you are the "contact". The "contributor" field points to me since I am the one that added this record to the Bioregistry, not that I am the owner of the record. This is useful to keep track of as it makes it directly possible to get in touch and ask people why they did things the way they did.
Further, I switched the email you wrote for your direct email. It is Bioregistry policy to use a single point of contact, and explicitly not group emails.
With respect to the other part you asked about "Missing LUI pattern" - the question is, can you come up with a regular expression that we can use to validate a given GlyGen local unique identifier like G24361QY? Is it always, for example, a letter, then some numbers, then two more letters?
Hi @jeet-vora, thanks for the PR. I've reorganized a bit to denote that you are the "contact".
Thanks
The "contributor" field points to me since I am the one that added this record to the Bioregistry, not that I am the owner of the record. This is useful to keep track of as it makes it directly possible to get in touch and ask people why they did things the way they did.
Sounds good
Further, I switched the email you wrote for your direct email. It is Bioregistry policy to use a single point of contact, and explicitly not group emails.
Not an issue. Generally, we prefer to use GlyGen email.
With respect to the other part you asked about "Missing LUI pattern" - the question is, can you come up with a regular expression that we can use to validate a given GlyGen local unique identifier like G24361QY? Is it always, for example, a letter, then some numbers, then two more letters?
So we have two primary identifiers in GlyGen: one for protein, which is the UniProtKB ac (P14210), and the other for glycan, which is the GlyTouCan identifier (G17689DH). The patterns are:
Protein/UniProtKB ac: ^([A-N,R-Z][0-9]([A-Z][A-Z, 0-9][A-Z, 0-9][0-9]){1,2})|([O,P,Q][0-9][A-Z, 0-9][A-Z, 0-9][A-Z, 0-9][0-9])(.\d+)?$
Glycan/GlyTouCan: ^G[0-9]{5}[A-Z]{2}$
Can the above be used?
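As a quick sanity check, the GlyTouCan pattern quoted above can be tried against the two example identifiers from this thread. This is only a minimal sketch using Python's standard `re` module; the helper name `is_valid_glytoucan` is made up for illustration and is not part of the Bioregistry codebase.

```python
import re

# GlyTouCan pattern as quoted in this thread:
# the letter "G", five digits, then two uppercase letters.
GLYTOUCAN_PATTERN = re.compile(r"^G[0-9]{5}[A-Z]{2}$")


def is_valid_glytoucan(identifier: str) -> bool:
    """Return True if the whole identifier matches the GlyTouCan pattern.

    Hypothetical helper for illustration only.
    """
    return GLYTOUCAN_PATTERN.fullmatch(identifier) is not None


if __name__ == "__main__":
    # Example identifiers taken from the discussion above.
    for candidate in ["G24361QY", "G17689DH", "P14210"]:
        print(candidate, is_valid_glytoucan(candidate))
```

Both glycan examples match, while the UniProtKB accession P14210 does not, so the pattern keeps the two identifier spaces apart.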
In general, the Bioregistry discourages the titles for prefixes from being written this way. What does GlyGen stand for?
Our resource name is GlyGen and not Computational and Informatics Resources for Glycoscience. GlyGen is not shown when GlyGen is searched in the registry. We are fine removing Computational and Informatics Resources for Glycoscience or it can be GlyGen Computational and Informatics Resources for Glycoscience, but it will be too long.
All modified and coverable lines are covered by tests :white_check_mark:
Comparison is base (1366cfc) 40.82% compared to head (298c555) 40.82%. Report is 21 commits behind head on main.
Thanks @jeet-vora, I incorporated your suggestions. Note that UniProt is a fully distinct semantic space and therefore does not get included in the glygen prefix, even if the corresponding GlyGen website can resolve UniProt IDs.
@cthoyt Thanks a lot for helping us update the info.
Includes GlyGen edits