biolink / biolink-model

Schema and generated objects for biolink data model and upper ontology
https://biolink.github.io/biolink-model/

Confusion on PMC vs PMCID #1366

Open · colleenXu opened this issue 1 year ago

colleenXu commented 1 year ago

I notice a difference between PMC and PMCID, and I'm wondering if this is intentional. I'm also not certain which one to use.

Based on the prefix-map, it looks like:

All of the resources I'm working with provide IDs that start with "PMC"... so it looks like I should use the prefix PMCID. Is that correct?


Part of my confusion comes from this documentation, which shows both PMC and PMCID IDs that don't start with "PMC"...


Side note: do the prefix-maps need changing if URLs are being redirected? I'm noticing that

colleenXu commented 1 year ago

Perhaps @sierra-moxon would be the person to look into this?

colleenXu commented 1 year ago

(as discussed on Monday)

In bioregistry, the two prefixes (PMC, PMCID) are for the same namespace, which has local unique identifiers that start with "PMC" (regex: ^PMC\d+$, ex: PMC3084216).
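As a quick way to check which IDs already conform, here is a minimal Python sketch that tests a local identifier against the bioregistry pattern quoted above. The function name `is_valid_pmc_local_id` is hypothetical and not part of bioregistry or biolink-model; the sketch uses only the standard `re` module.

```python
import re

# Local-identifier pattern for PMC as quoted from bioregistry in this thread.
PMC_LOCAL_ID_PATTERN = re.compile(r"^PMC\d+$")


def is_valid_pmc_local_id(local_id: str) -> bool:
    """Return True if local_id matches ^PMC\\d+$ (hypothetical helper).

    fullmatch is used so the whole string must match; unlike match(),
    it also rejects a trailing newline that "$" alone would tolerate.
    """
    return PMC_LOCAL_ID_PATTERN.fullmatch(local_id) is not None


if __name__ == "__main__":
    for candidate in ["PMC3084216", "3084216", "pmc3084216"]:
        print(candidate, is_valid_pmc_local_id(candidate))
```

Under this pattern, a numeric-only local ID such as `3084216` does not conform, which is the crux of the mismatch with biolink-model.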

In biolink-model, by contrast, the two prefixes seem to have different patterns for local unique identifiers:

mbrush commented 1 year ago

I think this may be as simple as fixing the Biolink Model prefix registry to add "PMC" to the end of the namespace used to expand the PMCID prefix (i.e. "PMCID": "http://www.ncbi.nlm.nih.gov/pmc/PMC").

If we do this, then the examples in the spec doc resolve, and we will be consistent in requiring only the numeric identifier of a publication after the prefix (whether it is PMID, PMC, or PMCID).
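To illustrate what the proposed change would mean for CURIE expansion, here is a minimal Python sketch, assuming expansion is plain concatenation of the namespace and the local identifier. Only the PMCID entry comes from the comment above; `PREFIX_MAP` and `expand_curie` are hypothetical names, not biolink-model's actual API.

```python
# Hypothetical prefix-map excerpt: only the PMCID entry is taken from the
# proposal above; this is not the full biolink-model prefix map.
PREFIX_MAP = {
    "PMCID": "http://www.ncbi.nlm.nih.gov/pmc/PMC",
}


def expand_curie(curie: str, prefix_map: dict[str, str]) -> str:
    """Expand PREFIX:local_id by appending local_id to the prefix's namespace."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    try:
        namespace = prefix_map[prefix]
    except KeyError:
        raise KeyError(f"unknown prefix: {prefix!r}") from None
    return namespace + local_id


if __name__ == "__main__":
    # Numeric-only local ID: resolves to a well-formed PMC URL.
    print(expand_curie("PMCID:3084216", PREFIX_MAP))
    # Bioregistry-style local ID: "PMC" ends up duplicated in the URL.
    print(expand_curie("PMCID:PMC3084216", PREFIX_MAP))
```

Under this map, a bioregistry-style local ID would expand to a doubled `PMCPMC...` URL, which is the conflict raised in the reply that follows.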

I created PR #1402 to make this simple change.

@sierra-moxon @colleenXu will this do the trick?

colleenXu commented 1 year ago

@mbrush @sierra-moxon

I'm not sure about doing this. Sierra told me that I should use bioregistry to find the "patterns for local unique identifiers"... so I was under the impression that, if we were going to change to one pattern, we'd pick bioregistry's method:

so PMC:PMC1234...

cthoyt commented 1 year ago

In https://github.com/biopragmatics/bioregistry/issues/965, we got authoritative confirmation from the PMC team that PMC local unique identifiers should contain the PMC. Therefore, curies should look like: pmc:PMC1234
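If curies are standardized on the form above, existing data in either convention could be normalized. The following is a hypothetical Python helper (not from bioregistry or biolink-model) that accepts the variants mentioned in this thread (`PMC:3084216`, `PMCID:3084216`, `PMC:PMC3084216`, `PMCID:PMC3084216`) and emits `PREFIX:PMC<digits>`. The output prefix is a parameter because the choice between PMC and PMCID, and its casing, is still being discussed; the default of `PMC` is an assumption.

```python
import re

# Assumed input forms: PMC or PMCID prefix (any case), optional "PMC"
# before the digits in the local ID. Digits restricted to ASCII 0-9.
_INPUT_PATTERN = re.compile(r"(?i)(pmcid|pmc):(?:pmc)?([0-9]+)")


def normalize_pmc_curie(curie: str, prefix: str = "PMC") -> str:
    """Rewrite a PMC/PMCID CURIE so the local ID always carries "PMC".

    Hypothetical helper: the output prefix is configurable because the
    prefix choice is not settled in this thread.
    """
    match = _INPUT_PATTERN.fullmatch(curie.strip())
    if match is None:
        raise ValueError(f"unrecognized PMC CURIE: {curie!r}")
    digits = match.group(2)
    return f"{prefix}:PMC{digits}"


if __name__ == "__main__":
    for raw in ["PMCID:3084216", "PMC:PMC1234", "pmc:1234"]:
        print(raw, "->", normalize_pmc_curie(raw))
    print(normalize_pmc_curie("PMCID:PMC1234", prefix="pmc"))
```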

gglusman commented 3 months ago

Related to the above, but not the same (or at least, I don't see this specific issue discussed): Is the domain for PMC entities PMC or PMCID? Biolink uses PMC, but all NCBI pages I've seen display PMCID: PMCnnnnnn (as opposed to PMC: PMCnnnnnn). Note I'm not referring to whether to include PMC after the colon, but what to use before it.