biopragmatics / bioregistry

📮 An integrative registry of biological databases, ontologies, and nomenclatures.
https://bioregistry.io
MIT License

Update PubMed Central #965

Closed sierra-moxon closed 10 months ago

sierra-moxon commented 10 months ago

Prefix

pmc

Explanation

I am confused about how to interpret the PMC/pmc/PMCID prefix. It looks like it can expand to two different resources: PubMed Central at NCBI and Europe PMC. These two resources use different local identifier regex patterns to resolve correctly, e.g.:

(On the Bioregistry page, this URL is actually written so that the regex works for expansion: https://europepmc.org/article/PMC3084216 is redirected to https://europepmc.org/article/PMC/3084216, but it's not clear to me from the metadata on the pmc prefix that this is happening.) Said another way, this URL https://europepmc.org/article/PMC3084216 doesn't resolve?

Contributor ORCID

0000-0002-8719-7760

Blocked By

cthoyt commented 10 months ago

It appears that https://europepmc.org/articles/PMC3084216 (note the plural "articles") redirects to https://europepmc.org/article/PMC/3084216, so the Europe PMC site also maintains compatibility with how it's annotated now, including the PMC string in the local unique identifier.

I agree that this is a very confusing namespace-in-LUID / banana situation. If we wanted to say that the PMC identifier should actually be only the number, then we would have to use banana normalization logic, which goes beyond the standard prefix normalization that, e.g., the curies package handles.

Maybe we should again seek out a responsible person from PMC to get an authoritative view on this.
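To make the banana idea concrete, here is a minimal sketch, assuming the banana for this namespace is the literal string "PMC" and that matching should ignore case. The helper names `strip_banana` and `add_banana` are hypothetical and are not part of Bioregistry or the curies package.

```python
def strip_banana(local_id: str, banana: str = "PMC") -> str:
    """Drop a leading banana (assumed here to be "PMC") if it is present."""
    if local_id.upper().startswith(banana.upper()):
        return local_id[len(banana):]
    return local_id


def add_banana(local_id: str, banana: str = "PMC") -> str:
    """Return the local identifier with the banana present exactly once."""
    return banana + strip_banana(local_id, banana)
```

With helpers like these, both "3084216" and "PMC3084216" could be brought to a single form before building a CURIE or a URL.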

jeffbeckncbi commented 10 months ago

Thanks to Charlie for looping us in on this discussion. I am the Program Head for Literature at NCBI, the group that runs PubMed and PMC at the US National Library of Medicine.

PMC uses the accession id format with the "PMC" prefix. There can also be a version number suffix. So the pattern is "PMC"{Integer}.{version}. An accession without a specific version number should resolve to the highest available or "current" version of the article.

Compare https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7418728 and https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7418728.1 and even https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7418728,2
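The pattern described above, "PMC"{Integer}.{version} with the version part optional, could be checked with a regular expression along these lines. This is only a sketch: the function name `parse_pmc_accession` is made up for illustration, and it assumes that the version is a plain integer.

```python
import re
from typing import Optional, Tuple

# "PMC", then digits, then an optional ".<version>" suffix.
# Assumption: the version number is a non-negative integer.
PMC_ACCESSION = re.compile(r"PMC(?P<number>\d+)(?:\.(?P<version>\d+))?")


def parse_pmc_accession(accession: str) -> Optional[Tuple[int, Optional[int]]]:
    """Return (number, version) for a well-formed accession, else None.

    The version is None when the accession carries no version suffix,
    i.e. when it should resolve to the current version of the article.
    """
    match = PMC_ACCESSION.fullmatch(accession)
    if match is None:
        return None
    version = match.group("version")
    return (int(match.group("number")), int(version) if version is not None else None)
```

Under these assumptions, `parse_pmc_accession("PMC7418728")` yields a number with no version, while `parse_pmc_accession("PMC7418728.1")` yields the same number with version 1.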

To your question, "PMC" is an integral part of the PMC Accession ID, so it should be included. To create a CURIE with the pmc prefix, there will be an apparent redundancy:

pmc:PMC7418728
pmc:PMC7418728.1

Europe PMC mirrors most of the content from PMC, and the identifiers are managed by PMC at NCBI/NLM. So the AccessionIds are the same for an article in PMC or when it is distributed through EuropePMC.

cthoyt commented 10 months ago

@jeffbeckncbi this is great, thank you for getting in touch. Do you think you could share your ORCID and email address such that we can mark you as the contact for both the PMC and PubMed records?

jeffbeckncbi commented 10 months ago

Sure. The email address associated with this github account is beck@ncbi.nlm.nih.gov, but it is much easier to use jbeck@nih.gov.

... looking up my ORCID ID Identifier ...

https://orcid.org/0000-0002-1798-9797

Jeff

cthoyt commented 10 months ago

@sierra-moxon it looks like the resolution to this confusion is that the Europe PMC site makes an implementation decision to store its PMC identifiers in a non-standard form (i.e., as an integer). It therefore redirects from https://europepmc.org/articles/PMC3084216 to https://europepmc.org/article/PMC/3084216 (note "articles" in the first vs. "article" in the second). It doesn't appear that full PMC accession IDs work in the article endpoint, so https://europepmc.org/article/PMC3084216 does not work. This appears to be internally consistent.
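The difference between the two sites can be sketched as two URL templates, taken from the URLs quoted in this thread. The function names are hypothetical, and the sketch ignores version suffixes because the thread does not say how Europe PMC handles them.

```python
# URL shapes copied from the links in this thread.
NCBI_TEMPLATE = "https://www.ncbi.nlm.nih.gov/pmc/articles/{accession}"
EUROPE_PMC_TEMPLATE = "https://europepmc.org/article/PMC/{number}"


def ncbi_url(accession: str) -> str:
    """NCBI takes the full accession, including the "PMC" part."""
    return NCBI_TEMPLATE.format(accession=accession)


def europe_pmc_url(accession: str) -> str:
    """Europe PMC's article endpoint takes only the integer part."""
    if not accession.startswith("PMC"):
        raise ValueError(f"expected an accession starting with 'PMC': {accession!r}")
    return EUROPE_PMC_TEMPLATE.format(number=accession[len("PMC"):])
```

For example, `europe_pmc_url("PMC3084216")` gives the redirect target mentioned above, https://europepmc.org/article/PMC/3084216.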

sierra-moxon commented 10 months ago

@cthoyt @jeffbeckncbi - can you confirm that the URI expansion should be to Europe PMC instead of NCBI in Bioregistry? It does not appear that I can reopen this ticket.

cthoyt commented 10 months ago

Sorry, I must have overlooked if this was part of the earlier issue. I don't think any of these are RDF-ready URI formats, right?

sierra-moxon commented 10 months ago

Also I would like confirmation that "PMC" should be the "preferred" prefix for these identifiers? Or is "pmc" preferred? thanks!

sierra-moxon commented 10 months ago

@jeffbeckncbi - can you also comment on: https://github.com/biopragmatics/bioregistry/issues/323 ? Issue 323 deals with the prefix for PubMed IDs. (PMID vs. pubmed vs. PUBMED). thanks again!

jeffbeckncbi commented 10 months ago

> Also I would like confirmation that "PMC" should be the "preferred" prefix for these identifiers? Or is "pmc" preferred? thanks!

I'm getting confused about what you mean by prefix. "PMC" is part of the Accession ID for PubMed Central Articles. "pmc" should be used for the scheme in the URI.

cthoyt commented 10 months ago

@jeffbeckncbi the question is, which is better when using the CURIE syntax:

  1. pmc:PMC7418728
  2. PMC:PMC7418728
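Whichever spelling ends up preferred, a consumer can fold the variants onto one canonical prefix before comparing CURIEs. The sketch below is illustrative only: the synonym table and the function name `fold_prefix` are hypothetical, and picking lowercase "pmc" as the canonical form is an arbitrary assumption that does not settle the question.

```python
from typing import Tuple

# Hypothetical synonym table: the spellings mentioned in this thread,
# all mapped to one canonical prefix (assumed to be "pmc" for this sketch).
PREFIX_SYNONYMS = {"pmc": "pmc", "PMC": "pmc", "PMCID": "pmc"}


def fold_prefix(curie: str) -> Tuple[str, str]:
    """Split a CURIE at the first colon and canonicalize its prefix."""
    prefix, sep, local_id = curie.partition(":")
    if not sep:
        raise ValueError(f"not a CURIE (missing ':'): {curie!r}")
    if prefix not in PREFIX_SYNONYMS:
        raise KeyError(f"unknown prefix: {prefix!r}")
    return PREFIX_SYNONYMS[prefix], local_id
```

Both options in the list above then compare equal after folding, since only the prefix differs and the local identifier keeps its "PMC" part.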