CottageLabs / oacwellcome

OA Compliance Checking for Wellcome Trust

Add support for EPMC identifiers to OAG #15

Closed richard-jones closed 9 years ago

richard-jones commented 10 years ago

OAG currently supports PMID and DOI identifier types. It should also be made to support EPMC identifiers, and be able to resolve them to the Europe PMC web page, which is accessible at

http://europepmc.org/articles/[PMCID]

for example, these are two valid Europe PMC URLs:

http://europepmc.org/articles/PMC4160115
http://europepmc.org/articles/PMC4132119

This should be an identifier plugin, similar to doi.py or pmid.py, that resolves a PMCID to the URL form given above. It is probably sufficient to craft the URL from a normalised identifier; checking that the page exists should not be necessary, as that will happen when the licence is detected.
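As a rough illustration of "crafting the URL from a normalised identifier", here is a minimal sketch. It is not the actual OAG plugin code: the names `EPMC_ARTICLE_URL`, `normalise_pmcid` and `pmcid_to_url` are hypothetical, and trimming whitespace and upper-casing the prefix are assumptions about what normalisation might involve.

```python
import re

# URL form taken from the issue text; the constant name is hypothetical.
EPMC_ARTICLE_URL = "http://europepmc.org/articles/{pmcid}"


def normalise_pmcid(raw):
    """Return the identifier as 'PMC<digits>', or None if it does not fit.

    Assumption: normalisation means trimming whitespace and upper-casing
    the 'PMC' prefix. The real plugin may do more or less than this.
    """
    candidate = raw.strip().upper()
    if re.fullmatch(r"PMC[0-9]+", candidate) is None:
        return None
    return candidate


def pmcid_to_url(raw):
    """Build the Europe PMC article URL without checking that it exists."""
    pmcid = normalise_pmcid(raw)
    if pmcid is None:
        raise ValueError("not a PMCID: %r" % (raw,))
    return EPMC_ARTICLE_URL.format(pmcid=pmcid)


if __name__ == "__main__":
    print(pmcid_to_url("PMC4160115"))
```

No HTTP request is made here, in line with the suggestion that existence checking can be left to licence detection.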

emanuil-tolev commented 9 years ago

@richard-jones, do you know of a way to validate these? E.g. do they have to have 7 digits?

The logic I've implemented so far is:

  1. if it starts with "PMC" and contains only digits afterwards, it's a PMCID
  2. if we've been told it's a PMCID (you can specify what each ID is when you query OAG) and it satisfies 1., then it's a PMCID
  3. if we've been told it's a PMCID and the id contains only digits, then slap "PMC" in front of it and treat it as a PMCID
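The three rules above could be sketched roughly as follows. This is an illustration only, not the code in the OAG repository: the function name `detect_pmcid`, the `declared_type` parameter and its `"pmcid"` value are hypothetical stand-ins for however OAG passes along the identifier type supplied with a query.

```python
import re

_PREFIXED = re.compile(r"PMC[0-9]+")  # "PMC" followed only by digits
_DIGITS_ONLY = re.compile(r"[0-9]+")  # bare digits, no prefix


def detect_pmcid(identifier, declared_type=None):
    """Return the canonical PMCID, or None if this is not treated as one.

    declared_type is a hypothetical parameter carrying the type the
    caller says the identifier has (None when no type was given).
    """
    # Rules 1 and 2: a "PMC" prefix followed only by digits is a PMCID,
    # whether or not the caller also declared it as one.
    if _PREFIXED.fullmatch(identifier):
        return identifier
    # Rule 3: bare digits count only when the caller declared a PMCID;
    # in that case the prefix is added.
    if declared_type == "pmcid" and _DIGITS_ONLY.fullmatch(identifier):
        return "PMC" + identifier
    return None


if __name__ == "__main__":
    print(detect_pmcid("2654146", declared_type="pmcid"))
```

Bare digits without a declared type are rejected in this sketch, since they could just as well be a PMID.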

It may be fine as it is; I'm just checking whether you know something additional. The only thing I can see is that PMCIDs might always have 7 digits. I have submitted a query to the Europe PMC Helpdesk to check this.

emanuil-tolev commented 9 years ago

I hope we do get an answer from them, otherwise it's a bit hard to validate well without at least a length constraint :).

_rx = r'^(PMC){0,1}[\d]+$'  # Valid: PMC1234567, 1234567, PMC2, 2, 34, 594876985749654

emanuil-tolev commented 9 years ago

EPMC helpdesk response

Thank you. If it contains ‘PMC’ and between 5 and 7 digits it is a PMC ID. Please find below one of the earliest examples and the latest:

http://europepmc.org/articles/PMC61055

http://europepmc.org/articles/PMC4217746

In the Europe PMC Advanced Search you can search by License Type:

In the Basic Search box you could input a search like this:

(TITLE:"adrenal gland surgery") AND (LICENSE:"CC-BY")
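If the 5 to 7 digit range from the helpdesk response were applied to the earlier pattern in this thread, it might look like the sketch below. The name `is_plausible_pmcid` is hypothetical, keeping the "PMC" prefix optional simply mirrors the earlier pattern, and the digit range is only what the helpdesk reported at the time, so a hard upper bound may need revisiting as new IDs are issued.

```python
import re

# Optional "PMC" prefix followed by 5 to 7 digits, per the helpdesk reply.
_PMCID_RX = re.compile(r"(PMC)?[0-9]{5,7}")


def is_plausible_pmcid(value):
    """True if value looks like a PMCID under the 5-7 digit constraint."""
    return _PMCID_RX.fullmatch(value) is not None


if __name__ == "__main__":
    for sample in ("PMC61055", "PMC4217746", "PMC2", "594876985749654"):
        print(sample, is_plausible_pmcid(sample))
```

With this constraint, short values such as PMC2 and very long digit strings are no longer accepted, unlike with the unconstrained pattern.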

emanuil-tolev commented 9 years ago

Yah, it works now: http://howopenisit.org/lookup/PMC2654146 (the PMCID goes in the identifier column). Everything works without a hitch. Nice job @richard-jones, who originally decoupled the choice of identifier type from the provider URL and scraping in OpenArticleGauge.