CottageLabs / oacwellcome

OA Compliance Checking for Wellcome Trust

EPMC client library #1

Closed richard-jones closed 9 years ago

richard-jones commented 10 years ago

Add a basic EPMC integration library to magnificent octopus. It will need to support the following lookup operations:

PMCID: http://www.ebi.ac.uk/europepmc/webservices/rest/search/query=PMCID:[PMCID]

PMID: http://www.ebi.ac.uk/europepmc/webservices/rest/search/query=EXT_ID:[PMID]

DOI: http://www.ebi.ac.uk/europepmc/webservices/rest/search/query=DOI:"[DOI]"
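As a rough illustration of the three lookups above, the query URLs could be built along these lines. This is only a sketch, not the library's actual interface: the function names (`query_url`, `pmcid_url`, `pmid_url`, `doi_url`) are hypothetical, and percent-encoding the quotes around the DOI is an assumption about what the service accepts.

```python
from urllib.parse import quote

# Search endpoint prefix, taken from the URLs listed in this issue.
SEARCH_BASE = "http://www.ebi.ac.uk/europepmc/webservices/rest/search/query="


def query_url(field, value, quoted=False):
    """Build a search URL for FIELD:VALUE (hypothetical helper).

    If quoted is True the value is wrapped in double quotes,
    as the DOI query in this issue requires.
    """
    term = '{0}:"{1}"'.format(field, value) if quoted else "{0}:{1}".format(field, value)
    # Assumption: ':' and '/' are left as-is, everything else unsafe is percent-encoded.
    return SEARCH_BASE + quote(term, safe=":/")


def pmcid_url(pmcid):
    return query_url("PMCID", pmcid)


def pmid_url(pmid):
    # PMIDs are queried through the EXT_ID field.
    return query_url("EXT_ID", pmid)


def doi_url(doi):
    return query_url("DOI", doi, quoted=True)
```

The identifiers passed in are assumed to be already normalised, so the helpers do no cleaning of their own.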

Note that the identifiers used should be assumed to be normalised (#14 will deal with normalising them in the application later).

If we receive exactly one result for either of these, we will take it as the correct item. If we match successfully with (1) we will record a higher confidence in the accuracy of the identification than if we match with (2).

Note that (1) is in fact an exact substring match, so if the text in quotes appears as a substring in multiple titles, it will return multiple results. Since we will be querying on full titles, this should not matter most of the time, but if it fails we will fall back to (2).
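The "exactly one result" rule and the fallback from (1) to (2) could be sketched roughly as below. This is an assumption-laden illustration: `single_result` and `identify` are hypothetical names, the two queries are passed in as plain callables returning lists of results, and the confidence labels `"high"` and `"lower"` are placeholders rather than values used by the application.

```python
def single_result(results):
    """Return the only item if there is exactly one result, else None."""
    return results[0] if len(results) == 1 else None


def identify(run_query_1, run_query_2):
    """Try query (1) first, then fall back to query (2).

    Each argument is a zero-argument callable returning a list of results.
    Returns (item, confidence), or (None, None) if neither query
    yields exactly one result.
    """
    attempts = (
        (run_query_1, "high"),   # a match on (1) is trusted more
        (run_query_2, "lower"),  # a match on (2) is trusted less
    )
    for run, confidence in attempts:
        item = single_result(run())
        if item is not None:
            return item, confidence
    return None, None
```

For example, if (1) returns several results because the quoted text is a substring of multiple titles, `identify` moves on to (2) and reports the lower confidence for whatever single item it finds there.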

The full-text can be located via the EPMC REST API at: http://www.ebi.ac.uk/europepmc/webservices/rest/[PMCID]/fullTextXML
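A minimal sketch of locating the full text, assuming the URL pattern above is still served as written. The names `fulltext_url` and `fetch_fulltext` are hypothetical, and error handling (for example, records with no full text available) is left out.

```python
from urllib.request import urlopen

# REST prefix, taken from the full-text URL given in this issue.
REST_BASE = "http://www.ebi.ac.uk/europepmc/webservices/rest/"


def fulltext_url(pmcid):
    """Build the full-text XML URL for an already-normalised PMCID."""
    return REST_BASE + pmcid + "/fullTextXML"


def fetch_fulltext(pmcid, timeout=30):
    """Fetch the full-text XML as bytes (makes a network request)."""
    with urlopen(fulltext_url(pmcid), timeout=timeout) as response:
        return response.read()
```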

richard-jones commented 10 years ago

The basic client library is now implemented for all of the above operations. Leaving this open until we have also implemented the workflow in #5, as that relies on the logic around detection through fuzzy title matching.