bio-tools / biotoolsLint

Utility for verification of bio.tools content with reporting

Unresolvable DOIs (& PMIDs, PMCIDs) #5

Open joncison opened 6 years ago

joncison commented 6 years ago

DOIs can be entered incorrectly (see https://github.com/bio-tools/biotoolsRegistry/issues/281).

We must check that they are resolvable.

Also check for erroneous whitespace (in case this isn't fixed in bio.tools directly)
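A minimal sketch of the whitespace check in Python, assuming identifiers arrive as plain strings. The names `whitespace_problems`, `looks_like_doi` and `DOI_SHAPE` are hypothetical, and the pattern is a deliberately loose approximation, not the official DOI syntax:

```python
import re
from typing import List

# Assumed rough DOI shape: "10.", a 4-9 digit registrant code, "/",
# then a non-empty suffix without whitespace. Not the official grammar.
DOI_SHAPE = re.compile(r"^10\.\d{4,9}/\S+$")


def whitespace_problems(raw: str) -> List[str]:
    """Describe any whitespace found around or inside an identifier."""
    problems = []
    if raw != raw.strip():
        problems.append("leading or trailing whitespace")
    if re.search(r"\s", raw.strip()):
        problems.append("whitespace inside the identifier")
    return problems


def looks_like_doi(raw: str) -> bool:
    """Return True if the string matches the assumed DOI shape."""
    return DOI_SHAPE.match(raw) is not None
```

Note that a DOI with a stray trailing period still passes this shape check, so it only catches whitespace and gross malformation, not unresolvable DOIs.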

jaanisoe commented 6 years ago

I attached a list of unresolvable DOIs from the 21st of September: unresolvable_dois.txt. All DOIs are converted to upper-case, sorry about that.

The most common mistake is an extra period (".") at the end of the DOI. Maybe this happens when the DOI is copy-pasted from the end of a sentence, and selecting it with the cursor picks up the sentence-ending period as well?

A few erroneous DOIs are truncated and a few have some other extra characters at the end.
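For the trailing-character cases, a report could propose a candidate correction. This is only a sketch: `suggest_fix` and `TRAILING_JUNK` are hypothetical names, the set of stripped characters is an assumption, and any candidate would still need to be checked for resolvability before being accepted. Truncated DOIs cannot be repaired this way.

```python
from typing import Optional

# Assumed set of characters that are unlikely to end a real DOI.
TRAILING_JUNK = ".,;"


def suggest_fix(doi: str) -> Optional[str]:
    """Return a cleaned-up candidate, or None if nothing would change."""
    # Drop surrounding whitespace first, then trailing punctuation.
    candidate = doi.strip().rstrip(TRAILING_JUNK)
    return candidate if candidate != doi else None
```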

But a few of the unresolvable DOIs are probably correct. For example, https://doi.org/10.1186/1471-2105-14-S3-S4 does not resolve, but the article does actually exist: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-14-S3-S4 (on that page, the same unresolvable DOI is listed under the title). This happens for a few other DOIs with registrant code "1186". If the DOI link worked in the past, then this is strange, as DOIs are supposed to be persistent. Anyway, this is probably a mistake on the publisher's side.

joncison commented 6 years ago

Awesome, thanks!

joncison commented 5 years ago

From https://github.com/bio-tools/biotoolsRegistry/issues/281 :

Then, there are IDs which are syntactically correct, but which don't resolve to any publications. Finding these IDs would require actual querying of PubMed/EuropePMC with PMID/PMCID or resolving the DOI.

Incomplete list of such invalid DOIs currently in bio.tools:

doi:10.1093/ nar/gks1219
10.1186/s13742-015-0105-2, 2016.
10.1093/bioinformatics/btr304.
10.1016/j.compbiomed.2014.10.002.
10.1093/bioinformatics/btv189.
10.1002/humu.22503.
10.1093/bioinformatics/btv709.
10.1186/1471-2164-9-75.
10.1016/j.str.2008.10.017.
10.1038/nmeth.2242.
10.1111/j.1749-6632.2008.03756.x.
10.1074/mcp.M900317-MCP200.
10.1021/pr100118f.
10.1136/gutjnl-2011-301104.
10.1186/gb-2009-10-3-r25.
10.1093/bioinformatics/btp352.

Also, there are two such invalid PMIDs:

19504496 3579
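A rough sketch of the resolvability check for DOIs, using only the Python standard library. It assumes that the doi.org proxy's handle lookup endpoint (`https://doi.org/api/handles/<doi>`) answers with HTTP 200 for a registered DOI and with an error status such as 404 otherwise. The names `http_status` and `doi_is_registered` are hypothetical, and PMIDs/PMCIDs would need a separate lookup against PubMed or Europe PMC, which is not shown here.

```python
import urllib.error
import urllib.parse
import urllib.request


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a GET request to url."""
    request = urllib.request.Request(url, headers={"User-Agent": "doi-check-sketch"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses are raised as exceptions; report their code.
        # Network failures (urllib.error.URLError) are left to propagate.
        return error.code


def doi_is_registered(doi: str, fetch_status=http_status) -> bool:
    """Return True if the lookup for this DOI answers with HTTP 200."""
    # Percent-encode everything except "/" so odd characters stay in the path.
    url = "https://doi.org/api/handles/" + urllib.parse.quote(doi, safe="/")
    return fetch_status(url) == 200
```

The `fetch_status` parameter is there so the check can be exercised without network access, for example `doi_is_registered("10.1038/nmeth.2242", fetch_status=lambda url: 200)`. A DOI that is registered may still point at a broken landing page, so this only covers the "does not resolve at all" case.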

joncison commented 5 years ago

See https://github.com/bio-tools/biotoolsLint/issues/24 (using Crossref for DOI verification).

joncison commented 5 years ago

See https://github.com/bio-tools/biotoolsLint/issues/15 (update to DOI resolver needed).