ContentMine / getpapers

Get metadata, fulltexts or fulltext URLs of papers matching a search query
MIT License

Biology Direct paper 'had no fulltext HTML url' -- really? #26

Closed: rossmounce closed this issue 9 years ago

rossmounce commented 9 years ago

Ran this (again):

    getpapers -q extremophiles --outdir ./extremophiles

and got many warning lines such as:

warn: Article with pmcid "PMC1586193" had no fulltext HTML url

...so I checked what PMC1586193 is, and it turns out to be a Biology Direct paper ("Rooting the tree of life by transition analyses"): http://europepmc.org/articles/PMC1586193

Looking at that EUPMC URL in a web browser, EUPMC does appear to have a copy of the paper's full text. I don't know whether this is an issue with getpapers or with EUPMC, but it seems odd.

Incidentally, I got 283 of those warnings. For a search that returns 836 results, that's quite a high proportion (about 34%)!

blahah commented 9 years ago

Hmm, that paper definitely has an open-access fulltext HTML URL in the results. This is from the JSON output by getpapers:

          {
            "availability": [
              "Open access"
            ],
            "availabilityCode": [
              "OA"
            ],
            "documentStyle": [
              "html"
            ],
            "site": [
              "Europe_PMC"
            ],
            "url": [
              "http://europepmc.org/articles/PMC1586193"
            ]
          },

I will investigate why it's giving that message.
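For anyone debugging the same warning, here is a minimal Python sketch of the check the message implies: scan the fulltext URL entries and return the first one whose documentStyle is "html". This is not getpapers' own code. It assumes each entry is shaped like the excerpt above, with every field wrapped in a single-element list, and the helper names first_value and find_fulltext_html_url are hypothetical.

```python
from typing import Any, Optional


def first_value(field: Any) -> Any:
    """Unwrap a single-element list such as ["html"]; return plain values unchanged."""
    if isinstance(field, list):
        return field[0] if field else None
    return field


def find_fulltext_html_url(entries: list[dict[str, Any]]) -> Optional[str]:
    """Return the URL of the first entry whose documentStyle is "html", or None."""
    for entry in entries:
        # Assumption: fields may be list-wrapped, as in the JSON excerpt above.
        if first_value(entry.get("documentStyle")) == "html":
            return first_value(entry.get("url"))
    return None


# The entry from the JSON excerpt above, fields wrapped in lists.
entries = [
    {
        "availability": ["Open access"],
        "availabilityCode": ["OA"],
        "documentStyle": ["html"],
        "site": ["Europe_PMC"],
        "url": ["http://europepmc.org/articles/PMC1586193"],
    }
]

print(find_fulltext_html_url(entries))  # http://europepmc.org/articles/PMC1586193

# Comparing the raw, list-wrapped field to a plain string never matches.
print(entries[0]["documentStyle"] == "html")  # False
```

The last line shows why the list wrapping matters: a check that compares the raw field to the string "html" would report "no fulltext HTML url" even when one is present. Whether that is what happens here is only a guess until the actual code is investigated.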