rossmounce opened this issue 9 years ago
This is the difference between free and OA. These articles are available free, but not OA (at least as classified by EPMC).
This is the fullTextUrlList portion of the result for Busch et al.:
"fullTextUrlList": [
{
"fullTextUrl": [
{
"availability": [
"Free"
],
"availabilityCode": [
"F"
],
"documentStyle": [
"pdf"
],
"site": [
"Europe_PMC"
],
"url": [
"http://europepmc.org/articles/PMC4321246?pdf=render"
]
},
{
"availability": [
"Free"
],
"availabilityCode": [
"F"
],
"documentStyle": [
"html"
],
"site": [
"Europe_PMC"
],
"url": [
"http://europepmc.org/articles/PMC4321246"
]
},
{
"availability": [
"Subscription required"
],
"availabilityCode": [
"S"
],
"documentStyle": [
"doi"
],
"site": [
"DOI"
],
"url": [
"http://dx.doi.org/10.1073/pnas.1412514112"
]
}
]
}
],
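To make the distinction concrete, here is a minimal sketch of how a downloader could select a URL from a record shaped like the JSON above. `pickUrl` is a hypothetical helper, not part of getpapers; the record literal just mirrors the response fragment for Busch et al.

```javascript
// Sample record mirroring the EPMC fullTextUrlList fragment above.
const record = {
  fullTextUrlList: [{
    fullTextUrl: [
      { availability: ['Free'], availabilityCode: ['F'], documentStyle: ['pdf'],
        site: ['Europe_PMC'], url: ['http://europepmc.org/articles/PMC4321246?pdf=render'] },
      { availability: ['Free'], availabilityCode: ['F'], documentStyle: ['html'],
        site: ['Europe_PMC'], url: ['http://europepmc.org/articles/PMC4321246'] },
      { availability: ['Subscription required'], availabilityCode: ['S'], documentStyle: ['doi'],
        site: ['DOI'], url: ['http://dx.doi.org/10.1073/pnas.1412514112'] }
    ]
  }]
};

// Hypothetical helper: return the first URL of the requested documentStyle
// whose availabilityCode is in the accepted set, or null if none matches.
function pickUrl(record, style, codes) {
  const entries = record.fullTextUrlList[0].fullTextUrl;
  const match = entries.find(e =>
    e.documentStyle[0] === style && codes.includes(e.availabilityCode[0]));
  return match ? match.url[0] : null;
}

// Accepting 'F' as well as 'OA' finds the free PDF; accepting only 'OA'
// (the current getpapers behaviour) finds nothing for this record.
console.log(pickUrl(record, 'pdf', ['OA', 'F']));
console.log(pickUrl(record, 'pdf', ['OA']));
```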
At the moment getpapers will only try to get PDF/XML for OA papers, not 'free' ones, as the license on those is unclear. We could add a --free argument?
Even if something is just 'free', that doesn't necessarily mean it can't be downloaded (Readcube aside). Clearly the full text can be downloaded from the (EPMC) website, so I would have thought getpapers should mirror that availability.
--free sounds good to me. Might confuse some, though. Can't please everyone, I guess.
Yes, they can often be downloaded but they are not under an open license, so in many countries they can't be contentmined without permission.
Here's what I'm thinking: we have --free, which, in addition to OA, will attempt to get resources from papers marked free. When --all is chosen, rather than just not trying to get PDF/XML as we currently do, we could attempt to download from whatever URLs are available. This way, if a user is at their university or using a VPN, they should be able to get a lot of content.
We could include a warning when --free or --all are used that makes it clear users need to check what they are legally allowed to do. I think we should have a guide to legality on contentmine.
Good idea, we may also be downloading hybrid papers.
Note that "free" and "open" are overloaded ("openwashed") so we should define them.
— Peter Murray-Rust, Reader in Molecular Informatics, Unilever Centre, Dept. of Chemistry, University of Cambridge
@petermr I agree, but I think the best we can do to define them is say that they are classified as such by the source (EPMC/ArXiv/IEEE), and link to their explanation of the terms if they have one.
Very strange. It appears one can't get PNAS full text as either PDF or XML via getpapers! Yet via the EuropePMC website there are clearly a lot of freely available full-text articles, with PDFs (not so sure about availability of full-text XML).
Absolutely zero full-text downloads appear to be possible for PNAS or Science:
Take Busch et al. as the test case: http://europepmc.org/articles/PMC4321246. The full text is clearly available for free to human eyes via EPMC, both as HTML and as a downloadable PDF.
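For anyone reproducing this, the metadata shown earlier comes from the Europe PMC REST search endpoint with resultType=core (which includes fullTextUrlList). A small sketch of building that request URL for a PMCID, assuming the standard EPMC search endpoint:

```javascript
// Build the Europe PMC REST search URL that returns a record's full
// metadata (including fullTextUrlList) as JSON for a given PMCID.
function epmcCoreSearchUrl(pmcid) {
  const base = 'https://www.ebi.ac.uk/europepmc/webservices/rest/search';
  const query = encodeURIComponent('PMCID:' + pmcid);
  return base + '?query=' + query + '&resultType=core&format=json';
}

// URL for the Busch et al. test case:
console.log(epmcCoreSearchUrl('PMC4321246'));
// → https://www.ebi.ac.uk/europepmc/webservices/rest/search?query=PMCID%3APMC4321246&resultType=core&format=json
```

Fetching that URL in a browser shows the same three-entry fullTextUrlList quoted above, with the PDF and HTML marked "Free" and the DOI link marked "Subscription required".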