ContentMine / getpapers

Get metadata, fulltexts or fulltext URLs of papers matching a search query
MIT License

Wrong paper is downloaded #193

Closed raffaem closed 2 years ago

raffaem commented 2 years ago

I'm trying to download this paper: https://chemistry-europe.onlinelibrary.wiley.com/doi/full/10.1002/chem.201902320

To do so I'm using:

getpapers --query "TITLE:'The Impact of pH on Catalytically Critical Protein Conformational Changes: The Case of the Urease, a Nickel Enzyme'" --pdf --limit 1 --outdir . --loglevel verbose

But another paper gets downloaded instead: https://pubs.rsc.org/en/content/articlehtml/2017/sc/c8np00074c

petermr commented 2 years ago

If I put your query into the EPMC website search it gives me 22 papers, of which one has the title you want. (I used https://europepmc.org/search?query=TITLE%3A%27The%20Impact%20of%20pH%20on%20Catalytically%20Critical%20Protein%20Conformational%20Changes%3A%20The%20Case%20of%20the%20Urease,%20a%20Nickel%20Enzyme%27 ) Either the syntax is wrong or the TITLE: keyword doesn't filter this exactly. If you run it without -k (--limit) I would expect you to get 22 hits. (The TITLE keyword doesn't work for me either.)


-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK

petermr commented 2 years ago

Ahh, I reran it using the advanced search and the syntax should be

(TITLE:"The Impact of pH on Catalytically Critical Protein Conformational Changes: The Case of the Urease, a Nickel Enzyme")


(base) pm286macbook:pyamiimage pm286$ pygetpapers -q '(TITLE:"The Impact of pH on Catalytically Critical Protein Conformational Changes: The Case of the Urease, a Nickel Enzyme")'
INFO: Total Hits are 1
WARNING: Could not find more papers
1it [00:00, 1987.82it/s]
0it [00:00, ?it/s]
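
The same check can be made without the command-line tool. The following is a minimal sketch, assuming the Europe PMC REST search endpoint and its JSON `hitCount` field; neither is named in this thread, and the function names are made up for illustration. It wraps a title in the `(TITLE:"...")` form that gave one hit above and builds the percent-encoded search URL.

```python
import json
import urllib.parse
import urllib.request

# Europe PMC REST search endpoint (an assumption; not named in this thread).
EPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def title_query(title: str) -> str:
    """Wrap a title in the (TITLE:"...") form, using double quotes."""
    return f'(TITLE:"{title}")'


def search_url(query: str, page_size: int = 25) -> str:
    """Percent-encode the query into a search URL that asks for JSON."""
    params = {"query": query, "format": "json", "pageSize": page_size}
    return EPMC_SEARCH + "?" + urllib.parse.urlencode(params)


def hit_count(query: str) -> int:
    """Run the search (needs network access) and return the reported hit count."""
    with urllib.request.urlopen(search_url(query), timeout=30) as resp:
        return int(json.load(resp)["hitCount"])
```

Calling `hit_count(title_query("The Impact of pH on Catalytically Critical Protein Conformational Changes: The Case of the Urease, a Nickel Enzyme"))` should then report how many records match the double-quoted title, which makes it easy to compare quoting variants before downloading anything.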

Ayush, is this something we can control with managing the quotation punctuation?


raffaem commented 2 years ago

@petermr Thanks! How can I download the PDF?

petermr commented 2 years ago

pygetpapers -q '(TITLE:"The Impact of pH on Catalytically Critical Protein Conformational Changes: The Case of the Urease, a Nickel Enzyme")' --pdf


raffaem commented 2 years ago

@petermr appending --pdf does not download the PDF here, only the JSON

petermr commented 2 years ago

Welcome to the wonderful world of subscription publishing

PDF download and online access: $59.00


raffaem commented 2 years ago

I'm subscribed to the journal.

I can access it either by "institutional login" or by proxy.

Does getpapers support connecting via a proxy?

Also my proxy requires DigestAuth

petermr commented 2 years ago

Does getpapers support connecting via a proxy?

No. If we built this it would be a scraper and we would probably get push-back from many journals. The --pdf is for PDFs which have been ingested into the NIH/PMC system.
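
As a rough way to tell in advance whether a record falls into that category, one can look at the availability flags in a Europe PMC search result. This is a sketch under assumptions: it presumes each entry of `resultList.result` carries `inEPMC` and `hasPDF` flags set to "Y" or "N", the helper name is hypothetical, and it is a heuristic rather than the check pygetpapers itself performs.

```python
def pdf_likely_available(record: dict) -> bool:
    """Guess whether a full-text PDF is held in Europe PMC for this record.

    `record` is assumed to be one entry of resultList.result from the
    Europe PMC search API, with "Y"/"N" flags such as inEPMC and hasPDF.
    Missing flags are treated as "not available".
    """
    return record.get("inEPMC") == "Y" and record.get("hasPDF") == "Y"


# Made-up records for illustration only; they are not real search results.
open_record = {"title": "An open access paper", "inEPMC": "Y", "hasPDF": "Y"}
paywalled_record = {"title": "A subscription paper", "inEPMC": "N", "hasPDF": "N"}

print(pdf_likely_available(open_record))       # True
print(pdf_likely_available(paywalled_record))  # False
```

A record that fails this check would be expected to yield metadata only, with the full text available solely from the publisher.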

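Note that the quoting must be exact: double quotes around the title, and single quotes around the whole query so the shell passes it through unchanged. The reverse,

pygetpapers -q "(TITLE:'The Impact of pH on Catalytically Critical Protein Conformational Changes: The Case of the Urease, a Nickel Enzyme')"

does not work.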
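
The difference comes down to which quotes survive the shell. A POSIX shell strips only the outer pair, so the two commands hand different query strings to the program. The sketch below uses Python's `shlex.split` to mimic that word splitting; `query_argument` is a hypothetical helper and the title is shortened for readability.

```python
import shlex


def query_argument(command_line: str) -> str:
    """Return the value a POSIX shell would pass after the -q flag."""
    argv = shlex.split(command_line)
    return argv[argv.index("-q") + 1]


# Outer single quotes, inner double quotes: the form that returned one hit.
working = """pygetpapers -q '(TITLE:"Some Title")'"""
# Outer double quotes, inner single quotes: the form reported not to work.
failing = '''pygetpapers -q "(TITLE:'Some Title')"'''

print(query_argument(working))  # (TITLE:"Some Title")
print(query_argument(failing))  # (TITLE:'Some Title')
```

Only the first form delivers a double-quoted title to the search service, which is consistent with the one-hit result reported earlier in the thread for that form.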


raffaem commented 2 years ago

Err after this command:

$ pygetpapers -q '(TITLE:"The Impact of pH on Catalytically Critical Protein Conformational Changes: The Case of the Urease, a Nickel Enzyme")' --pdf
INFO: Total Hits are 1
WARNING: Could not find more papers
1it [00:00, 27594.11it/s]
0it [00:00, ?it/s]

The resulting JSON file is empty.

The --pdf is for PDFs which have been ingested into the NIH/PMC system.

Does this mean only open access papers?

Is there a way to retrieve the direct link of the PDF for my paper, so I can download it by myself?