billgreenwald / Pubmed-Batch-Download

Batch download articles based on PMID (Pubmed ID)
MIT License

Add interface for Zotero translators #8

Open dmcalli2 opened 5 years ago

dmcalli2 commented 5 years ago

Hi Bill, I get two other types of error message for papers that I can access if I click through from PubMed. I suspect that the "BadStatusLine" error may relate to the fact that I am running the queries from within WSL.

Some example papers are:

  25176136 - an open NEJM paper
  26030325 - a PubMedCentral paper
  17074775 - a European heart journal paper

I have given an example of each type of error message

Messages follow

Trying to fetch pmid 25176136
Trying genericCitationLabelled
Trying pubmed_central
Trying acsPublications
Trying uchicagoPress
Trying science_direct
fetching of reprint 25176136 failed from error Invalid URL '': No schema supplied. Perhaps you meant http://?
Trying to fetch pmid 26030325
fetching of reprint 26030325 failed from error ('Connection aborted.', BadStatusLine("''",))
Trying to fetch pmid 17074775
** fetching of reprint 17074775 failed from error ('Connection aborted.', BadStatusLine("''",))

billgreenwald commented 5 years ago

NEJM needed its own fetcher. I wrote one and tested it with the paper you listed, but let me know if it doesn't work for other NEJM articles you have.
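
The `Invalid URL ''` message in the log above suggests that an empty string reached the HTTP request after none of the listed finders produced a link. A minimal sketch of a guard for that case follows. The names `is_fetchable_url` and `first_usable_url`, and the `(name, finder)` pair layout, are hypothetical and only for illustration; they are not the script's actual API.

```python
from urllib.parse import urlparse


def is_fetchable_url(url):
    """Return True only for absolute http(s) URLs with a host part."""
    parsed = urlparse(url or "")
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def first_usable_url(pmid, finders):
    """Try each (name, finder) pair in order and return the first usable link.

    `finders` is a hypothetical list of (name, callable) pairs; each callable
    takes a PMID and returns a URL string, or an empty value if it found nothing.
    """
    for name, finder in finders:
        url = finder(pmid)
        if is_fetchable_url(url):
            return name, url
    # Fail with a clear message instead of passing '' to the HTTP client.
    raise LookupError(f"no finder produced a usable URL for pmid {pmid}")
```

With a check like this, a paper that no finder recognises would be reported as "no usable URL" rather than as a schema error from the HTTP library.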

The other two look to be Oxford Academic journals. I meant to document that I haven't yet found a way around either ECONNRESET errors or BadStatusLine errors from Oxford Academic, but it's a known problem. From cursory research online, BadStatusLine errors look to be a problem on the journal's end, potentially from blocking robots on purpose.

If you have any thoughts, let me know; otherwise, you will need to grab those yourself. I have pushed a new version.
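
For reference, one common workaround when a server drops connections from scripted clients is to send browser-like headers and retry a few times. The sketch below illustrates that idea with the standard library only; it is an assumption about what might help, not a description of what the script does or of any fix applied to it. The header values, `default_open`, and `fetch_with_retries` are all hypothetical.

```python
import http.client
import time
import urllib.request

# Assumption: browser-like headers make the request less likely to be dropped.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "Accept": "text/html,application/xhtml+xml,application/pdf;q=0.9,*/*;q=0.8",
}

# Errors treated as transient: a malformed or empty status line, or a reset.
RETRYABLE = (http.client.BadStatusLine, ConnectionResetError)


def default_open(url, headers, timeout=30):
    """Fetch `url` with the given headers and return the response body as bytes."""
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def fetch_with_retries(url, open_url=default_open, retries=3, backoff=1.0, sleep=time.sleep):
    """Call `open_url` up to `retries` times, backing off exponentially between tries."""
    last_error = None
    for attempt in range(retries):
        try:
            return open_url(url, BROWSER_HEADERS)
        except RETRYABLE as error:
            last_error = error
            if attempt < retries - 1:
                sleep(backoff * (2 ** attempt))
    raise last_error
```

The `open_url` and `sleep` parameters exist so the retry logic can be exercised without network access; whether a given publisher accepts requests sent this way is not something the sketch can guarantee.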

billgreenwald commented 5 years ago

Some more Google searching yielded a fix, which is great! I am updating the code now.

billgreenwald commented 5 years ago

Updated to version 2.3.0; this should now be fixed.

dmcalli2 commented 5 years ago

Hi Bill, just to say thanks for doing this, and sorry for the delay. I am away from the office and need direct access to my work network (i.e. not just VPN) to run the code. I'll let you know how I get on. The only other thought I have had is (more generally) whether it would be possible to make use of the Zotero translators (https://github.com/zotero/translators). It would require calling the JavaScript code from Python once your script has found the journal's page. This is not something I know anything about (as I mostly work in R); it is just an idea. Apologies if you have already considered it.

billgreenwald commented 5 years ago

I haven't used Zotero before, but I can read up on it. Can you clarify what you have in mind and how you would like it to be used?

dmcalli2 commented 5 years ago

I probably don't need this now, as I have downloaded the rest of the papers I need manually, but what I was thinking of was this:-

  1. Pubmed-Batch-Download finds website where paper is held
  2. Zotero translator used to download it

Zotero is an open source reference manager tool. It has an SQLite database as a back-end. It has a standalone browser for the database, and an extension inside a web-browser. The extension within the web-browser is used to download metadata, html files and any pdfs associated with an article. Many "translators" have been written for this purpose. I wondered if it would be possible for Pubmed-Batch-Download to run the translator script. Just a thought, as I say, I am probably not going to need this function.

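
One possible way to run translators without calling JavaScript from Python directly is Zotero's separate translation-server project, which runs the translators behind a local HTTP endpoint. The sketch below assumes such a server is already running on its default local port and that its `/web` endpoint accepts a page URL as a plain-text POST body; the function names are hypothetical, and what the response contains for a given journal site is not something this sketch verifies.

```python
import json
import urllib.request

# Assumption: a Zotero translation-server instance is listening at this address.
TRANSLATION_SERVER = "http://127.0.0.1:1969"


def build_web_request(article_url, server=TRANSLATION_SERVER):
    """Build the POST request that asks the server to translate one web page."""
    return urllib.request.Request(
        server.rstrip("/") + "/web",
        data=article_url.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def translate_page(article_url, server=TRANSLATION_SERVER, timeout=60):
    """Send the request and return the decoded JSON response."""
    request = build_web_request(article_url, server)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

In the two-step workflow above, step 1 would supply `article_url` and step 2 would call `translate_page(article_url)`.
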
billgreenwald commented 5 years ago

I'm going to mark this as an enhancement for the future.

dmcalli2 commented 5 years ago

Thanks Bill.