billgreenwald / Pubmed-Batch-Download

Batch download articles based on PMID (Pubmed ID)
MIT License

Problem when trying to download (PMIDs): 9029852, 19092482,11382209 #2

Closed Zalkya closed 6 years ago

Zalkya commented 7 years ago

This happened with all three articles I've tried to download. What could be happening?

image

Thanks

billgreenwald commented 7 years ago

What journal is it published in, and could you download it normally from the computer you were on?

calin-qw commented 7 years ago

image

I had the same problem, but was able to download all the papers manually. It would be great if this could be looked into.

rajivr79 commented 7 years ago

Same problem here: "** fetching of reprint failed".

billgreenwald commented 7 years ago

There seems to be a problem with Mechanize; it won't search any of the sites. I will work on this when I have time, but it won't be for a little while.

zakimolvi commented 7 years ago

Hi @billgreenwald, thanks for taking the time to make this. I'm having the same problem as everyone else for every string of PMIDs I've tried.

Here's an example from when I ran ruby pubmedid2pdf.rb 28238146,28238140,28237713,28236573,28236485,28235723,28235591,28234628,28234487,28234430,28233996,28233941,28233218,28233053,28233000,28232590,28232293,28232288

image

billgreenwald commented 7 years ago

Hi everyone,

I was able to fix this issue by using rvm to install ruby 2.2.7, then installing rubygems for 2.2.7 and redownloading the gems for the program. At this point in time, the default gem versions worked, but for reference, they are as follows:

socksify-1.7.1
camping-2.1.532
mechanize-2.7.5

If anyone needs help with the above steps, let me know. If someone can try this and see if it works on their system too, that would be great. I have tested a few pmids from above and was able to fetch them properly now.

If users confirm that this works, I will update the install script to use this change and these specific versions going forward.

MBTrade15 commented 7 years ago

This is a great tool! I'm running ruby 2.2.7 and the gems you specified but when I enter ~50 PMIDs it will return only 6 PDFs and the rest fail. Do you know why this is?

Thanks in advance! If I can get the % of PDF hits up this will be amazing!

billgreenwald commented 7 years ago

Could you list some of the PMIDs that did not work?

Thanks!

colbyw5 commented 6 years ago

Hi @billgreenwald,

I'm having the same problem as previous users, and I attempted to follow your instructions for the solution but I am not sure I installed the gems correctly (very new ruby user). Could you help me with the steps you noted when you have a moment? This is a great tool, thanks for creating!

Colby

billgreenwald commented 6 years ago

Happy to try to help.

I just tried reinstalling from scratch with my instructions from 2 posts back, and they worked for me. To make them more explicit (these are for Ubuntu):

First, install RVM (taken from here)

sudo apt-add-repository -y ppa:rael-gc/rvm
sudo apt-get update
sudo apt-get install rvm

Next, install ruby 2.2.7

rvm install 2.2.7

Next, install all 3 gems

gem install socksify
gem install camping
gem install mechanize

Finally, run the script

ruby pubmedid2pdf.rb XXX,YYY,ZZZ...

Can you give me some more information about what you ran for set up, and what particular errors you got?
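Since the script takes its PMIDs as one comma-separated argument, a stray space or trailing comma is easy to introduce when pasting a long list. The following is a minimal Python sketch of normalizing such a list before passing it on. `parse_pmids` is a hypothetical helper, not part of the tool, and it assumes that a PMID is a plain string of digits.

```python
import re


def parse_pmids(arg):
    """Split a comma-separated PMID string into a clean list.

    Assumption: a PMID is a non-empty run of digits. Whitespace around
    entries, empty entries, and duplicates are dropped.
    """
    seen = set()
    pmids = []
    for token in arg.split(","):
        token = token.strip()
        if not token:
            continue  # ignore empty entries such as a trailing comma
        if not re.fullmatch(r"\d+", token):
            raise ValueError(f"not a PMID: {token!r}")
        if token not in seen:
            seen.add(token)
            pmids.append(token)
    return pmids


# The list from the issue title has inconsistent spacing.
cleaned = parse_pmids("9029852, 19092482,11382209")
print(",".join(cleaned))
```

The joined string can then be pasted after `ruby pubmedid2pdf.rb` as a single argument.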

colbyw5 commented 6 years ago

Hey @billgreenwald

Thanks for getting back to me. I ran the exact same setup in my terminal, and I got 204 of the 671 PDFs I was searching for. Many of the ones that failed required a password or subscription, but for some it seemed that a full-text PDF was available: 24265411 is available via BMJ, 25619871 via Wiley, 26970696 via Elsevier, and 25778747 via PubMed Central.

Here is the code I ran, and the error I received (I had already installed ruby 2.2.7 at this point):

rvm use 2.2.7
gem install socksify
gem install camping
gem install mechanize
ruby pubmedid2pdf.rb 25778747

Trying to fetch 25778747
** fetching of reprint 25778747 failed

Any help would be greatly appreciated, thanks again!
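With several hundred PMIDs in one run, listing the ones that failed by hand is tedious. The sketch below pulls them out of saved script output. It assumes the two message formats quoted above ("Trying to fetch ..." and "** fetching of reprint ... failed") and that each message sits on its own line; `summarize` is a hypothetical helper, not part of the tool.

```python
import re

# Message formats as they appear in the output quoted in this thread.
TRYING = re.compile(r"^Trying to fetch (\d+)")
FAILED = re.compile(r"^\*\* fetching of reprint (\d+) failed")


def summarize(log_text):
    """Return (tried, failed) lists of PMIDs found in the script output."""
    tried, failed = [], []
    for line in log_text.splitlines():
        line = line.strip()
        match = TRYING.match(line)
        if match:
            tried.append(match.group(1))
            continue
        match = FAILED.match(line)
        if match:
            failed.append(match.group(1))
    return tried, failed


sample = """Trying to fetch 25778747
** fetching of reprint 25778747 failed
"""
tried, failed = summarize(sample)
print(f"{len(failed)} of {len(tried)} failed: {','.join(failed)}")
```

The resulting comma-separated list of failed PMIDs can be posted directly in a report like this one.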

billgreenwald commented 6 years ago

Sorry for the delayed response! I have been really busy lately, but will look into it as soon as I have time. It seems that some of the crawling methods are not being called properly.

Could you give me a PMID that worked for you as well?

colbyw5 commented 6 years ago

Hey Bill,

No problem at all, I am not in a rush at this point. 25325179 is one PMID that worked for me. Thanks again for doing this.


billgreenwald commented 6 years ago

Your email signature was posted with your last message, including your cell number. Not sure if that was intentional; I just wanted to let you know it's visible in plaintext.

colbyw5 commented 6 years ago

Thanks for the heads up!

billgreenwald commented 6 years ago

It looks like the issue is a known problem with Mechanize. I will keep looking into it and see if I can figure out a way to fix it, or if rolling back to an earlier version of ruby may help. The error is apparently well known:

too many connection resets (due to end of file reached - EOFError) after 0 requests on 26040640

I am closing this and opening a new issue as an enhancement moving forward.
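For transient connection resets in general, a common mitigation is to retry the request with an increasing delay. Nothing in this thread establishes that retrying resolves the Mechanize error above, so the following is only a minimal Python sketch of the pattern. `fetch_with_retry` and the `fetch` callable it wraps are hypothetical names, and the choice of `ConnectionResetError` and `EOFError` as the retryable exceptions is an assumption.

```python
import time


def fetch_with_retry(fetch, pmid, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(pmid), retrying dropped connections with exponential backoff.

    `fetch` is a hypothetical callable supplied by the caller. The last
    failure is re-raised once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return fetch(pmid)
        except (ConnectionResetError, EOFError) as exc:
            if attempt == attempts - 1:
                raise  # out of retries: surface the original error
            delay = base_delay * 2 ** attempt
            print(f"{pmid}: {exc!r}, retrying in {delay:.0f}s")
            sleep(delay)


# Demo with a stand-in fetcher that fails twice, then succeeds.
calls = {"n": 0}


def flaky_fetch(pmid):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionResetError("connection reset by peer")
    return b"%PDF-1.4"


# Pass a no-op sleep so the demo does not actually wait.
result = fetch_with_retry(flaky_fetch, "26040640", sleep=lambda seconds: None)
```

Passing `sleep` as a parameter keeps the backoff testable without real delays.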

billgreenwald commented 6 years ago

Hi everyone,

Sorry to revive an old thread. Mechanize was proving difficult, and many people online have reported similar problems going back 2 years now, so I ported the code to Python and cleaned it up to work with how current websites handle their metadata for scientific articles. I have updated the GitHub repository accordingly. I have tested the PMIDs from this thread and have gotten most of them to work, but I hit some errors that it would be helpful for you to reproduce, assuming you still care about using the program.

@colbyw5 could you test PMID 25778747? I got an ECONNRESET error, and I can't tell whether it is on my end or theirs. The others worked.

@zakimolvi 28233000 can't be obtained due to the SSL on the host site.

If you have any questions, let me know.
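The post above does not say which metadata the Python port reads. One widely used convention is a `citation_pdf_url` meta tag in the article page's HTML head, so the following is a minimal standard-library sketch of reading that tag, under the assumption that the page provides it. `PdfMetaParser` and `find_pdf_urls` are hypothetical names, and the sample URL is a placeholder.

```python
from html.parser import HTMLParser


class PdfMetaParser(HTMLParser):
    """Collect the content of <meta name="citation_pdf_url"> tags."""

    def __init__(self):
        super().__init__()
        self.pdf_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Attribute values can be None when the attribute has no value.
        name = (attr.get("name") or "").lower()
        content = attr.get("content")
        if name == "citation_pdf_url" and content:
            self.pdf_urls.append(content)


def find_pdf_urls(html):
    """Return every citation_pdf_url value found in an HTML string."""
    parser = PdfMetaParser()
    parser.feed(html)
    return parser.pdf_urls


sample_html = (
    "<html><head>"
    '<meta name="citation_pdf_url" content="https://example.org/article.pdf">'
    "</head><body></body></html>"
)
print(find_pdf_urls(sample_html))
```

Pages that omit the tag yield an empty list, so a caller would still need a fallback for those publishers.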