MiniGlome / Archive.org-Downloader

Python3 script to download archive.org books in PDF format

Downloads book but all pages are blank saying preview not available #89

Closed by TheRabidWolverine 1 year ago

TheRabidWolverine commented 1 year ago

The script downloads the book fine, but for every book it prints "No need to borrow", the same message it gives for a book that is free to read, even though many of these books are not free on IA and need an account and a loan to read. The first 3-4 pages of the PDF are normal; all subsequent pages are blank, like the ones the website shows when you seek to a page in a not-free book without borrowing it. The website has a similar issue: it doesn't allow read access to any book, and every attempt to borrow gives the message "Borrowing not available, please try again later". Could the account be blocked or shadowbanned because it is accessing many books serially?

darnn commented 1 year ago

I was able to borrow the book in your screenshot, so... maybe?

TheRabidWolverine commented 1 year ago

Kind of defeats the purpose if books cannot be downloaded serially. :( Any workaround you can think of?

darnn commented 1 year ago

Register a different username and use a VPN, and don't download more than, I don't know, five an hour?
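
To make the "no more than N an hour" idea concrete, here is a minimal sketch of a throttle that spaces downloads evenly. The name `throttled` and the default of five per hour are illustrative assumptions taken from the guess above, not part of the script and not a documented archive.org limit.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def throttled(
    items: Iterable[T],
    max_per_hour: int = 5,  # assumption: the "five an hour" guess from this thread
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[T]:
    """Yield items one at a time, at most max_per_hour per hour, evenly spaced."""
    if max_per_hour <= 0:
        raise ValueError("max_per_hour must be positive")
    min_gap = 3600.0 / max_per_hour  # seconds between consecutive items
    last = None
    for item in items:
        if last is not None:
            # Sleep only for whatever part of the gap has not already elapsed.
            wait = min_gap - (clock() - last)
            if wait > 0:
                sleep(wait)
        last = clock()
        yield item
```

A loop such as `for url in throttled(urls): ...` would then start one download every 12 minutes at the default setting; `sleep` and `clock` are injectable only so the spacing can be checked without actually waiting.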

TheRabidWolverine commented 1 year ago

Where would the VPN go in the code? I can see a one-time login at the beginning of the script, which is then reused for every subsequent request for the files listed in the file argument. Won't it store cookies or something?
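
On the cookie question: an HTTP session object generally keeps whatever cookies the server sets at login and sends them back on later requests until the jar is cleared. The sketch below shows that behaviour using only the standard library and a throwaway local server; `DemoHandler`, the `/login` path and the `session=abc123` cookie are made up for the demonstration and say nothing about how archive.org or the script actually handle logins.

```python
import http.cookiejar
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Stand-in server: /login sets a cookie; every path echoes the Cookie header."""

    def do_GET(self):
        body = (self.headers.get("Cookie") or "no cookie").encode()
        self.send_response(200)
        if self.path == "/login":
            self.send_header("Set-Cookie", "session=abc123; Path=/")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def demo():
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_port}"

    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),  # ignore any system proxy for this local demo
        urllib.request.HTTPCookieProcessor(jar),
    )
    try:
        opener.open(base + "/login").read()  # server sets the cookie here
        with_cookie = opener.open(base + "/page").read().decode()
        jar.clear()  # forget the login, as a fresh session would
        after_clear = opener.open(base + "/page").read().decode()
    finally:
        server.shutdown()
        server.server_close()
    return with_cookie, after_clear
```

Calling `demo()` returns what the server saw on the second and third requests: the stored cookie first, then nothing once the jar has been cleared. A fresh login per book would therefore mean a fresh jar (or session object) per book.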

Secondly, not all books have the "loan" attributes in their metadata. How do I detect that a book is reporting that it doesn't need borrowing when it actually does, so that I can catch that as a flag and maybe stop using that credential for now?
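
One metadata-independent way to catch this is to inspect the downloaded pages themselves. The sketch below assumes, without having verified it against archive.org, that the "preview not available" pages are byte-identical placeholder images, so a download in which most pages share one hash is suspicious. The function name and the 0.5 threshold are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Sequence


def looks_like_placeholder_download(
    pages: Sequence[bytes],
    max_duplicate_fraction: float = 0.5,  # hypothetical threshold, tune as needed
) -> bool:
    """Return True if one identical page image makes up most of the download."""
    if not pages:
        return False
    digests = [hashlib.sha256(page).hexdigest() for page in pages]
    _, count = Counter(digests).most_common(1)[0]
    # Require an actual repeat so a one-page book is not flagged.
    return count > 1 and count / len(pages) > max_duplicate_fraction
```

If the check fires, the caller could stop before building the PDF and set that credential aside for a while. Books with many genuinely blank pages would trigger false positives, so this is a heuristic, not a definitive test.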

darnn commented 1 year ago

Oh, I didn't mean in the code, I meant just have a VPN active, so that your connection would be going through it when you're using the script. Also, if I recall correctly, you should currently be able to use the script with books that don't need to be borrowed as well, and I find it simpler to just grab a bunch of URLs without having to check each one to determine if it needs to be borrowed or not.

darnn commented 1 year ago

Also, bear in mind that I'm just some guy, and you might get more authoritative answers from the person who actually made this.

TheRabidWolverine commented 1 year ago

Hmm, I don't think just using a VPN will help. It might notice the same account being used to download multiple books; rerouting through different VPNs could only have helped if no account information was needed. The account info tethers the requests to one account, which gets blacklisted irrespective of which IP it used to download many books. The only option seems to be using a series of different credentials with VPN proxy routing (so that each call routes through a different IP address, although I doubt it will help much) and clearing the cache so that every book requires a new login session. Seems very clumsy, but I cannot think of anything else. :(

darnn commented 1 year ago

Oh yeah, I don't mean that a VPN will conceal the fact that it's the same account, just that if it's restricting your IP, not only your current account, the VPN would be necessary to begin with. As for downloading multiple books, like I said, I think if you don't exceed a reasonable amount an hour, you should be good. I've had days when I downloaded, I don't know, 20 books and was fine.

TheRabidWolverine commented 1 year ago

No, I don't think the IP has anything to do with it. The machine where the script is running and throwing the error and the machine where I am trying to borrow the book in the IA web portal are different, with different IPs, yet both fail with the same lending issue (returning the standard "page not available" error or refusing to allow borrowing at all).

MiniGlome commented 1 year ago

The book https://archive.org/details/0815inderkaserne0000kirs downloads correctly on my computer. I don't think there is a rate limit on this website, so the problem probably comes from your account. You should try creating another account. That can be done with a temporary email provider such as https://temp-mail.org