MiniGlome / Archive.org-Downloader

Python3 script to download archive.org books in PDF format
934 stars, 124 forks

Book downloaded but its pages are blank #88

Open nahmanides opened 1 year ago

nahmanides commented 1 year ago

I've successfully downloaded my first .pdf, and the problem is as stated in the title. What am I missing?

nahmanides commented 1 year ago

I guess it's the encryption thing, isn't it?

darnn commented 1 year ago

No, encryption shouldn't have anything to do with it, since you're downloading page images. Try it with these parameters: -r 0 -j. That way it will bypass the PDF step and give you the images of the pages themselves, and you can turn them into a PDF elsewhere.

nahmanides commented 1 year ago

Sorry, I actually looked in the wrong directory -.- My bad. We can delete this thread.

TheRabidWolverine commented 1 year ago

I'm getting the message "no need to borrow book" for all books, and in all of them, barring the first few pages, the pages show up blank with the message "page temporarily unavailable".

MiniGlome commented 1 year ago

What book are you trying to download? What command line do you use?

TheRabidWolverine commented 1 year ago

I'm using just the normal command to download books. It works fine for some time, then all books start throwing the message "No need to borrow this book", and after that every page of every book comes back as the "preview unavailable" image. I think this is IA firewall logic: if the number of downloads from a certain account crosses a threshold, they send default images.
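One way to check for this without opening every file is to look for byte-identical page images, since a default image served in place of real pages would repeat many times. The following is a minimal sketch, assuming the pages were saved as individual files (for example with -j) and that the placeholder is served as the same bytes each time; the function name, directory name, and threshold are hypothetical and not part of the script.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_repeated_pages(image_dir, min_repeats=3):
    """Return groups of file names whose contents are byte-identical.

    Assumption: a placeholder page ("page temporarily unavailable") is
    served as the same bytes every time, so it shows up as a large group.
    Real scanned pages are almost never byte-identical to each other.
    """
    groups = defaultdict(list)
    for path in sorted(Path(image_dir).iterdir()):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path.name)
    # Keep only hashes shared by at least `min_repeats` files.
    return [names for names in groups.values() if len(names) >= min_repeats]


if __name__ == "__main__":
    # "my_book" is a hypothetical output directory of page images.
    target = Path("my_book")
    if target.is_dir():
        for names in find_repeated_pages(target):
            print(f"{len(names)} identical pages, e.g. {names[0]}")
```

If this reports one large group, the download most likely received placeholder images rather than real pages, which matches the behaviour described above.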

nf24eg commented 1 year ago

Same issue here: it downloads a few pages, and all the others are images saying the page is temporarily unavailable.

python archive-org-downloader.py -e myemail@xxx.com -p password -r 0 -u https://archive.org/details/ains23courseguid0000inst

I also tried replacing -u with -j, but that didn't work.

darnn commented 1 year ago

That book downloads fine for me with:

C:\[Python39]\python.exe D:\Aodl\archive-org-downloader.py -e email@email.com -p password -r 0 -j -u https://archive.org/details/ains23courseguid0000inst

nf24eg commented 1 year ago

It downloads for me as well, with and without -j, but with empty pages: only 10 pages of the book come through, and the others show no preview.

nf24eg commented 1 year ago

Aha, after I used another email it worked and downloaded the full book correctly! I had the same issue with most of the books I downloaded, but they all downloaded fine after I switched to another email. Thank you.

nf24eg commented 1 year ago

Some books still download with blank pages: https://archive.org/details/riskmanagementin0000skip https://archive.org/details/futureofinsuranc0000unse_r1o2 and many more.

darnn commented 1 year ago

I mean, it sounds like you need to switch email addresses (and possibly VPNs) if you're downloading large numbers of books one after the other.

nf24eg commented 1 year ago

It works great, but some books are really stubborn and won't download, even as a single download rather than one of many. On another note, is it possible to get only the book names and page counts from the URLs I have, without downloading anything? Let's say I have 2 book URLs, https://archive.org/details/riskmanagementin0000skip and https://archive.org/details/futureofinsuranc0000unse_r1o2. Can I get just the names and page counts of those books? Many thanks for this project.

MiniGlome commented 1 year ago

The borrowing of this book (https://archive.org/details/riskmanagementin0000skip) is "Unavailable" on the website. This is the first time I've seen this happen, so I guess there is no way to download it or even read it on the website. On the other hand, the script is working properly with this book (https://github.com/MiniGlome/Archive.org-Downloader/issues/url) as it is fully downloaded.

To get the book names (formatted) and page counts, you can add this to the script at line 214:

        print(f'title = "{title.replace("_", " ")}" | {len(links)} pages')

The 2 tabs at the beginning of the line are mandatory
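To see what that line prints, here is a standalone sketch of the same formatting. The names title and links come from the line above; the wrapper function and the sample values are hypothetical, and it is only an assumption that links holds one entry per page at that point in the script.

```python
def summarize_book(title, links):
    """Format a book title and page count the same way as the print line above.

    Assumption: `title` uses underscores in place of spaces and `links`
    has one entry per page, as the suggested line implies.
    """
    return f'title = "{title.replace("_", " ")}" | {len(links)} pages'


if __name__ == "__main__":
    # Hypothetical sample values, not taken from a real run of the script.
    sample_links = ["page0", "page1", "page2"]
    print(summarize_book("Some_Book_Title", sample_links))
```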

nf24eg commented 1 year ago

It's a pleasure to have shown you something you'd never seen before :) OK, now I'm trying another book (which is available to borrow), but I'm getting an error:

MiniGlome commented 1 year ago

This has something to do with your local Python installation. You are missing the requests dependency. You can run

pip install requests

or if you have more "ModuleNotFoundError":

pip install -r requirements.txt
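
A quick way to confirm which interpreter is running and whether it can actually see the module is sketched below. This is only a rough check, not part of the script; requests is the only module named in this thread, and the function name is hypothetical.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Shows which Python is running; pip may have installed into another one.
    print("Interpreter:", sys.executable)
    missing = missing_modules(["requests"])
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All listed modules are importable.")
```

If the module is reported missing even though pip says it is installed, pip is probably tied to a different Python installation. Running `python -m pip install requests` with the same `python` used to start the script installs into that interpreter.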

nf24eg commented 1 year ago

I checked everything before I wrote to you: requests and all the requirements were installed and up to date. But after your comment above I uninstalled everything (Python, Git) and reinstalled it fresh, and now it works without error. Hopefully there will be a solution for those books which are not available to borrow :) Many thanks.