evmer / perlego-downloader

Download books from Perlego.com in PDF format
MIT License
118 stars 55 forks

Error : Too many files open #31

Open cz-001 opened 1 year ago

cz-001 commented 1 year ago

First, this is a good piece of code, and a lot of learning for us novices. Second, the code breaks when the number of pages exceeds roughly 1000, with the error "Too many files open". I've reduced the number of threads from 50 to 20:

PUPPETEER_THREADS = 20

I'm still facing the same issue. Thanks.

evmer commented 1 year ago

Hey, try to reduce it to 1

bleedingsaber commented 1 year ago

I reduced it to 1, and about 450 pages still failed with "too many files open". Is my computer too old?

caseypaite commented 1 year ago

Set the limit on the number of open files using ulimit -n 10240. Try a number higher than 1024 if your document contains more pages.
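
For reference, the same per-process limit can also be inspected and raised from inside a Python script with the standard-library resource module (Unix only). This is only a sketch and not part of downloader.py: the helper name raise_open_file_limit is hypothetical, and the default of 10240 simply mirrors the value above. The soft limit can only be raised up to the hard limit, and on some systems (macOS in particular) the call may still be refused, in which case the sketch leaves the limit unchanged.

```python
import resource


def raise_open_file_limit(target=10240):
    """Raise the soft open-file limit of this process toward `target`.

    Hypothetical helper: never lowers the limit and never exceeds the
    hard limit. Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    # An unlimited soft limit needs no change.
    if soft == resource.RLIM_INFINITY:
        return soft

    # Clamp the request to the hard limit unless the hard limit is unlimited.
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)

    if soft < target:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
            soft = target
        except (ValueError, OSError):
            # The OS refused the request; keep the existing limit.
            pass
    return soft


if __name__ == "__main__":
    print("soft open-file limit is now", raise_open_file_limit())
```

Like ulimit -n in a shell, this affects only the current process and its children, not system-wide settings.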

evmer commented 1 year ago

Thank you @caseypaite!

@cz-001 / @bleedingsaber can you please confirm that setting the proper ulimit fixed your issue?

cz-001 commented 1 year ago

Hello, I tried the methods mentioned above, but now I am getting a new error:

" {'event': 'error', 'data': {'message': 'Failed to validate recaptcha token', 'code': 6}} "

It is probably generated by this line:

ws.send(json.dumps({"action":"initialise","data":{"authToken": AUTH_TOKEN, "reCaptchaToken": AUTH_TOKEN, "bookId": str(BOOK_ID)}}))

What could be the reason? Could too many attempts have caused the website to ban my token? I tried logging in and out multiple times but still get the same error. I also waited 10 days and tried again, still with no luck.

Thanks @evmer

cz-001 commented 1 year ago

Hi @evmer, any luck or guidance on the above error, please?

muttmutt commented 1 year ago

@cz-001

What OS are you running? I'm on macOS 13.1, and there's a whole process to update the ulimit and the kernel file limit that involves disabling the OS protection, making the changes, testing, and then re-enabling the OS protection.

caseypaite commented 1 year ago

Executing the command I suggested above sets the limit for the current terminal session only; system settings are not altered. No complex procedure is required at all. @evmer, this issue arises at the final stage, when all the PDF pages are merged into a single file. If you can modify the Python code to first combine the pages in groups of up to 500 and then generate the final document from those groups, this issue could be bypassed without requiring users to change the open-file limit on their system.
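
The grouping idea could be sketched roughly as follows. This is an illustration under assumptions, not the script's actual merge code: merge_in_batches and merge_fn are hypothetical names, and merge_fn stands in for whatever PDF-library call downloader.py uses to merge a list of input files into one output file. The point is only that no single merge call receives more than batch_size inputs.

```python
import os
import tempfile


def merge_in_batches(paths, out_path, merge_fn, batch_size=500):
    """Merge many files into out_path, at most batch_size inputs per call.

    merge_fn(input_paths, output_path) is a caller-supplied (hypothetical)
    function that merges the inputs, in order, into one output file.
    """
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2")

    paths = list(paths)
    with tempfile.TemporaryDirectory() as tmp:
        level = 0
        # Repeatedly merge groups until one call can handle what is left.
        while len(paths) > batch_size:
            merged = []
            for start in range(0, len(paths), batch_size):
                part = os.path.join(
                    tmp, f"level{level}_part{start // batch_size}.pdf"
                )
                merge_fn(paths[start:start + batch_size], part)
                merged.append(part)
            paths = merged
            level += 1
        # Final merge happens while the intermediate files still exist.
        merge_fn(paths, out_path)
```

Page order is preserved because each group is merged in order and the groups are then merged in order. The intermediate files live in a temporary directory that is removed once the final document has been written.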

muttmutt commented 1 year ago

That's a good suggestion. But if you're on macOS, you might want to look at this blog post:

https://www.macobserver.com/tips/deep-dive/evade-macos-many-open-files-error-pushing-limits/

cz-001 commented 1 year ago

Hi, I am working on Linux. I have updated the ulimit, but because of the new error mentioned above I cannot verify whether it fixed the issue: "{'event': 'error', 'data': {'message': 'Failed to validate recaptcha token', 'code': 6}}"

muttmutt commented 1 year ago

So did you watch the video on how to gather the necessary data from the session? And then you poked that data into downloader.py before trying to download the book?