alexgand / springer_free_books

Python script to download all Springer books released for free during the 2020 COVID-19 quarantine
GNU General Public License v3.0
1.64k stars 366 forks

KeyError: 'content-length' #98

Closed proverbs53 closed 4 years ago

proverbs53 commented 4 years ago

On some occasions (e.g. in my case indices 295 and 331), the script aborts with "KeyError: 'content-length'". I printed request.headers and that key was indeed not present.

Full stack trace:

Traceback (most recent call last):
  File "C:\Users...\OneDrive\Documenten\7. Source\Repos\springer_free_books\main.py", line 88, in <module>
    download_books(books, folder, patches)
  File "C:\Users...\OneDrive\Documenten\7. Source\Repos\springer_free_books\helper.py", line 137, in download_books
    download_book(request, output_file, patch)
  File "C:\Users...\OneDrive\Documenten\7. Source\Repos\springer_free_books\helper.py", line 91, in download_book
    file_size = int(req.headers['Content-Length'])
  File "C:\Users...\AppData\Roaming\Python\Python37\site-packages\requests\structures.py", line 52, in __getitem__
    return self._store[key.lower()][1]
KeyError: 'content-length'
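The last frame of the trace explains why the error names the lower-cased key 'content-length' although the code asks for 'Content-Length': the header mapping lower-cases the key before looking it up. A minimal sketch of that behaviour follows. The class `CaseInsensitiveHeaders` is a simplified, hypothetical stand-in written for illustration, not the actual code in requests/structures.py.

```python
class CaseInsensitiveHeaders:
    """Simplified, hypothetical stand-in for a case-insensitive header mapping."""

    def __init__(self, data=None):
        # Store entries under the lower-cased key, keeping the original spelling.
        self._store = {}
        for key, value in (data or {}).items():
            self._store[key.lower()] = (key, value)

    def __getitem__(self, key):
        # The lookup uses the lower-cased key, so a miss reports that form.
        return self._store[key.lower()][1]

    def __contains__(self, key):
        return key.lower() in self._store


# Headers as described later in the thread: an HTML page, no Content-Length.
headers = CaseInsensitiveHeaders({"Content-Type": "text/html;charset=utf-8"})

try:
    headers["Content-Length"]
except KeyError as err:
    print(f"KeyError: {err}")  # prints: KeyError: 'content-length'
```

So the lower-case key in the message is only a side effect of the lookup; the header is simply absent from the response.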

proverbs53 commented 4 years ago

I replaced

    chunk_size = 1024
    file_size = int(req.headers['Content-Length'])
    num_bars = file_size // chunk_size

with

    chunk_size = 1024
    if 'Content-Length' in req.headers:
        file_size = int(req.headers['Content-Length'])
        num_bars = file_size // chunk_size
    else:
        print("Warning: missing key 'Content-Length' in request headers; taking default length of 100 for progress bar.")
        num_bars = 100

but I got security errors when trying to push my local branch (it would be my first time contributing).

pokui commented 4 years ago

Just ran into that error too; not sure which book it was trying to download at the time.

Traceback (most recent call last):
  File "main.py", line 88, in <module>
    download_books(books, folder, patches)
  File "/usr/home/pokui/code/springer_free_books/helper.py", line 133, in download_books
libunwind: EHHeaderParser::decodeTableEntry: bad fde: CIE ID is not zero
    download_book(request, output_file, patch)
  File "/usr/home/pokui/code/springer_free_books/helper.py", line 87, in download_book
    file_size = int(req.headers['Content-Length'])
  File "/home/pokui/.local/lib/python3.7/site-packages/requests/structures.py", line 54, in __getitem__
    return self._store[key.lower()][1]
KeyError: 'content-length'

proverbs53 commented 4 years ago

I investigated a little further, and the hack will not work. In these cases the book has been split across multiple PDFs (or actually, still seems to be behind the paywall), so the download link doesn't work. The Content-Type in that case is 'text/html;charset=utf-8' instead of 'application/pdf'.

Files impacted (using the new indexing method):

295 "A Beginner's Guide to Scala, Object Orientation and Functional Programming"
331 "Introduction to Programming with Fortran"
388 "Advanced Guide to Python 3 Programming"

pokui commented 4 years ago

OK, then it means the file is in the process of being removed from the list. The answer would then be to emit an error saying that the file is no longer available for download.
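A rough sketch of what such an error could look like, based on the Content-Type observation above. The names `BookUnavailableError` and `ensure_pdf` are hypothetical and not part of helper.py, and the headers are passed as a plain dict here (plain dict lookups are case-sensitive, unlike the headers object from requests).

```python
from typing import Mapping


class BookUnavailableError(Exception):
    """Hypothetical error for a download link that no longer serves a PDF."""


def ensure_pdf(headers: Mapping[str, str], title: str) -> None:
    """Raise BookUnavailableError unless the response claims to be a PDF."""
    # Drop parameters such as ";charset=utf-8" and normalise the case.
    media_type = headers.get("Content-Type", "").split(";", 1)[0].strip().lower()
    if media_type != "application/pdf":
        raise BookUnavailableError(
            f"'{title}' is no longer available for download "
            f"(server returned {media_type or 'no Content-Type'})"
        )


try:
    ensure_pdf(
        {"Content-Type": "text/html;charset=utf-8"},
        "Introduction to Programming with Fortran",
    )
except BookUnavailableError as err:
    print(f"Error: {err}")
```

Checking the media type before reading Content-Length would report the real cause (an HTML page instead of a PDF) rather than the missing header that happens to come with it.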

proverbs53 commented 4 years ago

I wrapped the whole body of download_book in an if-statement (if req.headers['Content-Type'] == 'application/pdf':) and got the following warnings:

So those are indeed the only three books that have been revoked.

idlogin commented 4 years ago

Also getting this error at the following index. Initially I thought it was due to a dropped internet/VPN connection, but restarting always gives the same result.

This is a straight dump without any filters.

:~/springer_free_books$ python3 main.py
389 titles ready to be downloaded...
Overall Progress: 75%|██████████████████████████████████████████████▉ | 293/389 [02:22<00:46, 2.05it/s]
Traceback (most recent call last):
  File "main.py", line 88, in <module>
    download_books(books, folder, patches)
  File "/home/uduo/springer_free_books/helper.py", line 133, in download_books
    download_book(request, output_file, patch)
  File "/home/uduo/springer_free_books/helper.py", line 87, in download_book
    file_size = int(req.headers['Content-Length'])
  File "/home/uduo/springer_free_books/.venv/lib/python3.6/site-packages/requests/structures.py", line 54, in __getitem__
    return self._store[key.lower()][1]
KeyError: 'content-length'

astrodextro commented 4 years ago

I solved this by adding KeyError to the exceptions caught in the download_books(books, folder, patches) function, at line 134 of helper.py.

From:

    except (OSError, IOError) as e:

to:

    except (OSError, IOError, KeyError) as e:

This way, when the KeyError is encountered, it is caught and I get

    Overall Progress: 85%|█████████████████████████████████████████████▋ | 329/389 [1:02:36<1:29:43, 89.72s/it]'content-length'

and the download continues with the next book.
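For anyone who wants to see the effect in isolation, here is a small sketch of the same idea. The two functions are simplified, hypothetical stand-ins and not the real helper.py code; they take plain dicts of headers instead of making HTTP requests.

```python
def download_book(headers):
    # Stand-in for the real download: only the header lookup that fails.
    return int(headers["Content-Length"])


def download_books(responses):
    """Try every book; print the error and move on when one fails."""
    sizes, skipped = {}, []
    for title, headers in responses.items():
        try:
            sizes[title] = download_book(headers)
        except (OSError, IOError, KeyError) as e:
            print(e)  # prints the missing key, like the log line above
            skipped.append(title)
    return sizes, skipped


# Made-up example responses: one PDF, one HTML page without Content-Length.
responses = {
    "available book": {"Content-Type": "application/pdf", "Content-Length": "2048"},
    "revoked book": {"Content-Type": "text/html;charset=utf-8"},
}
sizes, skipped = download_books(responses)
print(sizes, skipped)
```

The trade-off is that the missing key itself is all that gets printed, so the output does not say which book was skipped unless the title is printed too.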

proverbs53 commented 4 years ago

I retrieved the latest version of the code, and it contains this line, which defeats the exception catch because the KeyError is never raised: file_size = int(req.headers['Content-Length']) if req.headers.get('Content-Length') else 30000.

After removing that, it goes on as planned. Many kudos both for this solution of just catching the exception, and for adding the retry when any exception occurs.
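To illustrate why the two approaches do not combine, here is a small comparison. The function names are hypothetical and the headers are a plain dict standing in for the response headers; only the two expressions are taken from the thread.

```python
def size_strict(headers):
    # Raises KeyError when the header is missing, so an except clause can react.
    return int(headers["Content-Length"])


def size_with_fallback(headers):
    # Never raises for a missing header: it silently returns 30000 instead.
    return int(headers["Content-Length"]) if headers.get("Content-Length") else 30000


html_headers = {"Content-Type": "text/html;charset=utf-8"}

try:
    size_strict(html_headers)
    strict_raised = False
except KeyError:
    strict_raised = True

print(strict_raised)                     # True
print(size_with_fallback(html_headers))  # 30000
```

With the fallback in place the code carries on as if the response were a 30000-byte file, so an except clause listing KeyError is never reached for these books.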