Closed proverbs53 closed 4 years ago
I replaced

    chunk_size = 1024
    file_size = int(req.headers['Content-Length'])
    num_bars = file_size // chunk_size

with

    chunk_size = 1024
    if 'Content-Length' in req.headers:
        file_size = int(req.headers['Content-Length'])
        num_bars = file_size // chunk_size
    else:
        print("Warning: missing key 'Content-Length' in request headers; taking default length of 100 for progress bar.")
        num_bars = 100

but I got security errors when trying to push my local branch (it would be my first time contributing).
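The same fallback can be written more compactly with `.get()` on the headers. The sketch below is only an illustration: the function name `num_progress_bars` and its parameters are hypothetical and not part of helper.py, and a plain dict stands in for `req.headers`.

```python
def num_progress_bars(headers, chunk_size=1024, default_bars=100):
    """Return how many chunks the progress bar should expect.

    `headers` is any mapping of response headers. This is a hypothetical
    helper for illustration, not a function from the project.
    """
    content_length = headers.get('Content-Length')
    if content_length is None:
        # Header absent: fall back to a fixed bar count, as in the snippet above.
        print("Warning: missing 'Content-Length'; using default of "
              f"{default_bars} for progress bar.")
        return default_bars
    return int(content_length) // chunk_size
```

Note that `req.headers` in requests is case-insensitive, while the plain dict used here for illustration is not.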
Just ran into that error too, not sure what book it was trying to download at the time.
Traceback (most recent call last):
  File "main.py", line 88, in <module>
    download_books(books, folder, patches)
  File "/usr/home/pokui/code/springer_free_books/helper.py", line 133, in download_books
libunwind: EHHeaderParser::decodeTableEntry: bad fde: CIE ID is not zero
    download_book(request, output_file, patch)
  File "/usr/home/pokui/code/springer_free_books/helper.py", line 87, in download_book
    file_size = int(req.headers['Content-Length'])
  File "/home/pokui/.local/lib/python3.7/site-packages/requests/structures.py", line 54, in __getitem__
    return self._store[key.lower()][1]
KeyError: 'content-length'
I investigated a little further, and the hack will not work. In these cases the book has been split across multiple PDFs (or actually, still seems to be behind the paywall), so the download link doesn't work. Content-Type in that case is 'text/html;charset=utf-8' instead of 'application/pdf'.
Files impacted (using the new indexing method):
295 "A Beginner's Guide to Scala, Object Orientation and Functional Programming"
331 "Introduction to Programming with Fortran"
388 "Advanced Guide to Python 3 Programming"
Ok, then it means it's a file that is in the process of being removed from the list. The answer would then be to emit an error that the file is no longer available for download.
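A minimal sketch of that idea: compare the media type of the response against 'application/pdf' and raise a descriptive error otherwise. The names `BookUnavailableError` and `ensure_pdf` are hypothetical, and a plain dict stands in for `req.headers`; this is not how helper.py is actually structured.

```python
class BookUnavailableError(Exception):
    """Raised when a download link does not return a PDF (hypothetical)."""


def ensure_pdf(title, headers):
    """Raise BookUnavailableError unless the response looks like a PDF."""
    content_type = headers.get('Content-Type', '')
    # Drop parameters such as ';charset=utf-8' before comparing.
    media_type = content_type.split(';', 1)[0].strip().lower()
    if media_type != 'application/pdf':
        raise BookUnavailableError(
            f"{title!r} is no longer available for download "
            f"(Content-Type: {content_type or 'missing'})"
        )
```

Splitting on ';' before comparing means a header such as 'application/pdf; charset=binary' would still be accepted, whereas a strict `==` comparison on the whole header value would reject it.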
I wrapped the whole body of download_book in an if-statement (if req.headers['Content-Type'] == 'application/pdf':) and got the following warnings:
So those are indeed the only three books that have been revoked.
I'm also getting this error at the following index. Initially I thought it was due to a dropped internet/VPN connection, but restarting always results in the same error.
This is a straight dump without any filters.
:~/springer_free_books$ python3 main.py
389 titles ready to be downloaded...
Overall Progress: 75%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉ | 293/389 [02:22<00:46, 2.05it/s]
Traceback (most recent call last):
  File "main.py", line 88, in <module>
I solved this by adding KeyError to the errors caught in the download_books(books, folder, patches) function, at line 134 of helper.py, changing

    except (OSError, IOError) as e:

to:

    except (OSError, IOError, KeyError) as e:
This way, when the KeyError is encountered, it is caught and I get

    Overall Progress: 85%|█████████████████████████████████████████████▋ | 329/389 [1:02:36<1:29:43, 89.72s/it]'content-length'

and the download continues with the next book.
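The catch-and-continue pattern can be sketched in isolation as follows. The function `download_all` and the `download_one` callable are stand-ins invented for illustration; the real download_books in helper.py has a different signature and does more work.

```python
def download_all(books, download_one):
    """Try every book and return the titles that failed.

    `books` is a list of titles and `download_one` a callable that
    downloads a single title. Both are stand-ins for illustration.
    """
    failed = []
    for title in books:
        try:
            download_one(title)
        except (OSError, KeyError) as e:
            # In Python 3, IOError is an alias of OSError, so OSError covers both.
            print(f"Skipping {title!r}: {e!r}")
            failed.append(title)
    return failed
```

Returning the list of failed titles is a design choice of this sketch; it makes it possible to report at the end which books were skipped instead of relying on the progress bar output.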
I retrieved the latest code version, and it contains this line, which defeats the exception catch:

    file_size = int(req.headers['Content-Length']) if req.headers.get('Content-Length') else 30000
After removing that, it goes on as planned. Many kudos both for this solution of just catching the exception and for adding the retry when any exception occurs.
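The interaction between the fallback and the except clause can be shown with a plain dict standing in for `req.headers`. With the conditional fallback, a missing header never raises KeyError, so an except clause that lists KeyError has nothing to catch for that book. The function names below are hypothetical and exist only for this sketch.

```python
def size_with_fallback(headers):
    # Same expression as the line quoted above: never raises on a missing header.
    return int(headers['Content-Length']) if headers.get('Content-Length') else 30000


def size_strict(headers):
    # Direct lookup: raises KeyError when the header is absent.
    return int(headers['Content-Length'])


# A response like the one described earlier in the thread: HTML, no Content-Length.
headers = {'Content-Type': 'text/html;charset=utf-8'}

print(size_with_fallback(headers))  # falls back silently to the default size

try:
    size_strict(headers)
except KeyError as e:
    print("caught KeyError:", e)
```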
On some occasions (e.g. in my case indices 295 and 331), the script aborts with "KeyError: 'content-length'". I printed request.headers and it appears that this specific key was not present.
Full stack trace:
Traceback (most recent call last):
  File "C:\Users...\OneDrive\Documenten\7. Source\Repos\springer_free_books\main.py", line 88, in <module>
    download_books(books, folder, patches)
  File "C:\Users...\OneDrive\Documenten\7. Source\Repos\springer_free_books\helper.py", line 137, in download_books
    download_book(request, output_file, patch)
  File "C:\Users...\OneDrive\Documenten\7. Source\Repos\springer_free_books\helper.py", line 91, in download_book
    file_size = int(req.headers['Content-Length'])
  File "C:\Users...\AppData\Roaming\Python\Python37\site-packages\requests\structures.py", line 52, in __getitem__
    return self._store[key.lower()][1]
KeyError: 'content-length'