alexgand / springer_free_books

Python script to download all Springer books released for free during the 2020 COVID-19 quarantine
GNU General Public License v3.0
1.65k stars, 367 forks

HTTP Error 404: Not Found #85

Closed (NathanFarmer closed this issue 4 years ago)

NathanFarmer commented 4 years ago

The link to the Excel file seems to be broken. When you visit the link in main.py you get:

{"projectVersion":"2.245.0-54832206c2d4e90345d71a4427e7542e623e43bf-2020-04-23_08:48:48.0010-local-1","requestUri":"https://resource-cms.springernature.com/springer-cms/rest/v1/content/17858272/data/v4","message":"com.springer.cms.service.ContentNotFoundException: No Content found for Version: 4 with Content-Id: coremedia:///cap/content/17858272","responseCode":404}

goggle commented 4 years ago

I've just installed this script in a virtual environment:

python3 -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
python3 main.py

This also gives me an HTTP Error 404:

Traceback (most recent call last):
  File "main.py", line 37, in <module>
    books = pd.read_excel(table_url)
  File "/mnt/harddrive/springer/springer_free_books/venv/lib/python3.8/site-packages/pandas/io/excel/_base.py", line 304, in read_excel
    io = ExcelFile(io, engine=engine)
  File "/mnt/harddrive/springer/springer_free_books/venv/lib/python3.8/site-packages/pandas/io/excel/_base.py", line 824, in __init__
    self._reader = self._engines[engine](self._io)
  File "/mnt/harddrive/springer/springer_free_books/venv/lib/python3.8/site-packages/pandas/io/excel/_xlrd.py", line 21, in __init__
    super().__init__(filepath_or_buffer)
  File "/mnt/harddrive/springer/springer_free_books/venv/lib/python3.8/site-packages/pandas/io/excel/_base.py", line 342, in __init__
    filepath_or_buffer = BytesIO(urlopen(filepath_or_buffer).read())
  File "/mnt/harddrive/springer/springer_free_books/venv/lib/python3.8/site-packages/pandas/io/common.py", line 141, in urlopen
    return urllib.request.urlopen(*args, **kwargs)
  File "/usr/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/usr/lib/python3.8/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

pihalf commented 4 years ago

It's because the API URL changed. I don't know if I'll have time to open a PR, but simply changing

Line 33 in main.py

table_url = 'https://resource-cms.springernature.com/springer-cms/rest/v1/content/17858272/data/v4'

to

table_url = 'https://resource-cms.springernature.com/springer-cms/rest/v1/content/17858272/data/v5'

should do the trick.

NathanFarmer commented 4 years ago

I actually just removed the v4 suffix altogether, and it seems to be running fine now.

Line 14 in "/app/main.py"
    books = pd.read_excel('https://resource-cms.springernature.com/springer-cms/rest/v1/content/17858272/data')

alexgand commented 4 years ago

Thanks, I updated the code to remove the v4 at the end of the URL.

Now it's working again, though with fewer books than before.
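
For anyone who hits this again after another version bump, here is a minimal sketch of the workaround discussed above: try the unversioned URL first, then fall back to explicit version suffixes. This is not what main.py does. The helper names `candidate_urls` and `first_reachable`, the `opener` parameter, and the list of suffixes to try are all assumptions made for illustration; only the base URL comes from this thread.

```python
import urllib.error
import urllib.request

# Base URL taken from the comments above, without any version suffix.
BASE_URL = (
    "https://resource-cms.springernature.com"
    "/springer-cms/rest/v1/content/17858272/data"
)


def candidate_urls(base_url, versions=("v5", "v4")):
    """Hypothetical helper: the unversioned URL first, then versioned ones."""
    return [base_url] + [f"{base_url}/{version}" for version in versions]


def first_reachable(urls, opener=urllib.request.urlopen):
    """Hypothetical helper: return (url, raw_bytes) for the first URL
    that does not raise an HTTPError (such as the 404 reported here)."""
    last_error = None
    for url in urls:
        try:
            with opener(url) as response:
                return url, response.read()
        except urllib.error.HTTPError as err:
            # Remember the failure and move on to the next candidate.
            last_error = err
    raise RuntimeError(f"none of the candidate URLs responded: {urls}") from last_error
```

The returned bytes could then be wrapped in `io.BytesIO` and passed to `pd.read_excel`, which accepts file-like objects. The `opener` parameter exists only so the fallback logic can be exercised without network access.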