alexgand / springer_free_books

Python script to download all Springer books released for free during the 2020 COVID-19 quarantine
GNU General Public License v3.0
1.65k stars, 367 forks

HTTP Error 404 Not Found but I was able to manually download spreadsheet #89

Open dld2517 opened 4 years ago

dld2517 commented 4 years ago

~/repos/springer_free_books$ python3 main.py
Traceback (most recent call last):
  File "main.py", line 37, in <module>
    books = pd.read_excel(table_url)
  File "/home/ddarden/.local/lib/python3.8/site-packages/pandas/util/_decorators.py", line 188, in wrapper
    return func(*args, **kwargs)
  File "/home/ddarden/.local/lib/python3.8/site-packages/pandas/util/_decorators.py", line 188, in wrapper
    return func(*args, **kwargs)
  File "/home/ddarden/.local/lib/python3.8/site-packages/pandas/io/excel.py", line 350, in read_excel
    io = ExcelFile(io, engine=engine)
  File "/home/ddarden/.local/lib/python3.8/site-packages/pandas/io/excel.py", line 653, in __init__
    self._reader = self._engines[engine](self._io)
  File "/home/ddarden/.local/lib/python3.8/site-packages/pandas/io/excel.py", line 402, in __init__
    filepath_or_buffer = _urlopen(filepath_or_buffer)
  File "/usr/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/usr/lib/python3.8/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

dld2517 commented 4 years ago

Requirement already satisfied: pandas in /home/ddarden/.local/lib/python3.8/site-packages (0.24.2)
Requirement already satisfied: python-dateutil>=2.5.0 in /home/ddarden/.local/lib/python3.8/site-packages (from pandas) (2.8.1)
Requirement already satisfied: numpy>=1.12.0 in /home/ddarden/.local/lib/python3.8/site-packages (from pandas) (1.16.6)
Requirement already satisfied: pytz>=2011k in /home/ddarden/.local/lib/python3.8/site-packages (from pandas) (2019.3)
Requirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil>=2.5.0->pandas) (1.14.0)

chaosAD commented 4 years ago

Springer updated the Excel file and gave it a different name, but the Python script still tried to download it from the old link, hence the error you encountered. Alex has fixed the link issue (see #85). Try downloading/cloning the repo again.
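
For illustration only: a stale hard-coded link like this can be reported more clearly than with a raw traceback. The sketch below is not the repository's code; `open_table` and its `opener` parameter are hypothetical names, and it relies only on the standard library's `urllib`.

```python
import urllib.error
import urllib.request


def open_table(table_url, opener=urllib.request.urlopen):
    """Open table_url, turning a 404 into a hint that the link may be stale.

    `open_table` and `opener` are hypothetical names used for illustration;
    they are not part of main.py.
    """
    try:
        return opener(table_url)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            # A 404 on a hard-coded link usually means the file moved upstream.
            raise RuntimeError(
                f"{table_url} returned 404: the spreadsheet link may have "
                "changed; try updating to the latest version of the script."
            ) from err
        # Any other HTTP error is passed through unchanged.
        raise
```

The `opener` argument exists only so the function can be exercised without network access; by default it calls `urllib.request.urlopen`.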

dld2517 commented 4 years ago

Still didn't work. I used git fetch to redownload it and got the same issue. I think I'm done with the Python mess. I just used the spreadsheet, built the URLs with a concat function, and downloaded them with wget -O.
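
For readers who want to reproduce that workaround without a spreadsheet formula, here is a rough Python sketch of the same idea. Everything specific in it is an assumption: the row keys `"title"` and `"id"`, the URL prefix, and the helper names are hypothetical, since the thread does not say which columns were concatenated.

```python
import re
import urllib.request


def safe_filename(title, ext=".pdf"):
    """Turn a book title into a filesystem-friendly file name."""
    cleaned = re.sub(r"[^A-Za-z0-9]+", "_", title).strip("_")
    return cleaned + ext


def build_jobs(rows, url_prefix):
    """Concatenate url_prefix with each row's id to get (url, filename) pairs.

    rows is an iterable of dicts with the hypothetical keys "title" and "id".
    """
    return [(url_prefix + row["id"], safe_filename(row["title"])) for row in rows]


def download_all(jobs):
    """Fetch each URL into its file name, similar in spirit to `wget -O`."""
    for url, filename in jobs:
        urllib.request.urlretrieve(url, filename)
```

`build_jobs` does the concatenation step and `download_all` plays the role of the wget loop; the rows could come from the spreadsheet exported as CSV and read with `csv.DictReader`.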

chaosAD commented 4 years ago

After git fetch, did you run git merge? If you didn't, the fix never reached your working directory, so you were still running the older script. I suggest using git pull rather than git fetch, since it fetches and merges in one step. But beware that this works smoothly only if you haven't modified the code in the working directory. In my opinion, the best way is to start with a clean slate by running git clone or downloading the ZIP from GitHub. This would have saved you all the trouble.

In fact, I did suggest downloading/cloning the repo again in my previous post.