Closed brescz closed 4 years ago
Step 1: Download Python (I would recommend version 3.7) from https://www.python.org/downloads/windows/
Step 2: Download the zip
Step 3: Extract the zip folder
Step 4: Open command prompt and cd into that folder
Step 5: Execute the following commands one by one:
python -m venv .venv
.venv\Scripts\activate.bat
pip install -r requirements.txt
python main.py
Note: the last command, python main.py, will take time to execute because it needs to download all 408 files.
@vgates Following your steps, this is what I did:
Downloaded Python 3.7.0 (64-bit for win32).
Downloaded the zip from the author's GitHub.
Extracted the zip folder to Downloads; it became a folder named "springer_free_books-master".
Opened command prompt inside the extracted folder. (I don't know what "cd" is.) I opened "main.py" but did not do anything with it.
Executed the following commands one by one in Python's command prompt (Python 3.7.0 ....):
python -m venv .venv
  File "<stdin>", line 1
    python -m venv .venv
    ^
SyntaxError: invalid syntax

.venv\Scripts\activate.bat
  File "<stdin>", line 1
    .venv\Scripts\activate.bat
    ^
SyntaxError: invalid syntax

pip install -r requirements.txt
  File "<stdin>", line 1
    pip install -r requirements.txt
    ^
SyntaxError: invalid syntax

python main.py
  File "<stdin>", line 1
    python main.py
    ^
SyntaxError: invalid syntax
If you can elaborate more on Steps 4 and 5, I think that would probably fix the issue. I understand what the command prompt is, but I have no idea what "cd" is. Or is that just "Python IDLE" (Python 3.7.0 Shell)?

I also do not know whether you are required to have the .xlsx file from Reddit: https://www.reddit.com/r/learnmachinelearning/comments/fvncjm/springer_is_giving_free_access_to_409_of_its/ I downloaded that Excel file, renamed it to just "Springer", and placed it inside the extracted folder, thinking it would somehow help if that is where the books are being downloaded from.

I then left the folder completely untouched, without editing anything. Here is the main.py script though:
import os
import requests
import shutil
import pandas as pd
from tqdm import tqdm

folder = os.path.join(os.getcwd(), 'downloads')

if not os.path.exists(folder):
    os.mkdir(folder)

if not os.path.exists(os.path.join(folder, "table.xlsx")):
    books = pd.read_excel('https://resource-cms.springernature.com/springer-cms/rest/v1/content/17858272/data/v4')
    # save table:
    books.to_excel(os.path.join(folder, 'table.xlsx'))
else:
    books = pd.read_excel(os.path.join(folder, 'table.xlsx'), index_col=0, header=0)

print('Download started.')

for url, title, author, pk_name in tqdm(books[['OpenURL', 'Book Title', 'Author', 'English Package Name']].values):
    new_folder = os.path.join(folder, pk_name)
    if not os.path.exists(new_folder):
        os.mkdir(new_folder)

    r = requests.get(url)
    new_url = r.url
    new_url = new_url.replace('/book/','/content/pdf/')
    new_url = new_url.replace('%2F','/')
    new_url = new_url + '.pdf'

    final = new_url.split('/')[-1]
    final = title.replace(',','-').replace('.','').replace('/',' ').replace(':',' ') + ' - ' + author.replace(',','-').replace('.','').replace('/',' ').replace(':',' ') + ' - ' + final
    final = final.encode('ascii', 'ignore').decode('ascii')
    final = (final[:145] + '.pdf') if len(final) > 145 else final
    output_file = os.path.join(new_folder, final)

    if not os.path.exists(output_file.encode('utf-8')):
        with requests.get(new_url, stream=True) as req:
            try:
                with open(output_file.encode('utf-8'), 'wb') as out_file:
                    shutil.copyfileobj(req.raw, out_file)
            except OSError:
                print("Error: PDF filename appears incorrect.")

        #download epub version too if exists
        new_url = r.url
        new_url = new_url.replace('/book/','/download/epub/')
        new_url = new_url.replace('%2F','/')
        new_url = new_url + '.epub'

        final = new_url.split('/')[-1]
        final = title.replace(',','-').replace('.','').replace('/',' ').replace(':',' ') + ' - ' + author.replace(',','-').replace('.','').replace('/',' ').replace(':',' ') + ' - ' + final
        final = final.encode('ascii', 'ignore').decode('ascii')
        final = (final[:145] + '.epub') if len(final) > 145 else final
        output_file = os.path.join(new_folder, final)

        request = requests.get(new_url)
        if request.status_code == 200:
            with requests.get(new_url, stream=True) as req:
                try:
                    with open(output_file.encode('utf-8'), 'wb') as out_file:
                        shutil.copyfileobj(req.raw, out_file)
                except OSError:
                    print("Error: EPUB filename appears incorrect.")

print('Download finished.')
In the errors above, the file is shown as File "<stdin>".
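The SyntaxError messages quoted above are what Python's interactive prompt reports when it is given an operating-system command instead of Python code. As a minimal sketch of why that happens (the helper name is_python_code is made up for illustration and is not part of the project), the snippet below asks Python's own parser whether each of the four commands is valid Python:

```python
import ast


def is_python_code(line: str) -> bool:
    """Return True if `line` parses as Python source, False otherwise.

    Hypothetical helper for illustration only; not part of springer_free_books.
    """
    try:
        ast.parse(line)
        return True
    except SyntaxError:
        return False


# The four commands from Step 5. They are meant for the Windows command
# prompt (cmd), not for the Python shell.
commands = [
    "python -m venv .venv",
    r".venv\Scripts\activate.bat",
    "pip install -r requirements.txt",
    "python main.py",
]

for command in commands:
    print(f"{command!r}: valid Python? {is_python_code(command)}")
```

None of the four lines parses as Python, which is consistent with the errors reported: they need to be typed into the command prompt, not into the Python shell or IDLE.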
@brescz I am assuming you are using Windows. Here is what I meant by "cd into that folder".
Open Command Prompt: press Windows key + R to open the Run prompt, type cmd, and press Enter.
The folder was most probably downloaded to your Downloads folder, so cd to the extracted folder.
Now execute the following commands one by one:
python -m venv .venv
.venv\Scripts\activate.bat
pip install -r requirements.txt
After all the requirements have been downloaded, execute the following command:
python main.py
The books will be downloaded to the downloads folder inside your extracted folder, which is springer_free_books-master.
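For readers wondering what the activate step is for, here is a rough sketch of the same sequence expressed as command lists. It assumes the standard Windows layout of a virtual environment, where the interpreter lives at .venv\Scripts\python.exe; calling that interpreter directly has the same effect as activating first. The function name setup_commands and the example path under C:\Users\me are made up for illustration.

```python
from pathlib import PureWindowsPath
from typing import List


def setup_commands(project_dir: str) -> List[List[str]]:
    """Build the commands from the steps above without running them.

    Hypothetical helper for illustration only. Each command is meant to be
    run with `project_dir` as the working directory.
    """
    # On Windows, a virtual environment keeps its interpreter in Scripts\.
    venv_python = str(PureWindowsPath(project_dir) / ".venv" / "Scripts" / "python.exe")
    return [
        ["python", "-m", "venv", ".venv"],                                # create the environment
        [venv_python, "-m", "pip", "install", "-r", "requirements.txt"],  # install the requirements
        [venv_python, "main.py"],                                         # start the download
    ]


# Example with a hypothetical location of the extracted folder.
for cmd in setup_commands(r"C:\Users\me\Downloads\springer_free_books-master"):
    print(" ".join(cmd))
```

Each list could be passed to subprocess.run(cmd, cwd=project_dir, check=True), but typing the original commands into the command prompt works just as well.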
@vgates Hi!
First of all, thanks for your work and help.
I am already in that folder, but I have a problem with the second command:
the message is in Spanish, but it says "The system cannot find the specified path".
Can you help me with that?
Thanks
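The thread does not say what caused this error. One possibility, stated here only as an assumption, is that the command prompt is not in the extracted folder, or that the first command did not create the .venv folder. A small sketch that lists which expected items are missing from a folder (the helper name missing_items is made up for illustration):

```python
import os
from typing import List


def missing_items(project_dir: str) -> List[str]:
    """Return the expected files that do not exist under `project_dir`.

    Hypothetical helper for illustration only; the list of expected files is
    taken from the commands in this thread.
    """
    expected = [
        "main.py",
        "requirements.txt",
        os.path.join(".venv", "Scripts", "activate.bat"),  # created by: python -m venv .venv
    ]
    return [item for item in expected if not os.path.exists(os.path.join(project_dir, item))]


# Check the folder the script is started from.
print("Missing:", missing_items(os.getcwd()))
```

If main.py or requirements.txt is listed as missing, the current folder is probably not the extracted one. If only activate.bat is missing, the python -m venv .venv step likely did not complete.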
Please try again, the code was modified and is now much better.
Sorry, I am new to Python and do not understand how to download the files. Could somebody, or the author, please help me? I would greatly appreciate it.