alexgand / springer_free_books

Python script to download all Springer books released for free during the 2020 COVID-19 quarantine
GNU General Public License v3.0
1.64k stars, 366 forks

New. #40

Closed: brescz closed this issue 4 years ago

brescz commented 4 years ago

Sorry, I am new to Python and I do not understand how to download the files. Could somebody, or the author, please help me? I would greatly appreciate it.

vgates commented 4 years ago

Step 1: Download Python (I would recommend version 3.7) from https://www.python.org/downloads/windows/

Step 2: Download the repository as a zip file

Step 3: Extract the zip folder

Step 4: Open command prompt and cd into that folder

Step 5: Execute the following commands one by one:

    python -m venv .venv
    .venv\Scripts\activate.bat
    pip install -r requirements.txt
    python main.py

Note: the last command, python main.py, will take a while to run, since it has to download all 408 files.
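Step 5 assumes the commands are typed into Command Prompt with the virtual environment active. As a rough sanity check after pip install -r requirements.txt, a short script (saved under a hypothetical name such as check.py in the extracted folder and run with python check.py) can report whether the Python version and the packages main.py imports are available. The package list is an assumption taken from the imports in main.py; the contents of requirements.txt are not shown in this thread.

```python
import importlib.util
import sys

# Assumption: these are the third-party packages main.py imports;
# requirements.txt itself is not shown in this thread.
REQUIRED_PACKAGES = ["requests", "pandas", "tqdm"]


def missing_packages(names):
    """Return the subset of `names` that cannot be imported in this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def preflight(min_version=(3, 7)):
    """List problems that would stop main.py before it downloads anything."""
    problems = []
    if sys.version_info < min_version:
        # min_version + (major, minor) gives the four numbers for the message.
        problems.append(
            "Python %d.%d or newer is recommended, found %d.%d"
            % (min_version + sys.version_info[:2])
        )
    for name in missing_packages(REQUIRED_PACKAGES):
        problems.append("missing package: " + name)
    return problems


if __name__ == "__main__":
    issues = preflight()
    print("\n".join(issues) if issues else "Ready to run: python main.py")
```

If it reports missing packages, the virtual environment is probably not active, or pip install -r requirements.txt did not finish.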

brescz commented 4 years ago

Following vgates' steps:

I downloaded Python 3.7.0 (64-bit for win32) and the zip from the author's GitHub, then extracted the zip to Downloads, which produced a folder named "springer_free_books-master". I opened a command prompt inside the extracted folder (I don't know what "cd" is) and opened "main.py", but did not do anything with it. Then I executed the following commands one by one in Python's command prompt (Python 3.7.0 ....):

    python -m venv .venv
      File "<stdin>", line 1
        python -m venv .venv
                     ^
    SyntaxError: invalid syntax
    .venv\Scripts\activate.bat
      File "<stdin>", line 1
        .venv\Scripts\activate.bat
        ^
    SyntaxError: invalid syntax
    pip install -r requirements.txt
      File "<stdin>", line 1
        pip install -r requirements.txt
                  ^
    SyntaxError: invalid syntax
    python main.py
      File "<stdin>", line 1
        python main.py
                  ^
    SyntaxError: invalid syntax
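All four errors have the same cause: the lines were typed into the Python shell, which tries to parse every line as Python code, while they are actually Command Prompt commands. A small sketch using Python's built-in compile() makes this visible by running each line through essentially the same parser the shell uses:

```python
# Each line below is a Command Prompt (cmd.exe) command, not Python code.
COMMANDS = [
    "python -m venv .venv",
    r".venv\Scripts\activate.bat",
    "pip install -r requirements.txt",
    "python main.py",
]


def is_python_source(line):
    """Return True if `line` parses as Python, False if the parser rejects it."""
    try:
        compile(line, "<stdin>", "exec")
    except SyntaxError:
        return False
    return True


for command in COMMANDS:
    status = "valid Python" if is_python_source(command) else "SyntaxError in the Python shell"
    print(f"{command!r}: {status}")
```

Every command is rejected, while a line such as import os parses fine; the commands only work when typed into cmd itself.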

If you could elaborate more on Steps 4 and 5, I think that would probably fix the issue. I understand what the command prompt is, but I have no idea what "cd" is. Is it just "Python IDLE" (the Python 3.7.0 Shell)? I also do not know whether you need the .xlsx file from Reddit (https://www.reddit.com/r/learnmachinelearning/comments/fvncjm/springer_is_giving_free_access_to_409_of_its/). I downloaded that Excel file, renamed it to just "Springer", and placed it inside the extracted folder (thinking it might help if that is where the books are downloaded from). Otherwise I left the folder completely untouched without editing anything. Here is the main.py script:

    #!/usr/bin/env python

    import os
    import requests
    import shutil
    import pandas as pd
    from tqdm import tqdm

    # insert here the folder you want the books to be downloaded:
    folder = os.path.join(os.getcwd(), 'downloads')

    if not os.path.exists(folder):
        os.mkdir(folder)

    if not os.path.exists(os.path.join(folder, "table.xlsx")):
        books = pd.read_excel('https://resource-cms.springernature.com/springer-cms/rest/v1/content/17858272/data/v4')

        # save table:
        books.to_excel(os.path.join(folder, 'table.xlsx'))

    else:
        books = pd.read_excel(os.path.join(folder, 'table.xlsx'), index_col=0, header=0)

    print('Download started.')

    for url, title, author, pk_name in tqdm(books[['OpenURL', 'Book Title', 'Author', 'English Package Name']].values):

        new_folder = os.path.join(folder, pk_name)

        if not os.path.exists(new_folder):
            os.mkdir(new_folder)

        r = requests.get(url)
        new_url = r.url

        new_url = new_url.replace('/book/','/content/pdf/')

        new_url = new_url.replace('%2F','/')
        new_url = new_url + '.pdf'

        final = new_url.split('/')[-1]
        final = title.replace(',','-').replace('.','').replace('/',' ').replace(':',' ') + ' - ' + author.replace(',','-').replace('.','').replace('/',' ').replace(':',' ') + ' - ' + final
        final = final.encode('ascii', 'ignore').decode('ascii')
        final = (final[:145] + '.pdf') if len(final) > 145 else final
        output_file = os.path.join(new_folder, final)

        if not os.path.exists(output_file.encode('utf-8')):
            with requests.get(new_url, stream=True) as req:
                try:
                    with open(output_file.encode('utf-8'), 'wb') as out_file:
                        shutil.copyfileobj(req.raw, out_file)
                except OSError:
                    print("Error: PDF filename appears incorrect.")

            #download epub version too if exists
            new_url = r.url

            new_url = new_url.replace('/book/','/download/epub/')
            new_url = new_url.replace('%2F','/')
            new_url = new_url + '.epub'

            final = new_url.split('/')[-1]
            final = title.replace(',','-').replace('.','').replace('/',' ').replace(':',' ') + ' - ' + author.replace(',','-').replace('.','').replace('/',' ').replace(':',' ') + ' - ' + final
            final = final.encode('ascii', 'ignore').decode('ascii')
            final = (final[:145] + '.epub') if len(final) > 145 else final
            output_file = os.path.join(new_folder, final)

            request = requests.get(new_url)
            if request.status_code == 200:
                with requests.get(new_url, stream=True) as req:
                    try:
                        with open(output_file.encode('utf-8'), 'wb') as out_file:
                            shutil.copyfileobj(req.raw, out_file)
                    except OSError:
                        print("Error: EPUB filename appears incorrect.")

    print('Download finished.')
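main.py builds the PDF and EPUB addresses and file names with the same chain of string replacements twice. One way to read, or test, that logic is to pull it into small helper functions, as in this sketch. The helper names are hypothetical and not part of the repository; the replacements are copied from the script above, which assumes Springer's link format at the time (a resolved /book/ URL whose DOI contains %2F).

```python
def pdf_url(resolved_url):
    """Mirror main.py: turn a resolved /book/ URL into the PDF download URL."""
    return resolved_url.replace('/book/', '/content/pdf/').replace('%2F', '/') + '.pdf'


def epub_url(resolved_url):
    """Mirror main.py: turn a resolved /book/ URL into the EPUB download URL."""
    return resolved_url.replace('/book/', '/download/epub/').replace('%2F', '/') + '.epub'


def clean(text):
    """Apply the character replacements main.py uses for titles and authors."""
    return text.replace(',', '-').replace('.', '').replace('/', ' ').replace(':', ' ')


def book_filename(title, author, url, ext):
    """Build the output file name as main.py does (ext is '.pdf' or '.epub')."""
    name = clean(title) + ' - ' + clean(author) + ' - ' + url.split('/')[-1]
    # Drop non-ASCII characters, as the script does.
    name = name.encode('ascii', 'ignore').decode('ascii')
    # Keep at most 145 characters, then append the extension again.
    return (name[:145] + ext) if len(name) > 145 else name
```

With a made-up DOI, pdf_url("https://link.springer.com/book/10.1007%2F978-0-000-00000-0") gives "https://link.springer.com/content/pdf/10.1007/978-0-000-00000-0.pdf". Truncation keeps the first 145 characters and then appends the extension, so a long name still ends in .pdf or .epub even when the cut falls inside the original suffix.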

brescz commented 4 years ago

In all of those errors, the file is shown as File "<stdin>", line 1.

vgates commented 4 years ago

@brescz I am assuming you are using Windows. By "cd into that folder" I meant the following. First, open Command Prompt: press Windows key+R to open the Run dialog, type cmd, and press Enter.

Most likely the folder was downloaded to your Downloads folder. Use cd (short for "change directory") to move into the extracted folder.
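If the exact path of the extracted folder is unclear, a small Python sketch can search the Downloads folder and print a ready-made cd line to paste into Command Prompt. It assumes the zip was extracted somewhere under Downloads and that the folder is still named springer_free_books-master. Because extracting the GitHub zip can create a nested folder with the same name, it only keeps folders that actually contain main.py.

```python
from pathlib import Path

# Assumption: the folder still has the name GitHub gave the zip.
PROJECT_NAME = "springer_free_books-master"


def find_project_dirs(base, name=PROJECT_NAME):
    """Return folders called `name` under `base` that contain main.py."""
    base = Path(base)
    if not base.is_dir():
        return []
    candidates = [base] if base.name == name else []
    candidates += [p for p in base.rglob(name) if p.is_dir()]
    # Extracting the zip can nest the folder twice; keep the one with main.py.
    return [p for p in candidates if (p / "main.py").is_file()]


if __name__ == "__main__":
    # Assumption: the zip was extracted somewhere under the Downloads folder.
    for folder in find_project_dirs(Path.home() / "Downloads"):
        # Paste the printed line into Command Prompt.
        print(f'cd "{folder}"')
```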

Now execute the following commands one by one:

    python -m venv .venv
    .venv\Scripts\activate.bat
    pip install -r requirements.txt
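After .venv\Scripts\activate.bat, the prompt normally starts with (.venv). To confirm from Python itself that the virtual environment is the interpreter in use, a minimal sketch compares sys.prefix with sys.base_prefix, which differ only inside a virtual environment:

```python
import sys


def in_virtualenv():
    """True when running inside a venv (sys.prefix differs from sys.base_prefix)."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print("Virtual environment active:", sys.prefix)
    else:
        print("Not in a virtual environment; run .venv\\Scripts\\activate.bat first.")
```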

After all requirements have been installed, run the following command:

    python main.py

The books will be downloaded to the downloads folder inside your extracted folder, springer_free_books-master.
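main.py creates one subfolder per package (the English Package Name column) inside downloads and caches the book list there as table.xlsx, fetching it from the Springer URL in the script whenever that file is missing. A spreadsheet placed anywhere else, such as the renamed Reddit file, is therefore not read. A minimal sketch for checking progress, assuming it is run from inside springer_free_books-master, counts the PDF and EPUB files downloaded so far:

```python
from collections import Counter
from pathlib import Path

BOOK_EXTENSIONS = {".pdf", ".epub"}


def summarize_downloads(folder):
    """Summarize the folder main.py downloads into (downloads/ by default)."""
    folder = Path(folder)
    if not folder.is_dir():
        return {"table_cached": False, "pdf": 0, "epub": 0}
    # main.py writes each book into downloads/<English Package Name>/.
    counts = Counter(
        path.suffix.lower()
        for path in folder.rglob("*")
        if path.is_file() and path.suffix.lower() in BOOK_EXTENSIONS
    )
    return {
        # table.xlsx is the cached book list main.py saves on its first run.
        "table_cached": (folder / "table.xlsx").is_file(),
        "pdf": counts[".pdf"],
        "epub": counts[".epub"],
    }


if __name__ == "__main__":
    # Assumption: run from inside springer_free_books-master.
    print(summarize_downloads(Path.cwd() / "downloads"))
```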

Artneo16 commented 4 years ago

@vgates Hi!

First of all, thanks for your work and help.

I am already in that folder, but I have a problem with the second command:

The cmd output is in Spanish, but it says "The system cannot find the path specified."
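For the second command, that message means cmd could not find .venv\Scripts\activate.bat relative to the current folder. Two plausible causes (assumptions, not confirmed in this thread) are that the current folder is not the one containing main.py, or that the first command, python -m venv .venv, failed, so .venv was never created. A minimal Python check, run with python from the same folder, can narrow it down:

```python
from pathlib import Path


def venv_status(project_dir):
    """Explain why .venv\\Scripts\\activate.bat might not be found from project_dir."""
    project_dir = Path(project_dir)
    if not (project_dir / "main.py").is_file():
        return "main.py not found: this is probably not the extracted project folder"
    if not (project_dir / ".venv").is_dir():
        return ".venv missing: run 'python -m venv .venv' again and check it printed no error"
    if not (project_dir / ".venv" / "Scripts" / "activate.bat").is_file():
        return "activate.bat missing: .venv looks incomplete; delete it and recreate it"
    return "ok: .venv\\Scripts\\activate.bat exists"


if __name__ == "__main__":
    print(venv_status(Path.cwd()))
```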

Can you help me with that?

Thanks

alexgand commented 4 years ago

Please try again; the code has been modified and is now much better.