alexgand / springer_free_books

Python script to download all Springer books released for free during the 2020 COVID-19 quarantine
GNU General Public License v3.0

Disparity between book counts / total sizes #63

Open codingthat opened 4 years ago

codingthat commented 4 years ago

Hi! Great project!

After getting dependencies installed, the download went without errors. But I'm wondering about the result...

https://link.springer.com/search?facet-content-type=%22Book%22&package=mat-covid19_textbooks&%23038;facet-language=%22En%22&%23038;sortOrder=newestFirst&%23038;showAll=true says there are 473 books, 407 of which are in English.

The readme says there are "409 english books (14 GB, both PDF and EPUB)"

My download directory says there are 409 objects, totaling 6.5 GB. Digging into directories shows both PDFs and EPUBs.

  1. Why is the size so much lower than in the readme?
  2. If there are 407 books, how does that become only 409 files, when most books seem to be represented by 2 files each (PDF and EPUB)? (I would have expected more like 814 files, unless only a handful of books were available in both formats.)
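To make the comparison concrete (407 books in two formats would be 407 × 2 = 814 files), here is a minimal sketch that tallies files and bytes per extension in a download folder. It is not part of the project's script; the function name `summarize_downloads` is made up for illustration, and the folder to scan is whatever you pass on the command line (the current directory by default).

```python
import sys
from collections import Counter
from pathlib import Path


def summarize_downloads(root):
    """Count files and total bytes per extension under root (recursively).

    Hypothetical helper for this thread, not part of springer_free_books.
    """
    counts = Counter()
    sizes = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            ext = path.suffix.lower() or "(none)"
            counts[ext] += 1
            sizes[ext] += path.stat().st_size
    return counts, sizes


if __name__ == "__main__":
    # Scan the folder given as the first argument, else the current directory.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    counts, sizes = summarize_downloads(root)
    for ext in sorted(counts):
        print(f"{ext}: {counts[ext]} files, {sizes[ext] / 1e9:.2f} GB")
```

Comparing the `.pdf` and `.epub` counts against the number of books should show whether whole books or only one format are missing.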

codingthat commented 4 years ago

The list of books not downloaded appears to consist of these, but I'm not sure what makes them different from the others:

A Beginners Guide to Python 3 Programming
A Beginner's Guide to R
A Beginner's Guide to Scala, Object Orientation and Functional Programming
Abstract Algebra
A Concise Guide to Market Research
A Course in Rasch Measurement Theory
Advanced Guide to Python 3 Programming
Advanced Organic Chemistry
Advanced Organic Chemistry
Advanced Quantum Mechanics
A First Introduction to Quantum Physics
A Modern Introduction to Probability and Statistics
Analysis for Computer Scientists
Analytical Corporate Finance
Analyzing Qualitative Data with MAXQDA
An Anthology of London in Literature, 1558-1914
An Introduction to Biomechanics
An Introduction to Soil Mechanics
An Introduction to Zooarchaeology
Applied Bioinformatics
Applied Chemistry
Applied Linear Algebra
Applied Predictive Modeling
A Pythagorean Introduction to Number Theory
ArcGIS for Environmental and Water Issues
Argumentation Theory: A Pragma-Dialectical Perspective
Astronautics
Automatic Control with Experiments
Bayesian Essentials with R
Bioinformatics for Evolutionary Biologists
Breast Cancer
Brewing Science: A Multidisciplinary Approach
Brownian Motion, Martingales, and Stochastic Calculus
Building Energy Modeling with OpenStudio
Business Ethics - A Philosophical and Behavioral Approach
Calculus With Applications
Chemical and Bioprocess Engineering
Climate Change Science: A Modern Synthesis
Clinical Methods in Medical Family Therapy
Clinical Neuroanatomy
Communication and Bioethics at the End of Life
Complex Analysis
Concepts, Methods and Practical Applications in Applied Demography
Concise Guide to Databases
Conferencing and Presentation English for Young Academics
Control Engineering
Control Engineering: MATLAB Exercises
Criminal Justice and Mental Health
Customer Relationship Management
Data Science and Predictive Analytics
Digital Business Models
Digital Image Processing
Disability and Vocational Rehabilitation in Rural Settings
Educational Technology
Electronic Commerce 2018
Elementary Mechanics Using Matlab
Empathetic Space on Screen
Energy and the Wealth of Nations
Energy Harvesting and Energy Efficiency
Engineering Mechanics 2
Entertainment Science
ENZYMES: Catalysis, Kinetics and Mechanisms
Epidemiological Research: Terms and Concepts
Essentials of Business Analytics
Essentials of Food Science
Evidence-Based Interventions for Children with Challenging Behavior
Evidence-Based Practice in Clinical Social Work
Exam Survival Guide: Physical Chemistry
Excel Data Analysis
Food Chemistry
Food Fraud Prevention
Foundations of Behavioral Health
Foundations of Programming Languages
Fraud and Corruption
Fundamentals of Clinical Trials
Fundamentals of Java Programming
Fundamentals of Multimedia
Fundamentals of Solid State Engineering
Game Theory
Global Supply Chain and Operations Management
Group Theory
Group Theory Applied to Chemistry
Guide to Competitive Programming
Guide to Computer Network Security
Guide to Scientific Computing in C++
Handbook of Biological Confocal Microscopy
Handbook of Evolutionary Research in Archaeology
International Business Management
International Humanitarian Action
Internet of Things From Hype to Reality
Introduction to Artificial Intelligence
Introduction to Data Science
Introduction to Deep Learning
Introduction to Digital Systems Design
Introduction to Embedded Systems
Introduction to Formal Philosophy
Introduction to General Relativity
Introduction to Law
Introduction to Logic Circuits & Logic Design with VHDL
Introduction to Logic Circuits & Logic Design with VHDL
Introduction to Mathematica® for Physicists
Introduction to Parallel Computing
Introduction to Particle and Astroparticle Physics
Introduction to Programming with Fortran
Introduction to Statistics and Data Analysis
Introductory Computer Forensics
Introductory Quantum Mechanics
Intuitive Probability and Random Processes using MATLAB®
Java in Two Semesters
Knowledge Management
Lessons on Synthetic Bioarchitectures
Linear Algebra and Analytic Geometry for Physical Sciences
Logical Foundations of Cyber-Physical Systems
Logistics
Machine Learning in Medicine - a Complete Overview
Managing Media and Digital Organizations
Managing Sustainable Business
Mapping Global Theatre Histories
Market Research
Mathematical Logic
MATLAB for Psychologists
Media and Digital Management
Motivation and Action
Multimedia Big Data Computing for IoT Applications
Nanotechnology: Principles and Practices
Neural Networks and Deep Learning
New Introduction to Multiple Time Series Analysis
Object-Oriented Analysis, Design and Implementation
Of Cigarettes, High Heels, and Other Interesting Things
Off-Grid Electrical Systems in Developing Countries
Optimization of Process Flowsheets through Metaheuristic Techniques
Perceptual Organization
Perspectives on Elderly Crime and Victimization
Pharmaceutical Biotechnology
Pharmaceutical Biotechnology
Philosophical and Mathematical Logic
Philosophy of Race
Physical Asset Management
Physical Chemistry from a Different Angle
Physics from Symmetry
Physics of Oscillations and Waves
Plant Anatomy
Plant Ecology
Plant Physiology, Development and Metabolism
Policing and Minority Communities
Political Social Work
Polymer Chemistry
Polymer Synthesis: Theory and Practice
Practical Electrical Engineering
Principles of Quantum Mechanics
Probability and Statistics for Computer Science
Problems in Classical Electromagnetism
Proofs from THE BOOK
Psychoeducational Assessment and Report Writing
Python For ArcGIS
Python Programming Fundamentals
Quantitative Methods for the Social Sciences
Quantum Mechanics for Pedestrians 1
Quantum Mechanics for Pedestrians 2
Quick Start Guide to Verilog
Quick Start Guide to VHDL
Real Analysis
Recommender Systems
Research Methods for Social Justice and Equity in Education
Research Methods for the Digital Humanities
Scanning Electron Microscopy and X-Ray Microanalysis
School Leadership and Educational Change in Singapore
Social Justice Theory and Practice for Social Work
Social Marketing in Action
Social Psychology in Action
Spine Surgery
Stability and Control of Linear Systems
Statics and Mechanics of Structures
Strategic Human Resource Management and Employment Relations
Strategic Retail Management
Structural Dynamics
Sustainability Science
Systems Programming in Unix/Linux
Teaching Medicine and Medical Ethics Using Popular Culture
The ASCRS Textbook of Colon and Rectal Surgery
The A-Z of the PhD Trajectory
The Finite Element Method and Applications in Engineering Using ANSYS®
The Finite Volume Method in Computational Fluid Dynamics
The Physics of Semiconductors
The Psychology of Social Status
The Python Workbook
Travel Marketing, Tourism Economics and the Airline Product
Witnessing Torture

pbowyer commented 4 years ago

I ran the script this morning (checkout commit: 40d528f5fe4e8108aedd73e86338978851efc445) on Windows and got the full 15.2GB download, 736 Files, 21 Folders.

I installed it following the readme's venv instructions, then ran it like:

    python main.py

codingthat commented 4 years ago

@pbowyer Which version of python? I'm on 3.6.9 here.

codingthat commented 4 years ago

(I wonder why it didn't get all of them, but didn't error out, either. If I start again will it redownload ones that are already complete?)

pbowyer commented 4 years ago

@codingthat

    Python 3.7.0 (default, Jun 28 2018, 08:04:48) [MSC v.1912 64 bit (AMD64)] :: Anaconda, Inc. on win32

Artneo16 commented 4 years ago

Hi @pbowyer

Would you be so kind as to explain step by step how to do it? I get an error in CMD when I try to run the code.

Thanks in advance!!

BR

pbowyer commented 4 years ago

Hi @Artneo16

Certainly, here's what I did:

  1. First, check you have a recent version of Python installed. To do this, type python at the command prompt. I'm using Python 3.7.0, and recommend you use that or a newer version. [Hint: to quit the Python interpreter you've launched, type quit().]
  2. Now clone or download this repository
  3. Next I followed the instructions in this repository's README: https://github.com/alexgand/springer_free_books#virtual-environment-on-windows-python-3x. I ran the following commands (run them one at a time):
    python -m venv .venv
    .venv\Scripts\activate.bat
    pip install -r requirements.txt
  4. If there were no errors, you can now download the books:
    python main.py

    This will take some time.

  5. Finally, once the books have downloaded, deactivate the virtual environment:
    .venv\Scripts\deactivate.bat
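If you are unsure whether steps 1 and 3 worked, a small check like the following can be run with the same python command before starting the download. This is only a sketch: the helper names are invented for illustration, and the 3.7 minimum reflects the version recommended above, not a requirement stated by the project.

```python
import sys


def in_virtualenv():
    """True when the interpreter runs inside a venv made by `python -m venv`."""
    # Inside a venv, sys.prefix points at the venv while sys.base_prefix
    # still points at the base installation.
    return sys.prefix != sys.base_prefix


def python_is_recent(minimum=(3, 7)):
    """True when the running Python is at least `minimum` (assumed threshold)."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    print("Recent enough:", python_is_recent())
    print("Virtual environment active:", in_virtualenv())
```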

Good luck!

Artneo16 commented 4 years ago

Thank you very much!

StreetGuru commented 4 years ago

> I ran the script this morning (checkout commit: 40d528f) on Windows and got the full 15.2GB download, 736 Files, 21 Folders.

So strange - I ran the script yesterday evening (on Linux) and got 16.4GB, 757 Files, 21 Folders (edit: 407 ebooks were found by the script).

There are a few ebooks that seem to only be available in pdf format (I've checked Springer website and indeed they only have a pdf link for download).
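One rough way to list the PDF-only books locally is to look for PDFs with no EPUB beside them. The sketch below assumes that both formats of a book share the same file name apart from the extension and sit in the same folder; that is an assumption about the download layout, not something confirmed in this thread, and `pdf_only_titles` is a made-up name.

```python
from pathlib import Path


def pdf_only_titles(root):
    """Return relative paths (without extension) that have a .pdf but no .epub.

    Assumes a book's PDF and EPUB differ only in their file extension.
    """
    root = Path(root)
    pdf_stems = {p.with_suffix("").relative_to(root) for p in root.rglob("*.pdf")}
    epub_stems = {p.with_suffix("").relative_to(root) for p in root.rglob("*.epub")}
    return sorted(str(stem) for stem in pdf_stems - epub_stems)
```

Calling `pdf_only_titles(".")` from the download folder and comparing the result between two machines could help explain differing file counts.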

cgavir29 commented 4 years ago

> > I ran the script this morning (checkout commit: 40d528f) on Windows and got the full 15.2GB download, 736 Files, 21 Folders.
>
> So strange - I ran the script yesterday evening (on Linux) and got 16.4GB, 757 Files, 21 Folders (edit: 407 ebooks were found by the script).
>
> There are a few ebooks that seem to only be available in pdf format (I've checked Springer website and indeed they only have a pdf link for download).

I ran it once and got 736 Files, 21 Folders, just like @pbowyer. However, I ran it again and got 12 more very small .epub files (under 20k) that wouldn't even open. Maybe something similar happened to you.

If you want to find the small ones, you can run find . -name "*.epub" -size -20k in the root of the folder where you downloaded them. To delete all the results, add the -delete flag.
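On Windows, where find works differently, a rough Python counterpart could look like the sketch below. It treats "small" as under 20 × 1024 bytes, which only approximates how find rounds sizes, and `find_small_epubs` is a name invented for this example.

```python
from pathlib import Path


def find_small_epubs(root, max_bytes=20 * 1024, delete=False):
    """Return .epub files under root smaller than max_bytes.

    With delete=True the matching files are also removed, so check the
    list first by calling the function with the default delete=False.
    """
    small = sorted(
        p for p in Path(root).rglob("*.epub")
        if p.is_file() and p.stat().st_size < max_bytes
    )
    if delete:
        for p in small:
            p.unlink()
    return small
```

For example, `find_small_epubs(".")` lists the suspect files, and `find_small_epubs(".", delete=True)` removes them.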

StreetGuru commented 4 years ago

> I ran it once and got 736 Files, 21 Folders, just like @pbowyer. However, I ran it again and got 12 more very small .epub files (under 20k) that wouldn't even open. Maybe something similar happened to you.
>
> If you want to find the small ones, you can run find . -name "*.epub" -size -20k in the root of the folder where you downloaded them. To delete all the results, add the -delete flag.

I didn't find any small epub files, so it seems to have downloaded everything that is available. The strange thing is that my download folder is bigger and has more files than @pbowyer's, and I thought he had downloaded the entire collection?