Reducing Jupyter Notebooks' size

TheAlgorithms / Jupyter

The repository contains script and notebook related to Statistics, Machine learning, Neural network, Deep learning, NLP, Numerical methods, and Automation.

MIT License

Reducing Jupyter Notebooks' size #2

Closed poyea closed 5 years ago

poyea commented 5 years ago

One could also reduce them by translating them into .py.

https://github.com/TheAlgorithms/Python/search?l=jupyter-notebook
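
As a rough illustration of what translating a notebook into .py involves, here is a minimal sketch that works on the notebook's JSON directly. It assumes the nbformat 4 layout (a top-level "cells" list whose entries carry "cell_type" and "source"); the function names are hypothetical and this is not how any particular converter is implemented. Code cells are kept, markdown cells become comments, and stored outputs are dropped, which is where the size saving comes from.

```python
import json


def notebook_to_script(nb: dict) -> str:
    """Return the cells of an nbformat-4 notebook dict as Python source text."""
    chunks = []
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        # nbformat stores source either as one string or as a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        if cell.get("cell_type") == "code":
            chunks.append(text)
        elif cell.get("cell_type") == "markdown":
            # Keep the narrative as comments; rendered output is not carried over.
            chunks.append("\n".join("# " + line for line in text.splitlines()))
    return "\n\n".join(chunks) + "\n"


def convert_file(notebook_path: str, script_path: str) -> None:
    """Read a .ipynb file and write the extracted source to script_path."""
    with open(notebook_path, encoding="utf-8") as f:
        nb = json.load(f)
    with open(script_path, "w", encoding="utf-8") as f:
        f.write(notebook_to_script(nb))
```

Plots, tables and other rendered results are not part of the resulting script, so anything that lived only in the outputs is lost.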

cclauss commented 5 years ago

How much size are we reducing? https://github.com/TheAlgorithms/Python/blob/master/other/Food%20wastage%20analysis%20from%201961-2013%20(FAO).ipynb is impressive as an .ipynb file, but wouldn't it lose much of its storytelling power as a .py file?

poyea commented 5 years ago

By half? I don't know if this is possible, but some files are making the repository large.

cclauss commented 5 years ago

Would it make sense to create a new repo called https://github.com/TheAlgorithms/Jupyter like GitHub does with https://github.com/trending/Python and https://github.com/trending/Jupyter?

chiazor commented 5 years ago

Clearing the outputs created when a Jupyter Notebook's cells are executed reduces the notebook's size. The file size can drop by half.

cclauss commented 5 years ago

Is there an automated way to clear those outputs from the command line?

GitHub no longer considers TheAlgorithms/Python to be a Python repo. It is now considered to be a Jupyter repo (by size, 65.8% vs. 34.2%) even though the repo currently has only 5 Jupyter files vs. 320 Python files. This means that our repo will no longer be visible on GitHub's trending Python repos, which was one of the ways we attracted new contributors. (That is how I found this repo.) My sense is that if the size of the notebooks cannot be reduced, then we should create an alternative TheAlgorithms/Jupyter-Notebook repo that can take its place on https://github.com/trending/jupyter-notebook
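
To see how a handful of notebooks can outweigh hundreds of scripts, here is a minimal sketch that sums bytes per file extension under a directory. It assumes the language breakdown is driven by file size in bytes, as the "by size" figures above suggest. The function name is hypothetical, and GitHub's real classification applies further rules that this ignores, so treat the result as an approximation only.

```python
import os
from collections import defaultdict


def share_by_extension(root: str, extensions=(".py", ".ipynb")) -> dict:
    """Percentage of total bytes per extension under root (other files ignored)."""
    totals = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1]
            if ext in extensions:
                totals[ext] += os.path.getsize(os.path.join(dirpath, name))
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    # Share of bytes, not share of files: a few large notebooks can dominate.
    return {ext: round(100 * size / grand_total, 1) for ext, size in totals.items()}
```

Running share_by_extension(".") from the root of a checkout gives a quick before/after comparison when notebooks are trimmed or moved out.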

cclauss commented 5 years ago

ls -Fla $NB ; jupyter nbconvert --ClearOutputPreprocessor.enabled=True --inplace $NB ; ls -Fla $NB

./machine_learning/NaiveBayes                                            50,103 -->  4,217
./machine_learning/reuters_one_vs_rest_classifier                       154,557 --> 13,299 
./machine_learning/random_forest_classification/random_forest_classifier 46,123 -->  4,473
./machine_learning/random_forest_regression/random_forest_regression     16,886 -->  2,319
./neural_network/fully_connected_neural_network                          18,158 -->  5,143

I still think a separate repo for notebooks is more sustainable. Much of the storytelling power is in the output.
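
For anyone who wants the same before/after numbers without nbconvert, here is a minimal sketch using only the standard library. It assumes the nbformat 4 layout, where each code cell has an "outputs" list and an "execution_count"; the function names are hypothetical, and this is not a reimplementation of the ClearOutputPreprocessor, only an approximation of its effect.

```python
import json
import os


def clear_outputs(nb: dict) -> dict:
    """Empty the outputs and execution counts of every code cell, in place."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb


def clear_file(path: str) -> tuple:
    """Rewrite the notebook at path without outputs; return (before, after) sizes in bytes."""
    before = os.path.getsize(path)
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    clear_outputs(nb)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(nb, f, indent=1, ensure_ascii=False)
        f.write("\n")
    return before, os.path.getsize(path)
```

Because the file is re-serialized, whitespace may differ from what nbconvert writes, so the resulting sizes will not necessarily match the table above byte for byte.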

chiazor commented 5 years ago

I agree with @cclauss: creating a separate repo for notebooks is more suitable.

QuantumNovice commented 5 years ago

We should create a separate repository with a subdirectory for each language supported by Jupyter.

safwaahmad commented 3 months ago

https://github.com/safwaahmad/Data-analysis covers more data in an .ipynb file.