Closed: poyea closed this issue 5 years ago
How much size are we reducing? https://github.com/TheAlgorithms/Python/blob/master/other/Food%20wastage%20analysis%20from%201961-2013%20(FAO).ipynb is impressive as an .ipynb file but would lose much of its storytelling power as a .py file.
By half? I don't know if this is possible, but some files are making the repository large in size.
Would it make sense to create a new repo called https://github.com/TheAlgorithms/Jupyter like GitHub does with https://github.com/trending/Python and https://github.com/trending/Jupyter?
Clearing the outputs created by executing a Jupyter Notebook's cells reduces the Notebook's size. The file size can shrink by about half.
Is there an automated way to clear those outputs from the command line?
GitHub no longer considers TheAlgorithms/Python to be a Python repo. It is now considered to be a Jupyter repo (by size 65.8% vs. 34.2%) even though the repo currently has only 5 Jupyter files vs. 320 Python files. This means that our repo will no longer be visible on GitHub's trending Python repos, which was one of the ways we attracted new contributors. (That is how I found this repo.) My sense is that if the size of the notebooks cannot be reduced, then we should create an alternative TheAlgorithms/Jupyter-Notebook repo that can take its place on https://github.com/trending/jupyter-notebook
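To see how 5 notebooks can outweigh 320 Python files, here is a rough sketch that totals bytes per language in a checkout. It assumes the split is driven by file size alone, as the "by size" figures above suggest; GitHub's actual classifier applies many more rules, and the extension table and the function name `language_share_by_bytes` are hypothetical, made up for this illustration.

```python
from collections import Counter
from pathlib import Path

# Hypothetical extension-to-language table for this sketch only;
# GitHub's real classifier uses far more rules than a suffix lookup.
EXTENSION_TO_LANGUAGE = {".py": "Python", ".ipynb": "Jupyter Notebook"}


def language_share_by_bytes(root):
    """Return {language: percent of total bytes} for recognised files under root."""
    totals = Counter()
    for path in Path(root).rglob("*"):
        language = EXTENSION_TO_LANGUAGE.get(path.suffix)
        if language and path.is_file():
            totals[language] += path.stat().st_size
    grand_total = sum(totals.values())
    if grand_total == 0:
        # No recognised files at all, so there is nothing to report.
        return {}
    return {lang: 100 * size / grand_total for lang, size in totals.items()}
```

Calling `language_share_by_bytes(".")` from the root of a local clone gives a byte-weighted split, which makes it easy to check whether a handful of notebooks with stored outputs dominates the total.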
ls -Fla $NB ; jupyter nbconvert --ClearOutputPreprocessor.enabled=True --inplace $NB ; ls -Fla $NB
./machine_learning/NaiveBayes 50,103 --> 4,217
./machine_learning/reuters_one_vs_rest_classifier 154,557 --> 13,299
./machine_learning/random_forest_classification/random_forest_classifier 46,123 --> 4,473
./machine_learning/random_forest_regression/random_forest_regression 16,886 --> 2,319
./neural_network/fully_connected_neural_network 18,158 --> 5,143
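For anyone without nbconvert installed, the same idea can be sketched with the standard library alone, since an .ipynb file is plain JSON. This is only an approximation of what `ClearOutputPreprocessor` does: it assumes the nbformat 4 layout (a top-level `cells` list), and the function names `clear_outputs` and `clear_outputs_in_file` are hypothetical, not part of any Jupyter API.

```python
import json
import os


def clear_outputs(notebook):
    """Empty the outputs and execution counts of code cells in a notebook dict.

    Assumes the nbformat 4 layout with a top-level "cells" list.
    """
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


def clear_outputs_in_file(path):
    """Rewrite the notebook at path in place; return (bytes_before, bytes_after)."""
    size_before = os.path.getsize(path)
    with open(path, encoding="utf-8") as f:
        notebook = json.load(f)
    clear_outputs(notebook)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(notebook, f, indent=1, ensure_ascii=False)
        f.write("\n")
    return size_before, os.path.getsize(path)
```

Markdown cells and code-cell sources are left untouched, so only the stored results are dropped. The file is rewritten in place, so it is worth running this on a copy or a clean git working tree first.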
I still think a separate repo for notebooks is more sustainable. Much of the storytelling power is in the output.
I agree with @cclauss: creating a separate repo for notebooks is more suitable.
We should create a separate repository with a sub-directory for each language supported by Jupyter.
https://github.com/safwaahmad/Data-analysis covers more data in an .ipynb file.
One could also reduce them by translating them into .py. https://github.com/TheAlgorithms/Python/search?l=jupyter-notebook
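Translating a notebook into .py could look roughly like the sketch below, which keeps code cells as code and turns markdown cells into comments. It assumes the nbformat 4 JSON layout; the function name `notebook_to_script` is hypothetical, and cells that use IPython magics (lines starting with `%`) would still need manual editing before the result runs as plain Python.

```python
import json


def notebook_to_script(notebook):
    """Join code-cell sources into one script; markdown cells become comments."""
    chunks = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        if isinstance(source, list):
            # The source may be stored as a list of lines or as one string.
            source = "".join(source)
        if cell.get("cell_type") == "code":
            chunks.append(source)
        elif cell.get("cell_type") == "markdown":
            chunks.append("\n".join("# " + line for line in source.splitlines()))
    # Drop empty cells and separate the rest with a blank line.
    return "\n\n".join(chunk for chunk in chunks if chunk.strip()) + "\n"


def convert_file(notebook_path, script_path):
    """Read a notebook from notebook_path and write the script to script_path."""
    with open(notebook_path, encoding="utf-8") as f:
        notebook = json.load(f)
    with open(script_path, "w", encoding="utf-8") as f:
        f.write(notebook_to_script(notebook))
```

All outputs, including plots and tables, are discarded by this conversion, which is the storytelling cost raised earlier in the thread.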