This repository contains my workspace for doing Data Science in Python.
If it does not already exist, create a conda environment:
conda create -n data_science python=3.7
Activate the environment:
source activate data_science
Set up the workspace:
pip install -U pip numpy
pip install -r requirements.txt
python -m ipykernel install --user
Set up Jupyter Notebook:
jupyter contrib nbextension install --user
jupyter nbextensions_configurator enable --user
jupyter nbextension install https://github.com/drillan/jupyter-black/archive/master.zip --user
jupyter nbextension enable jupyter-black-master/jupyter-black
Set up JupyterLab:
jupyter labextension install jupyter-leaflet
jupyter labextension install @jupyter-widgets/jupyterlab-manager
jupyter labextension install @krassowski/jupyterlab_go_to_definition
jupyter labextension install jupyterlab_bokeh
jupyter labextension install ipysheet
jupyter labextension install jupyterlab-drawio
jupyter labextension install @jupyterlab/toc
jupyter labextension install jupyterlab_vim
jupyter labextension install @jupyterlab/git
pip install jupyterlab-git
jupyter serverextension enable --py jupyterlab_git
jupyter labextension install @ryantam626/jupyterlab_code_formatter
pip install jupyterlab_code_formatter
jupyter serverextension enable --py jupyterlab_code_formatter
Reactivate the environment:
source deactivate
source activate data_science
Load the submodules:
git submodule init
git submodule update
Activate the environment (if not already activated in this session):
source activate data_science
Set Spark environment variables:
export SPARK_HOME=/opt/spark
export PATH=$SPARK_HOME/bin:$PATH
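As a quick sanity check that Spark is found, the following Python snippet is a minimal sketch; it assumes the findspark package is installed, which is not part of the steps above:
import findspark
findspark.init()  # locates Spark using the SPARK_HOME variable set above
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("spark_home_check").getOrCreate()
print(spark.version)  # should print the version of the Spark installation under /opt/spark
spark.stop()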
Start Jupyter Notebook:
jupyter notebook --NotebookApp.iopub_data_rate_limit=10000000
Get the latest changes from upstream:
git pull
Activate the environment (if not already activated in this session):
source activate data_science
Update the dependencies:
pip install -r requirements.txt
Reactivate the environment:
source deactivate
source activate data_science
Update submodules:
git submodule init
git submodule update
Activate the environment (if not already activated in this session):
source activate data_science
Upgrade the dependencies:
pip-compile --upgrade
pip install -r requirements.txt
Reactivate the environment:
source deactivate
source activate data_science
Facets is a tool for the visual exploration of datasets. It can be installed as follows:
jupyter nbextension install facets/facets-dist/ --user
Jupyter Notebook should then be started with an additional command-line option:
--NotebookApp.iopub_data_rate_limit=10000000
The visualization can then be loaded as explained in the demo notebook.
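For reference, loading Facets Dive from a notebook cell roughly follows the pattern below (a sketch based on the demo notebook; the DataFrame df, the CSV path and the element id are placeholders):
import pandas as pd
from IPython.display import display, HTML

df = pd.read_csv("my_dataset.csv")  # placeholder: any pandas DataFrame works
jsonstr = df.to_json(orient="records")

HTML_TEMPLATE = """
<link rel="import" href="/nbextensions/facets-dist/facets-jupyter.html">
<facets-dive id="facets_dive_elem" height="600"></facets-dive>
<script>
  document.querySelector("#facets_dive_elem").data = {jsonstr};
</script>
"""
display(HTML(HTML_TEMPLATE.format(jsonstr=jsonstr)))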
On Linux computers with Optimus, you have to create a kernel that is launched with "optirun" in order to use GPU acceleration. To do so, go to the following folder:
cd ~/.local/share/jupyter/kernels/
then edit the file python3/kernel.json to add "optirun" as the first entry of the argv array:
{
  "language": "python",
  "display_name": "Python 3",
  "argv": [
    "optirun",
    "/home/fabien/.conda/envs/data_science/bin/python",
    "-m",
    "ipykernel",
    "-f",
    "{connection_file}"
  ]
}
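To confirm that a kernel started this way actually sees the discrete GPU, a quick check from a notebook cell could look like the sketch below (it assumes the NVIDIA driver and its nvidia-smi utility are installed):
import subprocess
result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
print(result.stdout)  # should list the NVIDIA GPU when the kernel is launched through optirun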
I recommend installing the following notebook extension: