StreakyCobra / data-science-workspace

Data Science Workspace
2 stars 0 forks source link

Python Data Science Workspace

This repository contains my workspace for doing Data Science in Python.

Requirements

Installation and setup

How to set-up the workspace the first time

  1. If not already existing, create a conda environment:

    conda create -n data_science python=3.7
  2. Activate the environment:

    source activate data_science
  3. Setup the workspace:

    pip install -U pip numpy
    pip install -r requirements.txt
    python -m ipykernel install --user
  4. Setup jupyter notebooks

    jupyter contrib nbextension install --user
    jupyter nbextensions_configurator enable --user
    jupyter nbextension install https://github.com/drillan/jupyter-black/archive/master.zip --user
    jupyter nbextension enable jupyter-black-master/jupyter-black
  5. Setup jupyter lab

    jupyter labextension install jupyter-leaflet
    jupyter labextension install @jupyter-widgets/jupyterlab-manager
    jupyter labextension install @krassowski/jupyterlab_go_to_definition
    jupyter labextension install jupyterlab_bokeh
    jupyter labextension install ipysheet
    jupyter labextension install jupyterlab-drawio
    jupyter labextension install @jupyterlab/toc
    jupyter labextension install jupyterlab_vim
    
    jupyter labextension install @jupyterlab/git
    pip install jupyterlab-git
    jupyter serverextension enable --py jupyterlab_git
    
    jupyter labextension install @ryantam626/jupyterlab_code_formatter
    pip install jupyterlab_code_formatter
    jupyter serverextension enable --py jupyterlab_code_formatter
  6. Reactivate the environment:

    source deactivate data_science
    source activate data_science
  7. Load the submodules:

    git submodule init
    git submodule update

How to use the workspace

  1. Activate the environment (if not already activated on this session):

    source activate data_science
  2. Set Spark environment variables:

    export SPARK_HOME=/opt/spark
    export PATH=$SPARK_HOME/bin:$PATH
  3. Start Jupyter Notebook:

    jupyter notebook --NotebookApp.iopub_data_rate_limit=10000000

How to update the workspace (after an upstream update)

  1. Get the last changes from upstream:

    git pull
  2. Activate the environment (if not already activated on this session):

    source activate data_science
  3. Update the dependencies:

    pip install -r requirements.txt
  4. Reactivate the environment:

    source deactivate data_science
    source activate data_science
  5. Update submodules:

    git submodule init
    git submodule update

How to upgrade the workspace (upgrading python packages)

  1. Activate the environment (if not already activated on this session):

    source activate data_science
  2. Upgrade the dependencies:

    pip-compile --upgrade
    pip install -r requirements.txt
  3. Reactivate the environment:

    source deactivate data_science
    source activate data_science

Facets

Facets is a tool for the visual exploration of datasets. It can be installed as following:

jupyter nbextension install facets/facets-dist/ --user

Then jupyter notebook should be started with an additional command line option:

--NotebookApp.iopub_data_rate_limit=10000000

The visualization can then be loaded as explained in the demo notebook.

Troubleshooting

GPU support for Jupyter

For computers on linux with optimus, you have to make a kernel that will be called with "optirun" to be able to use GPU acceleration. For this go to the following folder:

    cd ~/.local/share/jupyter/kernels/

then edit the file python3/kernel.json in order to add "optirun" as first entry into the argv array:

    {
            "language": "python",
            "display_name": "Python 3",
            "argv": [
            "optirun",
            "/home/fabien/.conda/envs/data_science/bin/python",
            "-m",
            "ipykernel",
            "-f",
            "{connection_file}"
            ]
    }

Interesting notebook extensions

I recommend installing the following notebook extension: