fabianegli / singlecell_proteomics

BSD 3-Clause "New" or "Revised" License

No easy way to set up an execution environment #1

Closed fabianegli closed 1 year ago

fabianegli commented 1 year ago

It would be nice to have one - or multiple - easy ways to set up a Python environment that can run the code.

fabianegli commented 1 year ago

This documents the odyssey of recreating the environment needed to run the Jupyter notebook in the original repository:

  1. Try a venv on my local machine as a first attempt to run the Jupyter notebook. I am using Python 3.10: not the newest, but newer than the previous ones, and I assume all dependencies will support it.
python3.10 -m venv venv
source venv/bin/activate
pip install -U pip

(Luckily we already have a .gitignore made for Python development, so we can create the venv in the repository's root straight away.)

  2. Get JupyterLab

pip install jupyterlab
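
Before launching JupyterLab it can be worth confirming that the interpreter really is the one from the venv. The following is a minimal sketch; the helper name `in_virtualenv` is made up for illustration. It relies on `sys.prefix` differing from `sys.base_prefix` inside a venv, so it does not detect conda environments.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv.

    Inside a venv, sys.prefix points at the venv directory while
    sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside a venv:", in_virtualenv())
```

Running this in a notebook cell shows which interpreter the Jupyter kernel actually uses, which is not necessarily the one that was active in the shell.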
  3. Revising the decision to use Python 3.10: use 3.8.2 instead, as visible from the notebook. The same block reports scanpy version 1.8.2. These are all the reported versions I can find in the notebook, so let's use them.

NB: They should at least have run pip freeze or conda env export to record the versions used, and indicated the OS. Other options, like the watermark package, would also have helped. Even better, of course, would be a containerization approach of some sort. But we'll get there later???
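
As a rough illustration of what recording the versions could look like from inside a notebook, here is a minimal sketch using only the standard library. The function name `describe_environment` and the package list are assumptions for illustration, not something taken from the original repository.

```python
import platform
from importlib import metadata


def describe_environment(packages):
    """Return a dict with the Python version, the OS, and package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Hypothetical package list; adjust it to the notebook's actual imports.
    report = describe_environment(["scanpy", "anndata", "numpy", "pandas"])
    for key, value in report.items():
        print(key, value)
```

Printing such a report in the first notebook cell would have answered the version and OS questions without any guessing.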

  4. I don't have Python 3.8.2 installed on my system, so for the sake of reproducibility let's use conda for the moment.

For this purpose, let's create an environment YAML file for conda.

% conda create --name scprep python=3.8.2
% conda activate scprep
(scprep) % python -V
Python 3.8.2

reproduce-single-cell-proteomics.yaml:

name: scprep
channels:
  - conda-forge
  - defaults
dependencies:
  - anndata
  - jupyterlab
  - matplotlib
  - numpy
  - pandas
  - python=3.8.2
  - scanpy=1.8.2
  - scipy
  - seaborn
  - scikit-learn

conda env create --file reproduce-single-cell-proteomics.yaml
conda activate scprep
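
Only python and scanpy carry a version in the environment file above; everything else is left to the solver, which is presumably how numpy ended up at a recent release. The sketch below lists pinned and unpinned entries. It is a deliberately naive line parser (the name `split_pins` is made up here), not a real YAML parser, and it only handles flat `- name` or `- name=version` entries like the ones in this file.

```python
def split_pins(yaml_text):
    """Split the `dependencies:` entries into pinned and unpinned packages."""
    pinned, unpinned = {}, []
    in_deps = False
    for line in yaml_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        if not line.startswith((" ", "-")):
            # A top-level key starts; remember whether it is `dependencies:`.
            in_deps = stripped == "dependencies:"
            continue
        if in_deps and stripped.startswith("- "):
            name, sep, version = stripped[2:].strip().partition("=")
            if sep:
                pinned[name] = version.lstrip("=")
            else:
                unpinned.append(name)
    return pinned, unpinned


# The environment file from this comment, copied verbatim.
ENV_YAML = """\
name: scprep
channels:
  - conda-forge
  - defaults
dependencies:
  - anndata
  - jupyterlab
  - matplotlib
  - numpy
  - pandas
  - python=3.8.2
  - scanpy=1.8.2
  - scipy
  - seaborn
  - scikit-learn
"""

if __name__ == "__main__":
    pinned, unpinned = split_pins(ENV_YAML)
    print("pinned:", pinned)
    print("unpinned:", unpinned)
```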

Then start JupyterLab:

% jupyter lab

Open the notebook.

Running the imports raises a SystemError.

SystemError: initialization of _internal failed without raising an exception

According to https://stackoverflow.com/a/74982146/6018688 it might be a numpy issue. The proposed solution is to downgrade numpy from the currently installed 1.24.1 to 1.23.5.

conda install numpy=1.23.5

Retry the notebook import cell.

Now the import works, so the question and the answer on StackOverflow get an upvote.
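
To make this failure easier to spot next time, a guard cell could compare the installed numpy against the version that worked here. This is a minimal sketch under the assumption that 1.23.5 is the newest version known to work for this notebook; the helper names are made up, and the version parsing is intentionally simple rather than a full PEP 440 implementation.

```python
from importlib import metadata

# The numpy version that made the imports work in this log.
KNOWN_GOOD_NUMPY = (1, 23, 5)


def version_tuple(version):
    """Turn '1.24.1' into (1, 24, 1); suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    return tuple(parts)


def numpy_is_newer_than_known_good(installed):
    """True if the given numpy version is newer than the known-good one."""
    return version_tuple(installed) > KNOWN_GOOD_NUMPY


if __name__ == "__main__":
    try:
        installed = metadata.version("numpy")
    except metadata.PackageNotFoundError:
        print("numpy is not installed in this environment")
    else:
        if numpy_is_newer_than_known_good(installed):
            print(f"numpy {installed} is newer than 1.23.5; the imports may fail")
        else:
            print(f"numpy {installed} should be fine")
```

Pinning `numpy=1.23.5` in the environment file would make the same fix reproducible instead of relying on a manual `conda install`.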

But there's this warning:

MatplotlibDeprecationWarning: The seaborn styles shipped by Matplotlib are deprecated since 3.6, as they no longer correspond to the styles shipped by seaborn. However, they will remain available as 'seaborn-v0_8-