Samcoodess / reana-dms

Implementing REANA workflow for galaxy rotation-curve fitting analysis (RCFM) | Dark matter searches
MIT License

Issue with importing packages while running notebook in conda environment #5

Open Samcoodess opened 1 year ago

Samcoodess commented 1 year ago

Hello mentors, I am having trouble running the target analysis (RCFM) locally. Following the setup instructions in the GitHub repo README, I tried the recommended setup, i.e. Anaconda with Python 3.6.

  1. First, I forked the target analysis repository creating my own forked version of the analysis. Forked RCFM @Samcoodess

  2. I cloned the forked repo from my VS Code terminal using git clone {SSH}

  3. Then, I navigated to my project directory and created an environment using conda create newenv

  4. I activated the environment using conda activate newenv

  5. Then, to install python=3.6 as mentioned in the GitHub README, I ran conda install python=3.6. This gave an error. I figured out that Python 3.6 isn't available in the default Anaconda channels for macOS on ARM architecture, and Python 3.6 is pretty much dead anyway.

  6. I checked my Python version python3 --version

    Python 3.11.4

  7. Then, I tried installing matplotlib and jupyter notebook using conda install -n newenv jupyter notebook and conda install -n newenv matplotlib. All the packages were installed.

  8. Used command jupyter notebook to navigate through my notebooks and ran model.ipynb

  9. When I ran model.ipynb in the kernel of my environment, the first code block, which imports the necessary modules, threw a ModuleNotFoundError:

ModuleNotFoundError: No module named 'matplotlib'. This happens not only for matplotlib but for all packages.

  10. I thought my kernel wasn't using the conda environment, so to check I ran import sys; print(sys.executable). But even these two statements take too long to run.

  11. When I tried installing matplotlib from within the notebook itself using !pip install matplotlib, it said

    Requirement Satisfied.
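
The check attempted in step 10 can be extended slightly so that it also reports whether the kernel can see a given package. The following is only a minimal sketch: the helper name describe_kernel is hypothetical and is not part of the RCFM repository.

```python
import importlib.util
import sys


def describe_kernel(module_name="matplotlib"):
    """Report which Python the running kernel uses and whether it can see a module.

    Hypothetical helper for illustration; pass a top-level module name.
    """
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,  # path of the interpreter running this cell
        "prefix": sys.prefix,          # root of the environment that interpreter belongs to
        "module_found": spec is not None,
        "module_location": spec.origin if spec is not None else None,
    }


for key, value in describe_kernel().items():
    print(f"{key}: {value}")
```

If the executable path does not point inside the newenv environment, the notebook kernel is running from a different Python installation than the one the packages were installed into. Inside a notebook, %pip install installs into the environment of the running kernel, whereas !pip install uses whichever pip is found first on the PATH, which may belong to a different environment.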

matthewfeickert commented 1 year ago

@Samcoodess On a dev branch on your fork can you try making an environment with an environment.yml file that looks something like the following

name: rcfm-analysis
channels:
  - conda-forge
dependencies:
  - python=3.11
  - pandas=1.5  # Restricting this to sub pandas v2.0 as the pandas API changed here
  - matplotlib
  - notebook
  - jupyterlab

(you can expand this as needed later).

You'd create this environment initially with

conda env create --file environment.yml

and then activate it with

conda activate rcfm-analysis

Once the environment is active, if you update the environment.yml file (keeping the same environment name) you can just do

conda env update --file environment.yml
# micromamba install --file environment.yml  # The command is different for mamba/micromamba

to install those new dependencies into your environment.

Try this and let us know how things go.

matthewfeickert commented 1 year ago

For another Fellow project I'm mentoring we're also discussing environment files on https://github.com/AndriiPovsten/Snakemake-backend-for-RECAST/issues/5, so feel free to cross-post and to talk to people like Andrii as well.

Samcoodess commented 1 year ago

Ok, @matthewfeickert, thank you. I will review his work, and cross-posting seems like a great idea.

Samcoodess commented 1 year ago

@matthewfeickert

I created a new environment using the environment.yml file in my dev branch of the forked RCFM repository. Then, I added my created environment in .gitignore as we had discussed earlier. Now, should I also not commit the environment.yml file to the dev branch?

matthewfeickert commented 1 year ago

Now, should I also not commit the environment.yml file to the dev branch?

The environment.yml should be under version control. This is (one of) the thing(s) you want to share with everyone so that they can set up an environment that works with the code you have. So you can add this and push your dev branch to your fork (https://github.com/Samcoodess/RCFM).

Samcoodess commented 1 year ago

Thanks @matthewfeickert. Here is the link to the dev branch: RCFM dev branch

matthewfeickert commented 1 year ago

It has an environment.yml file, and the environment "rcfm-analysis" is added to the gitignore.

Cool. :+1: What you have in there now is the same file as I gave as an example in https://github.com/Samcoodess/reana-dms/issues/5#issuecomment-1662917991, which is fine, but I just gave that as an example and the environment doesn't really have anything to do with the analysis. What you should now do is figure out which additional dependencies need to be added to the environment.yml so that anyone who installs the described environment will be ready to do work.

A good start of what to look for is just by checking the output of running

git grep "import "

at the top level of the repository and seeing which modules end up getting imported. You'll need to refine the results, though. Some imported modules might be part of the Python standard library and so aren't something you can define as an external dependency. Others might be dependencies of other libraries you use. For example, numpy is a dependency of scipy, and scipy tightly restricts which versions of numpy are allowed with each version (more on this if you're wanting a deep dive), so it doesn't make sense to add both scipy and numpy to your dependencies list.

A check that you have something close to the right environment specified is if you can run the analysis notebooks, deactivate and delete the environment, create it again from the environment.yml file, and then rerun the analysis notebooks.

Please ask any questions you might have along the way. Learning how to manage virtual environment dependencies is not easy right away. :)