ESIPFed / esiphub-dev

Development JupyterHub on AWS targeting pangeo environment for National Water Model exploration
MIT License

Python environment for CDI workshop #20

Closed: rsignell-usgs closed this issue 6 years ago

rsignell-usgs commented 6 years ago

@dbuscombe-usgs, @csherwood-usgs alerted me to the instructions for installing the Python environment for the CDI class, and I'm quite worried: combining packages from the defaults channel, conda-forge and pip in the way you are advocating is very problematic, as Filipe (@ocefpaf) can attest.

I would recommend that you follow Filipe's Python instructions for IOOS instead, but rather than using the IOOS environment.yml file, use your own environment file:

conda env create -f tf_environment.yml

where the tf_environment.yml file is:

name: tf
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.6
  - pydensecrf
  - cython
  - numpy
  - scipy
  - matplotlib
  - s3fs
  - scikit-image
  - scikit-learn
  - joblib
  - tensorflow
  - opencv
  - ipython
  - tensorflow-hub
  - tqdm

I've created this environment locally. Is there a notebook I can test it with? (Or you can just try it yourself; it took 5 minutes to create.)
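
A hand-maintained environment file like this one can quietly pick up mistakes, such as the same package listed twice. As a minimal sketch, assuming the dependencies list has already been loaded into Python (for example with PyYAML's yaml.safe_load, not shown), a few lines can flag repeated entries before running conda env create. The helper names are hypothetical, and the version stripping is deliberately simple.

```python
# Hypothetical helper: flag packages listed more than once in the
# "dependencies" section of a conda environment file. The list is assumed to
# have been loaded already (for example with PyYAML's yaml.safe_load).
from collections import Counter


def package_name(spec):
    """Reduce a conda spec such as 'python=3.6' or 'chan::pkg' to 'pkg'."""
    spec = spec.split("::")[-1]          # drop an explicit channel prefix
    for sep in ("=", "<", ">", "!", " "):
        spec = spec.split(sep)[0]        # drop any version constraint
    return spec.strip().lower()


def duplicate_packages(dependencies):
    """Return names that appear more than once (pip sub-lists are ignored)."""
    names = [package_name(d) for d in dependencies if isinstance(d, str)]
    return sorted(name for name, n in Counter(names).items() if n > 1)


deps = ["python=3.6", "numpy", "tensorflow", "ipython", "tensorflow", "tqdm"]
print(duplicate_packages(deps))  # ['tensorflow']
```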

dbuscombe-usgs commented 6 years ago

Thanks. I can see that would be cleaner. Have you tried that? My recipe has worked for me on several machines.

dbuscombe-usgs commented 6 years ago

At this point several people have emailed me to say that my recipe worked, so I'm inclined to leave it alone unless it causes folks issues. (Also, I have bigger issues to deal with in making all my scripts work within the Jupyter environment, so it's been bumped down my priority list.)

rsignell-usgs commented 6 years ago

@dbuscombe-usgs, @csherwood-usgs told me he had some issues, which is why I took a look.

So shall we give up on pangeo.esipfed.org for this workshop? That would actually allow me to rest a little easier, as I'd like to focus on other things as well....

dbuscombe-usgs commented 6 years ago

I will post your instructions on the website as a 'plan b' alternative.

I appreciate all your help, and apologies that it has sucked up your time. It's been a steep learning curve for me, but I'm committed to it now, and I have most of the kinks worked out at this point. I would like to persevere; it will be much better.

rsignell-usgs commented 6 years ago

It's pretty dangerous to mix stuff from defaults, conda-forge and pip. It may work today, but tomorrow....

dbuscombe-usgs commented 6 years ago

I agree, but conda-forge caused me issues with tensorflow when I last tried. Maybe that's changed now.

rsignell-usgs commented 6 years ago

pangeo.esipfed.org is using tensorflow from conda-forge, so if it's working there, it should be fine.

dbuscombe-usgs commented 6 years ago

OK, good. I will modify my instructions on the site.

rsignell-usgs commented 6 years ago

But it would make me feel better if you had a notebook for me to try. We could add s3fs to access the data from S3.

dbuscombe-usgs commented 6 years ago

I'll have complete notebooks for you to try in a day or two.

csherwood-usgs commented 6 years ago

I just installed the environment listed above (tf_environment.yml) on my Win 10 desktop with no apparent problems.

dbuscombe-usgs commented 6 years ago

Cool, thanks. I just updated the website.

ocefpaf commented 6 years ago

> I agree, but conda-forge caused me issues with tensorflow when I last tried. Maybe that's changed now

I doubt it was conda-forge, because our tensorflow is a repackaging of the wheel published on PyPI. With so many install steps, things are error-prone and hard to reproduce. I strongly recommend a single env file and the use of conda-env. Note that you can still list packages from PyPI in the env file; here is an example that I used recently:

name: GIS
channels:
  - conda-forge
dependencies:
  - python=2.7
  - pypy2.7
  - fiona
  - imagemagick
  - pydensecrf
  - qgis
  - tensorflow
  - conda-forge/label/dev::rasterio
  - pip:
    - protobuf-gis
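
For reference, when a YAML loader such as PyYAML (not shown) reads a file like this, the pip: entry comes back as a dictionary inside the dependencies list, and an entry written as channel::package pins that one package to a specific channel or label. A minimal sketch that separates those cases, with a hypothetical helper name:

```python
# Hypothetical helper: split a conda environment's dependency list (as loaded
# from YAML) into conda packages, with any explicit channel, and pip packages.
def summarize_dependencies(dependencies):
    conda, pip = [], []
    for dep in dependencies:
        if isinstance(dep, dict):            # the "- pip:" sub-list
            pip.extend(dep.get("pip", []))
        elif "::" in dep:                    # channel::package syntax
            channel, name = dep.split("::", 1)
            conda.append((channel, name))
        else:
            conda.append((None, dep))        # resolved from the listed channels
    return {"conda": conda, "pip": pip}


gis = ["python=2.7", "fiona", "conda-forge/label/dev::rasterio",
       {"pip": ["protobuf-gis"]}]
summary = summarize_dependencies(gis)
print(summary["pip"])                           # ['protobuf-gis']
print([c for c in summary["conda"] if c[0]])    # [('conda-forge/label/dev', 'rasterio')]
```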

dbuscombe-usgs commented 6 years ago

Many thanks, all. I am still new to conda; this is helpful.

ocefpaf commented 6 years ago

@dbuscombe-usgs, I'm not sure you are interested, but if you have a set of notebooks (or tests), we can spin up CI services, Travis-CI (for Linux and OS X) and AppVeyor (for Windows), that will create that environment and run the notebooks to ensure that everything is working. That will help especially if people are adding or removing packages up until the date of the workshop.
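
A minimal sketch of such a CI step, assuming nbformat and nbconvert are installed in the environment being tested and that a Jupyter kernel named python3 is available. It executes each notebook with nbconvert's ExecutePreprocessor and collects failures; find_notebooks and run_notebooks are hypothetical names, not part of any existing workshop repository.

```python
# Minimal sketch of a CI step: execute every notebook under a directory and
# report the ones that fail. Assumes nbformat and nbconvert are installed in
# the environment under test, plus a Jupyter kernel named "python3".
from pathlib import Path


def find_notebooks(root):
    """All .ipynb files under root, skipping Jupyter checkpoint copies."""
    return sorted(p for p in Path(root).rglob("*.ipynb")
                  if ".ipynb_checkpoints" not in p.parts)


def run_notebooks(root, timeout=600):
    """Execute each notebook; return a list of (path, exception) failures."""
    import nbformat
    from nbconvert.preprocessors import ExecutePreprocessor

    failures = []
    for path in find_notebooks(root):
        nb = nbformat.read(str(path), as_version=4)
        ep = ExecutePreprocessor(timeout=timeout, kernel_name="python3")
        try:
            # Run with the notebook's own folder as the working directory.
            ep.preprocess(nb, {"metadata": {"path": str(path.parent)}})
        except Exception as exc:  # cell errors, kernel death, timeouts
            failures.append((path, exc))
    return failures
```

A Travis-CI or AppVeyor job could create the environment from the env file, call run_notebooks on the notebook folder, and fail the build if the returned list is not empty.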

rsignell-usgs commented 6 years ago

@ocefpaf, @dbuscombe-usgs pointed me toward his workshop materials here (still in progress, of course): https://github.com/dbuscombe-usgs/cdi_dl_workshop

rsignell-usgs commented 6 years ago

@dbuscombe-usgs, I tried running 2.feature_extraction_scikitlearn.ipynb and it needed scikit-learn, so I added that to the environment in the notebook Docker container and relaunched JupyterHub at pangeo.esipfed.org. I'm trying to make the base environment work for the workshop so folks don't have to do any environment customization or creation.

dbuscombe-usgs commented 6 years ago

Thanks. Another needed package is tqdm.

rsignell-usgs commented 6 years ago

@dbuscombe-usgs, okay. Any other packages you know of before I rebuild the container?

dbuscombe-usgs commented 6 years ago

s3fs. That's it

rsignell-usgs commented 6 years ago

@dbuscombe-usgs, okay, done. If you go to the control panel, shut down your server and then start it again; it will take quite a while (5 min?) since the container has changed (and it's big: 5 GB!). But you should have tqdm and scikit-learn now. s3fs was already there. (We are talking here about the Dockerfile for the notebook container, not the tf_environment.yml file.)
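
A minimal sketch for confirming, from a notebook cell after the restart, that the packages added to the container are importable. The helper name is hypothetical; importlib.util.find_spec is standard library and returns None for a top-level module it cannot locate. Note that import names can differ from package names (scikit-learn is imported as sklearn).

```python
# Quick check that the modules the workshop notebooks import can be found in
# the running kernel. Only top-level module names are checked.
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names, not package names: scikit-learn imports as sklearn.
print(missing_modules(["sklearn", "tqdm", "s3fs"]))
```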

csherwood-usgs commented 6 years ago

I tried to load the environment and run the workshop notebooks in it. It does not have jupyter! What other stuff is missing?

rsignell-usgs commented 6 years ago

There are two ways to handle this. One way is to have users install Jupyter into their root environment and add nb_conda_kernels to the custom environment for the workshop. But it's probably easier just to have them add Jupyter to the environment for the workshop, and then launch Jupyter Notebook from that environment.

csherwood-usgs commented 6 years ago

I agree. I think the .yml file should provide the whole environment.

BTW, when I run deactivate in my Win10 version of Miniconda, my entire PATH gets wiped, so not even the conda command works. I have to open a new Anaconda terminal window.

dbuscombe-usgs commented 6 years ago

You're both assuming that I want users to run Jupyter notebooks from their own terminals. Who does that? I certainly never do, not for real analysis. This is just a teaching/demo tool, in my opinion. But I'll do whatever you suggest.

dbuscombe-usgs commented 6 years ago

I guess I don't really follow what the goal is here. My focus has been creating two separate things: 1) code that lives and runs on a local machine (that you haven't even seen yet) and 2) a set of teaching materials that runs on the cloud. What you're suggesting is a third thing, where you can run those materials on your own machine?

csherwood-usgs commented 6 years ago

OK. That makes sense. I thought the backup plan was to have people be able to run the teaching notebooks on their laptops, which is what I was trying to do. But if the JupyterHub works, I guess there is no need for that. However, I develop almost all of my code in notebooks, and only move it to .py files when I use it for production.

dbuscombe-usgs commented 6 years ago

Apologies for the confusion. That is an ideal situation that I hadn't considered and haven't tested. We'll make it work, but it'll be the last thing. Still have lots of material to finalize and code to test. I started this way too late!

dbuscombe-usgs commented 6 years ago

I updated the website instructions to include the nb_conda package in the .yml file.

rsignell-usgs commented 6 years ago

Uh, does that work? I'm not familiar with that.

We usually add jupyter if we want to launch jupyter from the custom environment.

Alternatively, if we launch jupyter from the root environment, we add nb_conda_kernels to the custom environment to make sure it appears in the list of available kernels.
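
Either way, it helps to be able to check from inside a notebook which environment the kernel actually comes from, for example when a package seems to be missing. A minimal sketch, assuming the standard conda layout in which named environments live under <conda root>/envs/<name>; conda_env_name is a hypothetical helper.

```python
# Hypothetical helper: infer which conda environment the running kernel
# belongs to from sys.prefix. Assumes the standard conda layout, where named
# environments live in <conda root>/envs/<name>.
import os
import sys


def conda_env_name(prefix=sys.prefix):
    prefix = os.path.normpath(prefix)
    parent = os.path.basename(os.path.dirname(prefix))
    if parent == "envs":
        return os.path.basename(prefix)   # e.g. .../miniconda3/envs/tf -> "tf"
    return "base"                         # treated as the root (base) environment


print(conda_env_name())
```

Environments created elsewhere with conda create --prefix do not follow that layout and would be reported as "base" by this sketch.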

dbuscombe-usgs commented 6 years ago

Well, I don't know. I haven't really used Jupyter before, so there isn't a 'usually' for me. I don't even know the distinction you're making. But if I type jupyter notebook from within my conda env on my PC, I get taken to http://localhost:8888/notebooks, and I can navigate to my ipynb file and run it.

dbuscombe-usgs commented 6 years ago

Is that a dumb thing to do?

ocefpaf commented 6 years ago

@rsignell-usgs and @dbuscombe-usgs, nb_conda depends on jupyter, so jupyter will be installed. But my guess is that you are not really using nb_conda; if so, you should consider an alternative, because that project seems to be dead.

> Is that a dumb thing to do?

If you need jupyter, I think it is better to be explicit and add it directly instead of adding something that installs it as a dependency.

dbuscombe-usgs commented 6 years ago

OK, thanks for the info. I'll replace nb_conda with jupyter.

Also, do I want users to fork or clone? I would say clone because they won't be contributing to the repo, but is there any advantage to them forking?

ocefpaf commented 6 years ago

For tutorials like this, I recommend saying "clone" in the instructions. (If a user forks, they know what they are doing, so you don't have to worry about it.)

csherwood-usgs commented 6 years ago

Small typo: the instructions here:

https://sites.google.com/view/usgsdeeplearning/home/installing-and-configuring-software

say Python 3.6, but the environment installs 3.5. Maybe not a big deal, but confusing.

csherwood-usgs commented 6 years ago

I also prefer the classic view. Also, I am finding that I need to manage the kernels and shut down the ones I am no longer using. That seems to speed up the ones I am using.

rsignell-usgs commented 6 years ago

Replacing “lab” with “tree” in the URL gives you the classic Notebook interface instead of JupyterLab.
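
The same swap can be done programmatically. A minimal sketch using only urllib.parse, assuming the URL ends in the /lab segment (the JupyterHub username jdoe below is hypothetical); lab_to_classic is a hypothetical helper name.

```python
# Swap a JupyterLab root URL (.../lab) for the classic Notebook file browser
# (.../tree). Other URLs are returned unchanged.
from urllib.parse import urlsplit, urlunsplit


def lab_to_classic(url):
    parts = urlsplit(url)
    segments = parts.path.rstrip("/").split("/")
    if segments[-1] != "lab":
        return url                        # not a JupyterLab root URL
    segments[-1] = "tree"                 # classic Notebook dashboard
    return urlunsplit(parts._replace(path="/".join(segments)))


print(lab_to_classic("https://pangeo.esipfed.org/user/jdoe/lab"))
# https://pangeo.esipfed.org/user/jdoe/tree
```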

Just out of curiosity, why are we specifying 3.5 instead of 3.6?

dbuscombe-usgs commented 6 years ago

Yes, confusing. We want people to install Anaconda with Python 3.6, but we're using 3.5 in the environment. With conda environments, we can use whatever Python version we like. TensorFlow on Windows only very recently added support for 3.6, and the official install instructions still recommend 3.5.
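
Because the interpreter inside a conda environment is independent of the one the Anaconda installer ships, a check in the first cell of a notebook can catch a kernel from the wrong environment. A minimal sketch, assuming the pin is written as a dotted version such as 3.5; matches_pin is a hypothetical name.

```python
# Hypothetical first-cell check: does the kernel's interpreter match the
# Python version pinned in the environment file (e.g. "python=3.5")?
import sys


def matches_pin(pin, version_info=sys.version_info):
    """True if version_info starts with the dotted version given in pin."""
    wanted = tuple(int(part) for part in pin.split("."))
    return tuple(version_info[:len(wanted)]) == wanted


print(matches_pin("3.5"))
```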

rsignell-usgs commented 6 years ago

I think this one is all set. We can reopen if needed for the 2nd Deep Neural Networks Workshop, Sep 25-27, 2018.