geoschem / geos-chem-cloud

Run GEOS-Chem easily on AWS cloud
http://cloud.geos-chem.org
MIT License

Add more python packages to miniconda env #12

Closed lizziel closed 6 years ago

lizziel commented 6 years ago

The existing geo env is not sufficient to work with the gcpy package (https://bitbucket.org/gcst/gcpy) and available packages for GEOS-Chem data regridding, both lat/lon and cubed sphere. However, I am able to successfully use gcpy and regrid using xESMF with the following environment installed (see below).

The package list is based on the one from gcpy, with additional packages added, particularly for handling cubed sphere data. Overall it includes more than is strictly needed but gives very good coverage of what users might want. Storing the yml file within the AMI would let users create their own environments very easily. I put together a README last year with concise directions for doing this, if you would like to adapt it.

Note that I cloned the gcpy, xESMF, and cubedsphere packages and then manually installed them on top of my environment, e.g. pip install -e /home/ubuntu/src/xESMF. I also had to specify an older version of xarray since gcpy is not compatible with the latest xarray version (!)
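The manual steps described above could be scripted roughly as follows. This is a minimal sketch, not the procedure actually used on the AMI. It assumes the environment file is saved as gcpy.yml in the working directory, that gcpy and cubedsphere were cloned next to xESMF under /home/ubuntu/src (only the xESMF path is given above), and that the installed conda supports `conda run`. The helper name build_commands is hypothetical. By default the script only prints the commands.

```python
import subprocess

ENV_NAME = "gcpy"      # matches the "name:" field of gcpy.yml
ENV_FILE = "gcpy.yml"  # assumed location of the environment file
# Only the xESMF path appears in the comment above; the other two are assumed.
SRC_DIRS = (
    "/home/ubuntu/src/gcpy",
    "/home/ubuntu/src/xESMF",
    "/home/ubuntu/src/cubedsphere",
)


def build_commands(env_name=ENV_NAME, env_file=ENV_FILE, src_dirs=SRC_DIRS):
    """Return the conda/pip commands as argument lists, in execution order."""
    commands = [["conda", "env", "create", "-f", env_file]]
    for path in src_dirs:
        # Editable install of each cloned package into the new environment.
        commands.append(
            ["conda", "run", "-n", env_name,
             "python", "-m", "pip", "install", "-e", path]
        )
    return commands


def main(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    main()
```

Calling main(dry_run=False) would actually run the commands and stop at the first failure.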

file: gcpy.yml

name: gcpy
channels:
    - defaults
    - conda-forge
    - nesii/label/dev-esmf
dependencies:
    - python=3.6    # Python version 3.6
    - basemap       # Precursor to cartopy
    - bottleneck    # C-optimized array functions for NumPy
    - cartopy       # Geographic plotting toolkit
    - cython        # Transpile Python->C
    - dask          # Parallel processing library
    - esmpy         # ESMF python package
    - graphviz      # visualize dask graph (binary)
    - future        # Python 2/3 compatibility
    - h5py          # Wrapper for HDF5
    - ipython       # IPython interpreter and tools
    - jupyter       # Jupyter federation architecture
    - matplotlib    # 2D plotting library
    - netcdf4       # Wrapper for netcdf4
    - notebook      # Notebook interface
    - numpy         # N-d array and numerics
    - pandas        # Labeled array library
    - pyresample    # Geographic resampling tools
    - scipy         # Common math/stats/science functions
    - scikit-learn  # Machine learning library
    - statsmodels   # Regression/modeling toolkit
    - seaborn       # Statistical visualizations
    - six           # Python 2/3 compatibility
    - tqdm          # Nice progressbar for longer computations
    - xarray=0.9.6  # N-d labeled array library
    - xbpch         # Interface for bpch output files
    - sphinx        # documentation
    - pip:
        - codecov   # coverage tool
        - xbpch     # Interface for bpch output files
        - h5pyd     # HDF5 for Amazon S3
        - h5netcdf  # allow HDF5-backend for xarray
        - graphviz  # visualize dask graph (python package)
        - pycodestyle # tool to check style conventions (formerly pep8)
        - pytest-cov  # coverage tool
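One detail worth checking in a file like this is whether a package is requested from both conda and pip. The sketch below is not part of the original post. It copies the two dependency lists from gcpy.yml by hand, with version pins stripped, and reports the overlap. The function name find_duplicates is hypothetical.

```python
# Package names copied by hand from gcpy.yml above (version pins removed).
CONDA_DEPS = [
    "python", "basemap", "bottleneck", "cartopy", "cython", "dask", "esmpy",
    "graphviz", "future", "h5py", "ipython", "jupyter", "matplotlib",
    "netcdf4", "notebook", "numpy", "pandas", "pyresample", "scipy",
    "scikit-learn", "statsmodels", "seaborn", "six", "tqdm", "xarray",
    "xbpch", "sphinx",
]
PIP_DEPS = [
    "codecov", "xbpch", "h5pyd", "h5netcdf", "graphviz",
    "pycodestyle", "pytest-cov",
]


def find_duplicates(conda_deps, pip_deps):
    """Return the sorted package names that appear in both lists."""
    return sorted(set(conda_deps) & set(pip_deps))


print(find_duplicates(CONDA_DEPS, PIP_DEPS))  # ['graphviz', 'xbpch']
```

For graphviz the overlap is intentional according to the comments in the file (conda supplies the binary, pip the Python package). xbpch is listed twice with the same description.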

JiaweiZhuang commented 6 years ago

The existing geo env is not sufficient to work with the gcpy package

What dependency is missing? gcpy doesn't seem to need a lot of dependencies...

since gcpy is not compatible with the latest xarray version

!! What exactly breaks?

I don't think gcpy and cubedsphere are mature tools right now so I prefer not having them pre-installed. Users can install them manually if needed...

I will update the other packages. sphinx/pytest/codecov should not be necessary since they are typically used locally, and users will rarely touch them. Do you need to develop packages on EC2?

lizziel commented 6 years ago

Having esmpy would be a good setup for using xESMF. I do not include gcpy, xESMF, and cubedsphere in the pre-install list. But eventually gcpy will be ready and I intend to incorporate xESMF. At that point esmpy will be necessary, so we might as well include it now.

The gcpy/xarray incompatibility is a separate issue that I haven't fully diagnosed. Some use of xarray in either the gcpy plot or benchmark code silently depends on the MaskedAndScaledArray class in xarray's conventions.py. That class was removed from the package on Jan 11, 2018. gcpy can be updated, but we can't do that for a while.
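A quick way to confirm whether a given xarray installation still provides the class is to probe for it at runtime. This is a minimal sketch, not code from gcpy or xbpch. The helper name has_symbol is hypothetical, and the module path xarray.conventions is taken from the description above.

```python
import importlib


def has_symbol(module_name, symbol):
    """Return True if `module_name` imports cleanly and defines `symbol`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Module missing entirely (e.g. xarray not installed in this env).
        return False
    return hasattr(module, symbol)


if __name__ == "__main__":
    available = has_symbol("xarray.conventions", "MaskedAndScaledArray")
    print("MaskedAndScaledArray available:", available)
```

Running it in each candidate environment shows directly whether the class that the code depends on is present there.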

I will be making my GCHP benchmark python code and data, and nc vs bin python code, available publicly so it would be nice if users could play with it on the cloud. They would need to manually get the packages still in development but it would be good to avoid the xarray issue and have esmpy already there.

(In reply to Jiawei Zhuang's comment of Mar 16, 2018, 12:05 PM.)

JiaweiZhuang commented 6 years ago

Some usage of xarray in either gcpy plot or benchmark silently depends on xarray conventions.py MaskedAndScaledArray class.

This should be an xbpch issue (darothen/xbpch#8). Since the benchmark plot module doesn't use xbpch, simply removing the import xbpch line in gcpy/core.py should resolve it. I'd like to keep the xarray version up to date because there are quite a lot of enhancements in v0.10.x.
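An alternative to deleting the import outright would be to make it optional, so that bpch support is simply disabled when xbpch cannot be imported. This is a different technique from the one suggested above and is shown only as a sketch. The helper optional_import and the flag HAS_XBPCH are hypothetical, and the sketch assumes the failure surfaces as an ImportError when xbpch is imported.

```python
import importlib
import warnings


def optional_import(name):
    """Import and return module `name`, or return None if the import fails."""
    try:
        return importlib.import_module(name)
    except ImportError as err:
        warnings.warn(
            "{} is unavailable ({}); related features are disabled".format(name, err)
        )
        return None


# Code that needs bpch support can check the flag before using xbpch.
xbpch = optional_import("xbpch")
HAS_XBPCH = xbpch is not None
```

Modules that never touch bpch files would then import cleanly regardless of which xarray version is installed.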

JiaweiZhuang commented 6 years ago

So I have decided to roll back to xarray 0.9.6 until the xbpch issue is fixed by @darothen ...

Given that our current documentation on NC diagnostics is still under construction, some users might want to fall back to BPCH. xbpch should be a great workaround during this transition.

darothen commented 6 years ago

@JiaweiZhuang I'll have some time today and tomorrow to look into things. Your diagnosis in darothen/xbpch#8 is super-helpful and I think I'll be able to quickly test things and push out a patch/increment the version.

JiaweiZhuang commented 6 years ago

@darothen Thanks!! Hope it will work with both xarray 0.9.6 and 0.10.2

JiaweiZhuang commented 6 years ago

The updated tutorial AMI now uses xarray 0.9.6. Let's track the bpch issue in darothen/xbpch#8.