MAAP-Project / Community

Issue for MAAP (Zenhub)

Add missing packages to Pangeo workspace #729

Closed anilnatha closed 9 months ago

anilnatha commented 1 year ago

While discussing missing packages with @wildintellect, he shared that geopandas is missing and that we should start by adding it to our Pangeo workspace, where we recently added other packages that were missing.

If all works well, we will later add it, and the other missing packages that were added to Pangeo, to our other workspaces.

NOTE (2023-12-4): Updated link to package reconciliation list.

anilnatha commented 1 year ago

@wildintellect If there are other packages we are missing, please document them here and we'll get them integrated as part of this ticket.

pahbs commented 1 year ago

Below is a list of other Python packages that are part of the boreal biomass geoprocessing workflow that we've prototyped on MAAP. I would think that these should be part of the Pangeo env by default, since they could get used across many MAAP projects.

These are derived from https://github.com/lauraduncanson/icesat2_boreal/blob/master/dps/build_command_main.sh, which installs https://github.com/lauraduncanson/icesat2_boreal/blob/master/dps/env_main.yaml, which provides the packages that have run our DPS jobs and our notebooks up until this point.

Note: it is not clear to me whether the specific package versions listed in that yaml are required. For example, the latest version of GeoPandas should probably be deployed on Pangeo (unless @wildintellect indicates otherwise).

pahbs commented 1 year ago

I think these packages will require careful installation and testing due to sensitive dependencies: rio_tiler, rio_cogeo, cogeo_mosaic

For example, when I ran this: !pip install -U rio_tiler

I saw this:

    Installing collected packages: rio_tiler
      Attempting uninstall: rio_tiler
        Found existing installation: rio-tiler 4.1.12
        Uninstalling rio-tiler-4.1.12:
          Successfully uninstalled rio-tiler-4.1.12
    ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
    cogeo-mosaic 5.1.1 requires morecantile<4.0,>=3.1, but you have morecantile 4.2.0 which is incompatible.
    cogeo-mosaic 5.1.1 requires rio-tiler<5.0,>=4.0.0a0, but you have rio-tiler 5.0.0 which is incompatible.
    Successfully installed rio_tiler-5.0.0
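The conflict reported above can be reproduced without touching the environment. The following is a minimal sketch, assuming the `packaging` library is available (it usually ships alongside pip tooling). The two constraints are copied from the pip error; the helper name `find_conflicts` and the dictionaries of installed versions are hypothetical and only serve the illustration.

```python
from packaging.specifiers import SpecifierSet

# Constraints copied from the pip error above (declared by cogeo-mosaic 5.1.1).
COGEO_MOSAIC_REQUIRES = {
    "morecantile": SpecifierSet("<4.0,>=3.1"),
    "rio-tiler": SpecifierSet("<5.0,>=4.0.0a0"),
}


def find_conflicts(installed):
    """Return one message per installed version that violates a constraint.

    `installed` maps package name to version string (hypothetical input;
    packages missing from the mapping are simply skipped).
    """
    conflicts = []
    for name, spec in COGEO_MOSAIC_REQUIRES.items():
        version = installed.get(name)
        if version is not None and not spec.contains(version):
            conflicts.append(f"{name} {version} does not satisfy '{spec}'")
    return conflicts


# Versions reported by pip after `pip install -U rio_tiler`.
after_upgrade = {"morecantile": "4.2.0", "rio-tiler": "5.0.0"}
# Version that was installed before the upgrade.
before_upgrade = {"rio-tiler": "4.1.12"}

for message in find_conflicts(after_upgrade):
    print(message)
```

In a live workspace, running `pip check` reports the same kind of incompatibility for everything that is currently installed.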

This issue, for us, is the highest priority, because without it we cannot assess the DPS results that we need to complete in July 2023.

One test (from my workspace) could look like this: !python /projects/code/icesat2_boreal/lib/build_tindex_master.py -t Topo -y 2023 -m '06' --user 'montesano' --maap_version master -alg_name 'do_topo_stack_3-1-5'

wildintellect commented 1 year ago

@pahbs let me look into the rio*-related packages. Since most of those are not in conda-forge, we might just need to change versions. cc: @vincentsarago

wildintellect commented 1 year ago

@anilnatha can you dump a list of packages to compare against the official Pangeo environment file? https://github.com/pangeo-data/pangeo-docker-images/blob/master/pangeo-notebook/environment.yml

@pahbs can you please look at this ^^^ to see what key packages from this list should be included?

anilnatha commented 1 year ago

@wildintellect Can you clarify what source I should dump from? Is it the Pangeo workspace, so that we can glean insights into all the dependencies that are installed using its respective environment.yml?

wildintellect commented 1 year ago

@anilnatha if you dump the MAAP Pangeo package list you can diff it against the Pangeo official environment.yml to see what is not currently included.
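The diff described here could be sketched as follows. This is only an illustration under stated assumptions: it reads the two environment files as plain text with a deliberately small line-based parser rather than a YAML library, the function names are hypothetical, and the sample file contents (including the version pins) are made up for the example. Conda and pip names for the same package sometimes differ, so the result would still need a manual review.

```python
import re


def normalize(name):
    """Lowercase and treat '-', '_' and '.' as equivalent separators."""
    return re.sub(r"[-_.]+", "-", name).lower()


def dependency_names(env_text):
    """Collect package names listed under `dependencies:` in an environment.yml.

    Version pins, channel prefixes (`channel::name`) and the nested `pip:`
    key are stripped; entries under other top-level keys are ignored.
    """
    names = set()
    in_deps = False
    for raw in env_text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop comments
        stripped = line.strip()
        if not stripped:
            continue
        if not line.startswith((" ", "-")):
            # A new top-level key such as `name:` or `channels:`.
            in_deps = stripped.startswith("dependencies:")
            continue
        if not in_deps or not stripped.startswith("- "):
            continue
        entry = stripped[2:].strip()
        if entry.endswith(":"):  # e.g. the nested `- pip:` key
            continue
        entry = entry.split("::")[-1]  # drop an optional channel prefix
        names.add(normalize(re.split(r"[=<>!~ \[]", entry, maxsplit=1)[0]))
    return names


# Made-up sample contents standing in for the two real files.
official = """
name: pangeo-notebook
channels:
  - conda-forge
dependencies:
  - geopandas
  - fsspec
  - cf_xarray>=0.8
  - pip:
    - xarray-datatree==0.0.13
"""
maap = """
dependencies:
  - geopandas=0.14
"""

missing = sorted(dependency_names(official) - dependency_names(maap))
print(missing)
```

With the real files, the two strings would be read from the MAAP and official Pangeo environment.yml files instead of being defined inline.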

wildintellect commented 1 year ago

For comparison, here is the VEDA lockfile for their Pangeo instances: https://github.com/pangeo-data/pangeo-docker-images/blob/master/pangeo-notebook/conda-linux-64.lock

grallewellyn commented 9 months ago

@wildintellect Our current Pangeo package list is here. These are the packages in the official Pangeo environment.yml that are not in the MAAP Pangeo environment.yml:

adlfs
argopy
black
ciso
cmocean
cdsapi
cf_xarray
dask-ml
fastjmd95
fsspec
gcsfs
gh
gh-scoped-creds
git-lfs
gsw
line_profiler
memory_profiler
metpy
nb_conda_kernels
nbstripout
numbagg
numcodecs
python-graphviz
xarray-datatree
xarray_leaflet
xarray-spatial
xbatcher
xcape
xclim
xgboost
xgcm
xhistogram
xmip
xmitgcm
xpublish
xrft
xskillscore

@pahbs Are there any key packages from this list that you think we should add to the MAAP Pangeo environment.yml in our next release?

wildintellect commented 9 months ago

cf_xarray
fsspec
xarray-datatree
xarray_leaflet
xarray-spatial

@grallewellyn is there a reason we can't add all of them? We are trying to reach parity with the Pangeo-notebook images so that they are interchangeable (VEDA uses Pangeo-notebook).

grallewellyn commented 9 months ago

Yes, I made a ticket to add all packages from the diff list I sent above: https://github.com/MAAP-Project/Community/issues/898