Closed: anilnatha closed this issue 9 months ago
@wildintellect If there are other packages we are missing, please document them here and we'll get them integrated as part of this ticket.
Below is a list of other Python packages that are part of the boreal biomass geoprocessing workflow that we've prototyped on MAAP. I think these should be part of the Pangeo env by default, since they could be used across many MAAP projects.
These are derived from https://github.com/lauraduncanson/icesat2_boreal/blob/master/dps/build_command_main.sh, which installs https://github.com/lauraduncanson/icesat2_boreal/blob/master/dps/env_main.yaml, which provides the packages that have run our DPS jobs and our notebooks up to this point.
Note: it is not clear to me whether the specific package versions listed in that yaml are required. For example, the latest version of Geopandas should probably be deployed on Pangeo (unless @wildintellect indicates otherwise).
I think these packages will require careful install and testing due to sensitive dependencies: rio_tiler, rio_cogeo, cogeo_mosaic
For example, when I ran this:
!pip install -U rio_tiler
I saw this:
Installing collected packages: rio_tiler
  Attempting uninstall: rio_tiler
    Found existing installation: rio-tiler 4.1.12
    Uninstalling rio-tiler-4.1.12:
      Successfully uninstalled rio-tiler-4.1.12
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
cogeo-mosaic 5.1.1 requires morecantile<4.0,>=3.1, but you have morecantile 4.2.0 which is incompatible.
cogeo-mosaic 5.1.1 requires rio-tiler<5.0,>=4.0.0a0, but you have rio-tiler 5.0.0 which is incompatible.
Successfully installed rio_tiler-5.0.0
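In case it helps with triage, here is a rough sketch of how the conflict lines in the pip output above could be pulled into structured records, so the pins that need relaxing are easy to list. The `Conflict` tuple and `parse_conflicts` helper are hypothetical names made up for this illustration, and the regular expression only assumes the "X requires Y, but you have Z which is incompatible" wording seen in that output.

```python
import re
from typing import NamedTuple


class Conflict(NamedTuple):
    """One 'requires ..., but you have ...' line from pip (hypothetical helper type)."""
    package: str            # package that declares the requirement
    package_version: str
    requirement: str        # dependency whose pin is violated
    specifier: str          # version range the package asks for
    installed_version: str  # version actually present


# Matches the conflict wording shown in the pip output quoted in this thread.
_CONFLICT = re.compile(
    r"(?P<package>[A-Za-z0-9_.\-]+) (?P<package_version>\S+) requires "
    r"(?P<requirement>[A-Za-z0-9_.\-]+)(?P<specifier>[^ ]*), "
    r"but you have (?P<installed>[A-Za-z0-9_.\-]+) (?P<installed_version>\S+) "
    r"which is incompatible"
)


def parse_conflicts(pip_output: str) -> list[Conflict]:
    """Return every dependency conflict reported in a block of pip output."""
    return [
        Conflict(
            m["package"],
            m["package_version"],
            m["requirement"],
            m["specifier"],
            m["installed_version"],
        )
        for m in _CONFLICT.finditer(pip_output)
    ]


if __name__ == "__main__":
    sample = (
        "cogeo-mosaic 5.1.1 requires rio-tiler<5.0,>=4.0.0a0, "
        "but you have rio-tiler 5.0.0 which is incompatible."
    )
    for c in parse_conflicts(sample):
        print(f"{c.package} {c.package_version} needs {c.requirement}{c.specifier}, "
              f"found {c.installed_version}")
```

Running `pip check` in the workspace should report the same kind of inconsistency without reinstalling anything, which may be a simpler first step.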
This issue, for us, is the highest priority - because without it we cannot assess the DPS results that we need to complete in July 2023.
One test (from my workspace) could look like this:
!python /projects/code/icesat2_boreal/lib/build_tindex_master.py -t Topo -y 2023 -m '06' --user 'montesano' --maap_version master -alg_name 'do_topo_stack_3-1-5'
@pahbs let me look into the rio* related packages, since most of those are not in conda-forge. We might just need to change versions. cc: @vincentsarago
@anilnatha can you dump a list of packages to compare against https://github.com/pangeo-data/pangeo-docker-images/blob/master/pangeo-notebook/environment.yml
@pahbs can you please look at this ^^^ to see what key packages should be included from this list.
@wildintellect Can you clarify the source of what I should dump from? Is it the pangeo workspace so that we can glean insights of all the dependencies that are installed using its respective environment.yml?
@anilnatha if you dump the MAAP Pangeo package list you can diff it against the Pangeo official environment.yml to see what is not currently included.
For comparison, the VEDA lockfile of their Pangeo instances https://github.com/pangeo-data/pangeo-docker-images/blob/master/pangeo-notebook/conda-linux-64.lock
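A minimal sketch of the diff described above, assuming both environments are available as `environment.yml` text. The helper names (`normalize`, `dependency_names`, `missing_packages`) are made up for this example, and the parser is deliberately line-based so it needs no YAML library; it only understands the simple `- name=version` list layout and is not a full YAML reader.

```python
from __future__ import annotations

import re


def normalize(name: str) -> str:
    """Lowercase and collapse '-', '_' and '.' so conda and pip spellings compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def dependency_names(env_yaml_text: str) -> set[str]:
    """Collect package names listed under 'dependencies:' (including a nested 'pip:' list)."""
    names: set[str] = set()
    in_dependencies = False
    for raw in env_yaml_text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop trailing comments
        if not line.strip():
            continue
        if not line[0].isspace() and not line.startswith("-"):
            # A new top-level key such as "channels:" or "dependencies:".
            in_dependencies = line.split(":", 1)[0].strip() == "dependencies"
            continue
        item = line.strip()
        if not in_dependencies or not item.startswith("- "):
            continue
        spec = item[2:].strip()
        if spec.endswith(":"):
            continue  # nested section header such as "pip:"
        spec = spec.split("::", 1)[-1]  # drop an optional "channel::" prefix
        name = re.split(r"[\s=<>!~\[]", spec, maxsplit=1)[0]
        if name:
            names.add(normalize(name))
    return names


def missing_packages(reference_yaml: str, candidate_yaml: str) -> list[str]:
    """Packages present in the reference environment but absent from the candidate."""
    return sorted(dependency_names(reference_yaml) - dependency_names(candidate_yaml))


if __name__ == "__main__":
    # Tiny made-up inputs; real use would read the two environment.yml files from disk.
    official = "dependencies:\n  - python\n  - fsspec\n  - cf_xarray\n"
    maap = "dependencies:\n  - python\n"
    print(missing_packages(official, maap))
```

Names are normalized before comparing because the same package can be spelled with an underscore in one file and a hyphen in the other. Version pins are ignored on purpose, so this only answers "is the package listed at all", not "is it the same version".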
@wildintellect Our current pangeo package list is here. These are the packages in the pangeo official environment.yml that are not in the MAAP pangeo environment.yml
adlfs
argopy
black
ciso
cmocean
cdsapi
cf_xarray
dask-ml
fastjmd95
fsspec
gcsfs
gh
gh-scoped-creds
git-lfs
gsw
line_profiler
memory_profiler
metpy
nb_conda_kernels
nbstripout
numbagg
numcodecs
python-graphviz
xarray-datatree
xarray_leaflet
xarray-spatial
xbatcher
xcape
xclim
xgboost
xgcm
xhistogram
xmip
xmitgcm
xpublish
xrft
xskillscore
@pahbs Are there any key packages from this list that you think we should add to the MAAP pangeo environment.yml in our next release?
cf_xarray
fsspec
xarray-datatree
xarray_leaflet
xarray-spatial
@grallewellyn is there a reason we can't add all of them? we are trying to reach parity with Pangeo-notebook images so they are interchangeable (VEDA uses Pangeo-notebook)
Yes, I made a ticket to add all packages from the diff list I sent above https://github.com/MAAP-Project/Community/issues/898
While discussing missing packages with @wildintellect, he shared that geopandas is missing and that we should start by adding it to our Pangeo workspace, where we recently added other packages that were missing. If all works well, we will later add it, and the other missing packages that were added to Pangeo, to our other workspaces.
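One quick way to confirm from inside a workspace whether the packages discussed in this ticket are present is sketched below. It uses only the standard library's `importlib.metadata`; the `installed_versions` helper is a hypothetical name for this example, and it assumes the packages were installed with the usual distribution metadata (which is normally the case for both pip and conda installs).

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is absent."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Package names taken from this thread; extend the list as needed.
    wanted = ["geopandas", "rio-tiler", "rio-cogeo", "cogeo-mosaic"]
    for name, version in installed_versions(wanted).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Running the same snippet in each workspace would give a simple before/after record when the packages are rolled out beyond Pangeo.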
NOTE (2023-12-4): Updated link to package reconciliation list.