Ouranosinc / PAVICS-e2e-workflow-tests

Test user-level workflow.
Apache License 2.0

docker: prevent manual pin of dependencies and improve build speed #95

Closed: tlvu closed this 2 years ago

tlvu commented 2 years ago

Overview

Previously, when xclim and ravenpy were pinning their own dependencies, the pins were ignored and we had to manually repeat the same pins again. See comment https://github.com/Ouranosinc/PAVICS-e2e-workflow-tests/pull/94#issuecomment-996841873.

This PR lets xclim and ravenpy manage their own dependency pinning transparently during the Jupyter env rebuild.
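As a sketch, the idea is an environment file that lists only the top-level packages unpinned, so the conda solver honours the pins that xclim and ravenpy ship themselves (the file name and exact package list here are illustrative, not this repo's actual file):

```yaml
# environment.yml (illustrative): no manual pins of transitive dependencies;
# xclim and ravenpy bring their own pinned requirements at solve time.
name: birdy
channels:
  - conda-forge
dependencies:
  - xclim
  - ravenpy
  - birdy
```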

Also fixed a long-standing build performance issue along the way: build time went from 50 minutes to 25 minutes, and builds on DockerHub work again (fixes https://github.com/Ouranosinc/PAVICS-e2e-workflow-tests/issues/51).
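The thread below mentions a switch to the mamba installer; one plausible shape for such a build step is sketched here (the base image, paths, and the assumption that mamba's faster solver is the source of the speedup are not confirmed by this PR):

```dockerfile
# Illustrative build step only: mamba's solver typically resolves large conda
# environments much faster than classic conda (file paths are hypothetical).
FROM condaforge/mambaforge:latest
COPY environment.yml /tmp/environment.yml
RUN mamba env create -f /tmp/environment.yml \
    && mamba clean --all --yes
```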

Deployed as "beta" image on https://pavics.ouranos.ca/jupyter for testing.

Changes

Related Issue / Discussion

Additional Information

Updated:

```diff
- ravenpy=0.7.5=pyhff6ddc9_0
+ ravenpy=0.7.8=pyh8a188c0_0
- python=3.7.12=hb7a2778_100_cpython
+ python=3.8.12=hb7a2778_2_cpython
```

Removed:

```diff
- vcs=8.2.1=pyh9f0ad1d_0
```

```diff
- numpy=1.21.4=py37h31617e3_0
+ numpy=1.21.5=py38h87f13fb_0
- xarray=0.20.1=pyhd8ed1ab_0
+ xarray=0.20.2=pyhd8ed1ab_0
- rioxarray=0.8.0=pyhd8ed1ab_0
+ rioxarray=0.9.1=pyhd8ed1ab_0
- cf_xarray=0.6.1=pyh6c4a22f_0
+ cf_xarray=0.6.3=pyhd8ed1ab_0
- gdal=3.3.2=py37hd5a0ba4_2
+ gdal=3.3.3=py38hcf2042a_0
- rasterio=1.2.6=py37hc20819c_2
+ rasterio=1.2.10=py38hfd64e68_0
- climpred=2.1.6=pyhd8ed1ab_1
+ climpred=2.2.0=pyhd8ed1ab_0
- clisops=0.7.0=pyh6c4a22f_0
+ clisops=0.8.0=pyh6c4a22f_0
- xesmf=0.6.0=pyhd8ed1ab_0
+ xesmf=0.6.2=pyhd8ed1ab_0
- birdy=v0.8.0=pyh6c4a22f_1
+ birdy=0.8.1=pyh6c4a22f_1
- cartopy=0.20.0=py37hbe109c4_0
+ cartopy=0.20.1=py38hf9a4893_1
- dask=2021.11.2=pyhd8ed1ab_0
+ dask=2022.1.0=pyhd8ed1ab_0
- numba=0.53.1=py37hb11d6e1_1
+ numba=0.55.0=py38h4bf6c61_0
- pandas=1.3.4=py37he8f5f7f_1
+ pandas=1.3.5=py38h43a58ef_0
```


- Full diff of `conda env export`: 

[211123-update211216-211221-conda-env-export.diff.txt](https://github.com/Ouranosinc/PAVICS-e2e-workflow-tests/files/7758757/211123-update211216-211221-conda-env-export.diff.txt)

[211221-220116.1-conda-env-export.diff.txt](https://github.com/Ouranosinc/PAVICS-e2e-workflow-tests/files/7909174/211221-220116.1-conda-env-export.diff.txt)

[211123-update211216-220116.1-conda-env-export.diff.txt](https://github.com/Ouranosinc/PAVICS-e2e-workflow-tests/files/7909172/211123-update211216-220116.1-conda-env-export.diff.txt)

[220116.1-220121-conda-env-export.diff.txt](https://github.com/Ouranosinc/PAVICS-e2e-workflow-tests/files/7922627/220116.1-220121-conda-env-export.diff.txt)

[211123-update211216-220121-conda-env-export.diff.txt](https://github.com/Ouranosinc/PAVICS-e2e-workflow-tests/files/7922628/211123-update211216-220121-conda-env-export.diff.txt)

- Full new `conda env export`: 

[211221-conda-env-export.yml.txt](https://github.com/Ouranosinc/PAVICS-e2e-workflow-tests/files/7758759/211221-conda-env-export.yml.txt)

[220116.1-conda-env-export.yml.txt](https://github.com/Ouranosinc/PAVICS-e2e-workflow-tests/files/7909175/220116.1-conda-env-export.yml.txt)

[220121-conda-env-export.yml.txt](https://github.com/Ouranosinc/PAVICS-e2e-workflow-tests/files/7922629/220121-conda-env-export.yml.txt)
tlvu commented 2 years ago

Houston, we have a notebook failure, most likely due to the shapely (1.7.1 → 1.8.0) upgrade:

http://jenkins.ouranos.ca/job/PAVICS-e2e-workflow-tests/job/prevent-manual-pin-of-dependencies/10/console

19:22:46  _ PAVICS-landing-master/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-5Visualization.ipynb::Cell 2 _
19:22:46  Notebook cell execution failed
19:22:46  Cell 2: Cell execution caused an exception
19:22:46  
19:22:46  Input:
19:22:46  import geopandas as gpd
19:22:46  import hvplot.pandas
19:22:46  gdf = gpd.GeoDataFrame.from_file('/notebook_dir/pavics-homepage/tutorial_data/gaspesie_mrc.geojson')
19:22:46  gdf = gdf.dissolve(by='MUS_NM_MRC')
19:22:46  gdf['region_name'] = gdf.index
19:22:46  
19:22:46  # TODO replace with clisops average.average_shape() once it can do a 'skipna'
19:22:46  # mask of valid (non-nan) data cells
19:22:46  data_mask = ds_ens.tx_mean.isel(rcp=0, realization=0).mean(dim=['year','season']).notnull()
19:22:46  # spatial weights of gridcells interesecting each polygon
19:22:46  weight_masks = subset.create_weight_masks(ds_ens, poly=gdf)
19:22:46  def clean_masks(data_mask, masks):
19:22:46      #remove weight values of gridcells that are nan in the actual data. Rescale so total == 1 
19:22:46      return (masks * data_mask) / (masks * data_mask).sum(dim=['lat', 'lon'])
19:22:46  
19:22:46  weight_masks = clean_masks(data_mask, weight_masks)
19:22:46  
19:22:46  # Calculate weighted average for each region
19:22:46  with xr.set_options(keep_attrs=True):
19:22:46      reg_ts_sims = (ds_ens * weight_masks).sum(dim=['lat','lon'])
19:22:46      reg_ts = xens.ensemble_percentiles(reg_ts_sims)
19:22:46  reg_ts.load()
19:22:46  
19:22:46  # get only tx_mean percentile variables for this plot
19:22:46  vars1 = [v for v in reg_ts if 'tx_mean' in v]
19:22:46  # plot a simple map of the sub-regions
19:22:46  display(gdf.hvplot(geo=True, color='region_name',tiles='EsriImagery', legend=False, frame_width=400))
19:22:46  # Interative time-series plot of regional means
19:22:46  reg_ts[vars1].hvplot.line(x='year', title='time series of regional mean conditions')\
19:22:46              .opts(legend_position='top_left', frame_width=500)

(...)

19:22:46  /opt/conda/envs/birdy/lib/python3.7/site-packages/shapely/geometry/base.py in array_interface_base(self)
19:22:46      324             "removed in Shapely 2.0.",
19:22:46      325             ShapelyDeprecationWarning, stacklevel=2)
19:22:46  --> 326         return self._array_interface_base()
19:22:46      327 
19:22:46      328     @property
19:22:46  
19:22:46  TypeError: 'dict' object is not callable

There are also extra warnings from climex.ipynb that also fail Jenkins (should I find a way to silence those warnings, or can someone fix the notebook code to avoid them?):

19:22:46  _________ pavics-sdi-master/docs/source/notebooks/climex.ipynb::Cell 7 _________
19:22:46  Notebook cell execution failed
19:22:46  Cell 7: Cell outputs differ
19:22:46  
19:22:46  Input:
19:22:46  fig = plt.figure(figsize=(8, 4))
19:22:46  
19:22:46  ax = plt.subplot(1, 1, 1, projection=rotp)
19:22:46  ax.coastlines()
19:22:46  ax.gridlines()
19:22:46  m = ax.pcolormesh(out.rlon, out.rlat, out.mean(dim="realization").isel(time=0))
19:22:46  plt.colorbar(m, orientation='horizontal', label=sdii.long_name, fraction=0.046, pad=0.04)
19:22:46  ax.set_title("Ensemble mean")
19:22:46  
19:22:46  Traceback:
19:22:46  Unexpected output fields from running code: {'stderr'}

/opt/conda/envs/birdy/lib/python3.7/site-packages/cartopy/crs.py:825: ShapelyDeprecationWarning: __len__ for multi-part geometries is deprecated and will be removed in Shapely 2.0. Check the length of the `geoms` property instead to get the  number of parts of a multi-part geometry.
  if len(multi_line_string) > 1:
/opt/conda/envs/birdy/lib/python3.7/site-packages/cartopy/crs.py:877: ShapelyDeprecationWarning: Iteration over multi-part geometries is deprecated and will be removed in Shapely 2.0. Use the `geoms` property to access the constituent parts of a multi-part geometry.
  for line in multi_line_string:
/opt/conda/envs/birdy/lib/python3.7/site-packages/cartopy/crs.py:944: ShapelyDeprecationWarning: __len__ for multi-part geometries is deprecated and will be removed in Shapely 2.0. Check the length of the `geoms` property instead to get the  number of parts of a multi-part geometry.
  if len(p_mline) > 0:
/opt/conda/envs/birdy/lib/python3.7/site-packages/cartopy/io/__init__.py:241: DownloadWarning: Downloading: https://naturalearth.s3.amazonaws.com/10m_physical/ne_10m_coastline.zip
  warnings.warn(f'Downloading: {url}', DownloadWarning)
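If silencing is the chosen route, one hedged option is a `warnings` filter keyed on the deprecation messages above (the helper name is hypothetical; `warnings.filterwarnings` treats its `message` argument as a regex matched against the start of the warning text):

```python
import warnings

def silence_shapely_deprecations():
    # Hypothetical helper: ignore the Shapely 1.8 deprecation warnings quoted
    # above so notebook outputs stay clean for Jenkins' output comparison.
    for pattern in (
        r".*multi-part geometries is deprecated.*",
        r"Iteration over multi-part geometries is deprecated.*",
    ):
        warnings.filterwarnings("ignore", message=pattern)

silence_shapely_deprecations()
```

Filtering by message rather than by `ShapelyDeprecationWarning` avoids importing shapely just to set up the filter; the real fix remains porting the notebooks off the deprecated multi-part geometry API.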
tlvu commented 2 years ago

New Jupyter env is deployed to https://medus.ouranos.ca/jupyter/ for testing/fixing those notebooks.

tlvu commented 2 years ago

> @tlvu Is the jupyter-conda plugin still working on your side with the image pavics/workflow-tests:211221? I couldn't test it on https://medus.ouranos.ca/jupyter/ since I don't have access, but with a local birdhouse stack, I get an error with the plugin where it fails to retrieve available packages. It does find the list of installed packages but fails to find the packages in the "Not Installed" section found in the extension's tab (Settings -> Conda Packages Manager).
>
> I tested with the preceding image tagged 211123 too, and I did not have a problem there.

@ChaamC this is very odd. I confirmed I can reproduce your behaviour with this new build, but weirdly the version of mamba_gator is the same (5.1.2) in both the previous build and this new build, so unless the switch to the mamba installer did this, I am not sure why.

ChaamC commented 2 years ago

> @ChaamC this is very odd. I confirmed I reproduced your behavior with this new build but weirdly the version of mamba_gator is still the same 5.1.2 between the previous build and this new build so unless the switch to mamba installer did this, I am not sure why.

@tlvu I am not sure of the exact cause of this behaviour either. Looking at my browser's developer tools, I saw some info on the response from the request that seems to fail:

```
command: "/opt/conda/condabin/mamba repoquery search * --json"
conda_info: {GID: 1000, UID: 1000, active_prefix: "/opt/conda/envs/birdy", active_prefix_name: "birdy",…}
error: "RuntimeError('LockFile error. Aborting.')"
exception_name: "RuntimeError"
exception_type: "<class 'RuntimeError'>"
traceback: "Traceback (most recent call last):
  File \"/opt/conda/lib/python3.9/site-packages/conda/exceptions.py\", line 1080, in __call__
    return func(*args, **kwargs)
  File \"/opt/conda/lib/python3.9/site-packages/mamba/mamba.py\", line 917, in exception_converter
    raise e
  File \"/opt/conda/lib/python3.9/site-packages/mamba/mamba.py\", line 911, in exception_converter
    exit_code = _wrapped_main(*args, **kwargs)
  File \"/opt/conda/lib/python3.9/site-packages/mamba/mamba.py\", line 869, in _wrapped_main
    result = do_call(args, p)
  File \"/opt/conda/lib/python3.9/site-packages/mamba/mamba.py\", line 744, in do_call
    exit_code = repoquery(args, parser)
  File \"/opt/conda/lib/python3.9/site-packages/mamba/mamba.py\", line 686, in repoquery
    pool = repoquery_api.create_pool(channels, platform, use_installed)
  File \"/opt/conda/lib/python3.9/site-packages/mamba/repoquery.py\", line 47, in create_pool
    load_channels(
  File \"/opt/conda/lib/python3.9/site-packages/mamba/utils.py\", line 122, in load_channels
    index = get_index(
  File \"/opt/conda/lib/python3.9/site-packages/mamba/utils.py\", line 103, in get_index
    is_downloaded = dlist.download(True)
RuntimeError: LockFile error. Aborting.
"
```

Not sure exactly how it happens. I wonder if it's related to file permissions; I had some trouble with that when I added the conda extension to this repo. But looking at the PR's code, permissions still seem to be handled properly...

tlvu commented 2 years ago

```
command: "/opt/conda/condabin/mamba repoquery search * --json"
conda_info: {GID: 1000, UID: 1000, active_prefix: "/opt/conda/envs/birdy", active_prefix_name: "birdy",…}
error: "RuntimeError('LockFile error. Aborting.')"
exception_name: "RuntimeError"
exception_type: "<class 'RuntimeError'>"
traceback: "Traceback (most recent call last):
  File \"/opt/conda/lib/python3.9/site-packages/conda/exceptions.py\", line 1080, in __call__
    return func(*args, **kwargs)
  File \"/opt/conda/lib/python3.9/site-packages/mamba/mamba.py\", line 917, in exception_converter
    raise e
  File \"/opt/conda/lib/python3.9/site-packages/mamba/mamba.py\", line 911, in exception_converter
    exit_code = _wrapped_main(*args, **kwargs)
  File \"/opt/conda/lib/python3.9/site-packages/mamba/mamba.py\", line 869, in _wrapped_main
    result = do_call(args, p)
  File \"/opt/conda/lib/python3.9/site-packages/mamba/mamba.py\", line 744, in do_call
    exit_code = repoquery(args, parser)
  File \"/opt/conda/lib/python3.9/site-packages/mamba/mamba.py\", line 686, in repoquery
    pool = repoquery_api.create_pool(channels, platform, use_installed)
  File \"/opt/conda/lib/python3.9/site-packages/mamba/repoquery.py\", line 47, in create_pool
    load_channels(
  File \"/opt/conda/lib/python3.9/site-packages/mamba/utils.py\", line 122, in load_channels
    index = get_index(
  File \"/opt/conda/lib/python3.9/site-packages/mamba/utils.py\", line 103, in get_index
    is_downloaded = dlist.download(True)
RuntimeError: LockFile error. Aborting.
"
```

Oh wow, how did you manage to get this output? Is this from the PAVICS Jupyter image?

ChaamC commented 2 years ago

> Oh wow, how did you managed to get this output? This is from this PAVICS Jupyter image?

I used a local birdhouse-deploy stack and started the image pavics/workflow-tests:211221 with JupyterHub. Then, when loading the conda extension tab, which lists the different available packages, I was able to check the requests happening in Chrome Developer Tools.

*(screenshot: request list in Chrome Developer Tools)*

I checked the different responses by checking the info of each request. The birdy requests are the ones that fetch the packages installed in my environment; they gave a successful response with a list of 500+ packages.

The packages request gave the error output I posted earlier. When I tried the image tagged 211123, which has a working conda extension, that request returned the 9000+ available packages.

tlvu commented 2 years ago

@huard @tlogan2000 I downgraded shapely from 1.8.0 back to 1.7.1 and both the climex.ipynb and homepage notebook 5 Jenkins errors mentioned in comment https://github.com/Ouranosinc/PAVICS-e2e-workflow-tests/pull/95#issuecomment-999216646 are gone.

Can you help fix those 2 notebooks? You can use the beta env on PAVICS to fix those notebooks.
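As a stopgap while the notebooks are fixed, the downgrade can be expressed as an explicit pin in the environment file (sketch only; the actual file and its location in this repo are assumed):

```yaml
dependencies:
  - shapely=1.7.1  # temporary pin: notebooks break under Shapely 1.8's API deprecations
```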

tlvu commented 2 years ago

FYI @ChaamC I am at the end of the road with this jupyter-conda plugin problem; I've opened an issue on their side to get more help: https://github.com/mamba-org/gator/issues/170

tlvu commented 2 years ago

@tlogan2000

I pinned shapely to the old 1.7.1 version and there is a new failure with the homepage nb 5: http://jenkins.ouranos.ca/job/PAVICS-e2e-workflow-tests/job/prevent-manual-pin-of-dependencies/14/consoleFull

12:36:36  File /opt/conda/envs/birdy/lib/python3.9/site-packages/holoviews/core/data/xarray.py:224, in XArrayInterface.init(cls, eltype, data, kdims, vdims)
12:36:36      220     undeclared = [
12:36:36      221         c for c in da.coords if c not in kdims and len(da[c].shape) == 1 and
12:36:36      222         da[c].shape[0] > 1]
12:36:36      223     if undeclared:
12:36:36  --> 224         raise DataError(
12:36:36      225             'The coordinates on the %r DataArray do not match the '
12:36:36      226             'provided key dimensions (kdims). The following coords '
12:36:36      227             'were left unspecified: %r. If you are requesting a '
12:36:36      228             'lower dimensional view such as a histogram cast '
12:36:36      229             'the xarray to a columnar format using the .to_dataframe '
12:36:36      230             'or .to_dask_dataframe methods before providing it to '
12:36:36      231             'HoloViews.' % (vdim.name, undeclared))
12:36:36      232 return data, {'kdims': kdims, 'vdims': vdims}, {}
12:36:36  
12:36:36  DataError: The coordinates on the 'tx_mean' DataArray do not match the provided key dimensions (kdims). The following coords were left unspecified: ['horizon']. If you are requesting a lower dimensional view such as a histogram cast the xarray to a columnar format using the .to_dataframe or .to_dask_dataframe methods before providing it to HoloViews.
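The check HoloViews performs can be paraphrased in plain Python (a simplified sketch of the `XArrayInterface.init` logic quoted in the traceback; names are illustrative): any 1-D coordinate of length > 1 that is not declared as a key dimension triggers the `DataError`, so declaring `horizon` as a kdim, or selecting/dropping it before plotting, should clear the error.

```python
def undeclared_coords(coord_shapes, kdims):
    # Simplified paraphrase of the HoloViews check above: collect 1-D
    # coordinates of length > 1 that were not declared as key dimensions.
    return [
        c for c, shape in coord_shapes.items()
        if c not in kdims and len(shape) == 1 and shape[0] > 1
    ]

# 'horizon' exists on the DataArray but is not a declared kdim -> DataError
print(undeclared_coords({"year": (30,), "horizon": (3,)}, kdims=["year"]))  # → ['horizon']
```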

Good news: the bokeh/holoviews performance problem seems to be fixed in the new build.

This new build is deployed as "beta" image in prod.

tlogan2000 commented 2 years ago

I'll take a look.

tlvu commented 2 years ago

@ChaamC FYI the jupyter-conda plugin is fixed by uninstalling mamba from the image!