CSHS-CWRA / RavenPy

A Python wrapper to setup and run the hydrologic modelling framework Raven
https://ravenpy.readthedocs.io
MIT License

Python3.10 builds are failing #357

Open Zeitsperre opened 5 months ago

Zeitsperre commented 5 months ago

I'm not certain why, but all builds of RavenPy running under Python3.10 seem to be failing for a handful of the same tests, those being:

This might be due to climpred or another library. Will see what can be done.

tlvu commented 5 months ago

The Jenkins run against the latest Jupyter env is also failing (timing out) for me; I wonder if it's related. Here is the conda env export change relative to a previous build that still works: https://github.com/Ouranosinc/PAVICS-e2e-workflow-tests/commit/2f8c4508aec402874bb0bbbd8f79c724d91c1ac8

17:50:13  RavenPy-master/docs/notebooks/11_Climatological_ESP_forecasting.ipynb Fx [ 54%]
17:50:13  xxxxxx                                                                   [ 57%]
18:23:36  RavenPy-master/docs/notebooks/12_Performing_hindcasting_experiments.ipynb F [ 57%]
18:23:36  xxxxxxx                                                                  [ 60%]
18:23:36  RavenPy-master/docs/notebooks/Assess_probabilistic_flood_risk.ipynb .... [ 62%]
18:57:13  Fxxx                                                                     [ 64%]
19:06:06  Cancelling nested steps due to timeout

tlvu commented 5 months ago

Here is the conda env export change relative to a previous build that still works: Ouranosinc/PAVICS-e2e-workflow-tests@2f8c450

Potential suspects:

- pydantic=2.6.4=pyhd8ed1ab_0
+ pydantic=2.7.0=pyhd8ed1ab_0

- rioxarray=0.15.3=pyhd8ed1ab_0
+ rioxarray=0.15.4=pyhd8ed1ab_0

- xesmf=0.8.4=pyhd8ed1ab_1
+ xesmf=0.8.5=pyhd8ed1ab_0

Zeitsperre commented 5 months ago

@tlvu Do you notice this only happening for Python 3.10?

tlvu commented 5 months ago

I only noticed the failure because the upcoming Jupyter env will be Python 3.10 instead of 3.9. I do not run Jenkins on multiple flavors of Python.

Note that it started failing only with my most recent build. All previous Python 3.10 builds were working fine.

tlvu commented 5 months ago

It does not look like it is one of those packages. I took the bad image py310-240419 and downgraded each of those 3 separately, then ran Jenkins on each variant separately, and they all fail (hang).

But interactively in the JupyterLab env, everything just works. Very weird and not helpful. I don't have any specific error to search for.
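
For anyone repeating this kind of comparison, here is a minimal sketch of how the changed packages between two `conda env export` outputs could be listed automatically. The function names and the `examplepkg` entry are hypothetical and only there for illustration; the three real entries are the suspects listed earlier in this thread. It assumes each dependency line has the `name=version=build` shape shown above.

```python
def parse_env_export(text):
    """Map package name -> (version, build) for `name=version=build` lines."""
    packages = {}
    for raw in text.splitlines():
        # Drop YAML list dashes or diff markers ("-", "+") and whitespace.
        line = raw.strip().lstrip("-+ ").strip()
        parts = line.split("=")
        # Keep only well-formed conda entries; skips headers and pip "==" pins.
        if len(parts) == 3 and all(parts):
            name, version, build = parts
            packages[name] = (version, build)
    return packages


def changed_packages(old_text, new_text):
    """Return {name: (old_entry, new_entry)} for every package that differs."""
    old = parse_env_export(old_text)
    new = parse_env_export(new_text)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }


# Versions taken from the suspects above; "examplepkg" is a made-up unchanged entry.
working_env = """
dependencies:
  - examplepkg=1.0.0=0
  - pydantic=2.6.4=pyhd8ed1ab_0
  - rioxarray=0.15.3=pyhd8ed1ab_0
  - xesmf=0.8.4=pyhd8ed1ab_1
"""
broken_env = """
dependencies:
  - examplepkg=1.0.0=0
  - pydantic=2.7.0=pyhd8ed1ab_0
  - rioxarray=0.15.4=pyhd8ed1ab_0
  - xesmf=0.8.5=pyhd8ed1ab_0
"""

diff = changed_packages(working_env, broken_env)
for name, (before, after) in diff.items():
    print(f"{name}: {before} -> {after}")
```

Since downgrading the three suspects one by one did not help here, a full listing like this may be more useful for spotting changes in transitive dependencies that are easy to miss by eye.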

Zeitsperre commented 5 months ago

It's quite frustrating. I can't figure it out either. I've disabled one particularly flaky test in #358 and tried to prevent read access problems, but the issue remains. climpred might be what's culpable here.

tlvu commented 5 months ago

I am getting somewhere. Running only the hanging notebook in Jenkins, I got "Kernel died" while it tries to import the various modules at the beginning of the notebook!!! This is so weird.

15:59:09  =================================== FAILURES ===================================
15:59:09  _ RavenPy-master/docs/notebooks/11_Climatological_ESP_forecasting.ipynb::Cell 0 _
15:59:09  Notebook cell execution failed
15:59:09  Cell 0: Timeout of 2000 seconds exceeded while executing cell. Failed to interrupt kernel in 5 seconds, so failing without traceback.
15:59:09  
15:59:09  Input:
15:59:09  import datetime as dt
15:59:09  
15:59:09  from matplotlib import pyplot as plt
15:59:09  
15:59:09  from ravenpy.config import commands as rc
15:59:09  from ravenpy.config.emulators import GR4JCN
15:59:09  from ravenpy.utilities import forecasting
15:59:09  from ravenpy.utilities.testdata import get_file
15:59:09  
15:59:09  =========================== short test summary info ============================
15:59:09  FAILED RavenPy-master/docs/notebooks/11_Climatological_ESP_forecasting.ipynb::Cell 0
15:59:09  ================== 1 failed, 7 xfailed in 2006.10s (0:33:26) ===================
15:59:09  + EXIT_CODE=1
15:59:09  + tr [:upper:] [:lower:]
15:59:09  + echo true
15:59:09  + SAVE_RESULTING_NOTEBOOK=true
15:59:09  + [ xtrue = xtrue ]
15:59:09  + mkdir -p buildout
15:59:09  + basename RavenPy-master/docs/notebooks/11_Climatological_ESP_forecasting.ipynb
15:59:09  + filename=11_Climatological_ESP_forecasting.ipynb
15:59:09  + echo 11_Climatological_ESP_forecasting.ipynb
15:59:09  + sed s/.ipynb$//
15:59:09  + filename=11_Climatological_ESP_forecasting
15:59:09  + [ -e buildout/11_Climatological_ESP_forecasting.output.ipynb ]
15:59:09  + jupyter nbconvert --to notebook --execute --ExecutePreprocessor.timeout=600 --allow-errors --output-dir buildout --output 11_Climatological_ESP_forecasting.output.ipynb RavenPy-master/docs/notebooks/11_Climatological_ESP_forecasting.ipynb
15:59:09  [NbConvertApp] Converting notebook RavenPy-master/docs/notebooks/11_Climatological_ESP_forecasting.ipynb to notebook
15:59:19  [NbConvertApp] ERROR | Kernel died while waiting for execute reply.
15:59:19  Traceback (most recent call last):
15:59:19    File "/opt/conda/envs/birdy/bin/jupyter-nbconvert", line 10, in <module>
15:59:19      sys.exit(main())
15:59:19    File "/opt/conda/envs/birdy/lib/python3.10/site-packages/jupyter_core/application.py", line 283, in launch_instance
15:59:19      super().launch_instance(argv=argv, **kwargs)
15:59:19    File "/opt/conda/envs/birdy/lib/python3.10/site-packages/traitlets/config/application.py", line 1075, in launch_instance
15:59:19      app.start()
15:59:19    File "/opt/conda/envs/birdy/lib/python3.10/site-packages/nbconvert/nbconvertapp.py", line 420, in start
15:59:19      self.convert_notebooks()
15:59:19    File "/opt/conda/envs/birdy/lib/python3.10/site-packages/nbconvert/nbconvertapp.py", line 597, in convert_notebooks
15:59:19      self.convert_single_notebook(notebook_filename)
15:59:19    File "/opt/conda/envs/birdy/lib/python3.10/site-packages/nbconvert/nbconvertapp.py", line 563, in convert_single_notebook
15:59:19      output, resources = self.export_single_notebook(
15:59:19    File "/opt/conda/envs/birdy/lib/python3.10/site-packages/nbconvert/nbconvertapp.py", line 487, in export_single_notebook
15:59:19      output, resources = self.exporter.from_filename(
15:59:19    File "/opt/conda/envs/birdy/lib/python3.10/site-packages/nbconvert/exporters/exporter.py", line 201, in from_filename
15:59:19      return self.from_file(f, resources=resources, **kw)
15:59:19    File "/opt/conda/envs/birdy/lib/python3.10/site-packages/nbconvert/exporters/exporter.py", line 220, in from_file
15:59:19      return self.from_notebook_node(
15:59:19    File "/opt/conda/envs/birdy/lib/python3.10/site-packages/nbconvert/exporters/notebook.py", line 36, in from_notebook_node
15:59:19      nb_copy, resources = super().from_notebook_node(nb, resources, **kw)
15:59:19    File "/opt/conda/envs/birdy/lib/python3.10/site-packages/nbconvert/exporters/exporter.py", line 154, in from_notebook_node
15:59:19      nb_copy, resources = self._preprocess(nb_copy, resources)
15:59:19    File "/opt/conda/envs/birdy/lib/python3.10/site-packages/nbconvert/exporters/exporter.py", line 353, in _preprocess
15:59:19      nbc, resc = preprocessor(nbc, resc)
15:59:19    File "/opt/conda/envs/birdy/lib/python3.10/site-packages/nbconvert/preprocessors/base.py", line 48, in __call__
15:59:19      return self.preprocess(nb, resources)
15:59:19    File "/opt/conda/envs/birdy/lib/python3.10/site-packages/nbconvert/preprocessors/execute.py", line 102, in preprocess
15:59:19      self.preprocess_cell(cell, resources, index)
15:59:19    File "/opt/conda/envs/birdy/lib/python3.10/site-packages/nbconvert/preprocessors/execute.py", line 123, in preprocess_cell
15:59:19      cell = self.execute_cell(cell, index, store_history=True)
15:59:19    File "/opt/conda/envs/birdy/lib/python3.10/site-packages/jupyter_core/utils/__init__.py", line 165, in wrapped
15:59:19      return loop.run_until_complete(inner)
15:59:19    File "/opt/conda/envs/birdy/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
15:59:19      return future.result()
15:59:19    File "/opt/conda/envs/birdy/lib/python3.10/site-packages/nbclient/client.py", line 1005, in async_execute_cell
15:59:19      raise DeadKernelError("Kernel died") from None
15:59:09  nbclient.exceptions.DeadKernelError: Kernel died

Zeitsperre commented 5 months ago

That's essentially the error that we often see when running some of these tests. Look through the failing builds on GitHub, and you'll see that kernels often die randomly. I can't explain it other than my suspicion that it has to do with illegal memory access from an unsafe data read operation.
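
One way to get more than "Kernel died" out of a crash like this might be Python's standard `faulthandler` module, which can dump a traceback on a fatal signal (such as a segfault) and after a timeout if the process is still stuck. The sketch below is only an assumption about how it could be wired into the first cell of a notebook; the helper names and the log path are hypothetical. It writes to a plain file because the kernel's replaced `sys.stderr` may not expose a real file descriptor.

```python
import faulthandler


def arm_crash_diagnostics(log_path, hang_timeout_s=300.0):
    """Dump tracebacks to log_path on fatal signals and after hang_timeout_s."""
    # The file must stay open for as long as faulthandler may write to it.
    handle = open(log_path, "w")
    # Traceback of all threads on SIGSEGV, SIGFPE, SIGABRT, SIGBUS, SIGILL.
    faulthandler.enable(file=handle, all_threads=True)
    # Traceback of all threads if we are still running after the timeout;
    # exit=False lets the process carry on afterwards.
    faulthandler.dump_traceback_later(
        hang_timeout_s, repeat=False, file=handle, exit=False
    )
    return handle


def disarm_crash_diagnostics(handle):
    """Cancel the watchdog, disable the fault handler, and close the log."""
    faulthandler.cancel_dump_traceback_later()
    faulthandler.disable()
    handle.close()
```

Calling `arm_crash_diagnostics(...)` before the notebook's imports and `disarm_crash_diagnostics(...)` right after them should leave a traceback in the log file if an import hangs or crashes, assuming the process gets far enough to start Python at all.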

tlvu commented 5 months ago

For Jenkins, this is not random at all. With the latest build, the kernel dies every time, 100% reproducible!!! With all previous builds, the kernel does not die. The notebook code did not change, so if it is an illegal memory access, then something changed somewhere in the environment, not in the notebook code.

tlvu commented 5 months ago

Moreover, it seems to die during the imports at the beginning of the notebook. It would be pretty weird to have an illegal memory access at import time!

Also, why is there no illegal memory access when run interactively in JupyterLab?!

Everything is so weird!
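
To narrow down which import is responsible, one option could be to run each import from the notebook's first cell in its own fresh interpreter, with `-X faulthandler` and a timeout, inside the same Jenkins job. This is a rough sketch, not something that was actually run in this thread; `probe_import`, `probe_all`, and the timeout value are hypothetical, and the module list is copied from Cell 0 in the log above.

```python
import subprocess
import sys

# Modules imported by Cell 0 of 11_Climatological_ESP_forecasting.ipynb.
NOTEBOOK_IMPORTS = [
    "datetime",
    "matplotlib.pyplot",
    "ravenpy.config.commands",
    "ravenpy.config.emulators",
    "ravenpy.utilities.forecasting",
    "ravenpy.utilities.testdata",
]


def probe_import(module, timeout_s=120.0):
    """Import one module in a fresh interpreter and classify the outcome."""
    cmd = [sys.executable, "-X", "faulthandler", "-c", f"import {module}"]
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return "hang"
    if proc.returncode == 0:
        return "ok"
    if proc.returncode < 0:
        # On POSIX a negative return code means the child died from a signal.
        return f"killed by signal {-proc.returncode}"
    return "import error"


def probe_all(modules, timeout_s=120.0):
    """Probe each module separately so one bad import cannot mask the others."""
    return {module: probe_import(module, timeout_s) for module in modules}
```

Running `probe_all(NOTEBOOK_IMPORTS)` in both the Jenkins job and an interactive JupyterLab terminal, then comparing the two results, might show whether the difference comes from a specific module or from how the kernel is launched.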

tlvu commented 5 months ago

FYI, the Beta env on PAVICS has the broken env py310-240419 and the Gamma has the working env py310-240411.

tlvu commented 4 months ago

Just a note that for the Jupyter env, we moved to Python 3.11 and the hanging error is gone!