CLIMADA-project / climada_python

Python (3.8+) version of CLIMADA
GNU General Public License v3.0

issue with running the notebook related to impact functions #899

Closed sunt05 closed 1 week ago

sunt05 commented 1 week ago

Hi, I'm trying to run the notebook associated with #692 but can't move forward.

I followed this guide for the installation of CLIMADA. Below is the version info:

CLIMADA Version: 4.1.0
python: 3.9.18

The error occurs while importing climada:

import logging
import climada

logging.getLogger("climada").setLevel("WARNING")
Error message below:

The issue seems to be related to an upstream package.

```
AttributeError                            Traceback (most recent call last)
Cell In[1], line 2
      1 import logging
----> 2 import climada
      4 logging.getLogger("climada").setLevel("WARNING")

File ~/micromamba/envs/climada/lib/python3.9/site-packages/climada/__init__.py:24
---> 24 from .util.config import CONFIG

File ~/micromamba/envs/climada/lib/python3.9/site-packages/climada/util/__init__.py:26
---> 26 from .coordinates import *

File ~/micromamba/envs/climada/lib/python3.9/site-packages/climada/util/coordinates.py:33
---> 33 import dask.dataframe as dd

File ~/.local/lib/python3.9/site-packages/dask/dataframe/__init__.py:4
----> 4 from dask.dataframe import backends, dispatch, rolling

File ~/.local/lib/python3.9/site-packages/dask/dataframe/backends.py:21
---> 21 from dask.dataframe.core import DataFrame, Index, Scalar, Series, _Frame

File ~/.local/lib/python3.9/site-packages/dask/dataframe/core.py:35
---> 35 from dask.dataframe import methods

File ~/.local/lib/python3.9/site-packages/dask/dataframe/methods.py:22
---> 22 from dask.dataframe.utils import is_dataframe_like, is_index_like, is_series_like

File ~/.local/lib/python3.9/site-packages/dask/dataframe/utils.py:19
---> 19 from dask.dataframe import (  # noqa: F401 register pandas extension types

File ~/.local/lib/python3.9/site-packages/dask/dataframe/_dtypes.py:3
----> 3 from dask.dataframe.extensions import make_array_nonempty, make_scalar

File ~/.local/lib/python3.9/site-packages/dask/dataframe/extensions.py:6
----> 6 from dask.dataframe.accessor import (

File ~/.local/lib/python3.9/site-packages/dask/dataframe/accessor.py:190
--> 190 class StringAccessor(Accessor):

File ~/.local/lib/python3.9/site-packages/dask/dataframe/accessor.py:276, in StringAccessor()
--> 276     pd.core.strings.StringMethods,

AttributeError: module 'pandas.core.strings' has no attribute 'StringMethods'
```
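A stdlib-only way to see which copies of the suspect packages the interpreter would pick up (a diagnostic sketch; the package names are taken from the traceback above, and any of them may be missing):

```python
import importlib.util

# Locate where the interpreter would load each package from, without
# importing it. If the resulting paths point at different site-packages
# trees (e.g. ~/.local vs. the conda environment), the environments are
# mixed, which would explain a dask/pandas version mismatch.
locations = {}
for name in ("dask", "pandas", "numpy"):
    spec = importlib.util.find_spec(name)
    locations[name] = spec.origin if spec else None

for name, origin in locations.items():
    print(f"{name}: {origin}")
```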
emanuel-schmid commented 1 week ago

Thanks for reporting this @sunt05. In order to run this notebook you will eventually have to switch to the "advanced installation" method, as you will need to install climada from sources (from the calibrate-impact-function branch, to be more precise).

However, failing at the import climada step cannot be attributed to the "simple installation" approach! Usually this happens because the wrong ipykernel is selected. Have you checked that?
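A quick way to check this from inside the notebook (a minimal sketch) is to print the interpreter the kernel is actually running:

```python
import sys

# Run this in the first notebook cell: if the printed path does not point
# into the climada environment (e.g. .../envs/climada/...), the notebook
# is running on the wrong kernel.
print("interpreter:", sys.executable)
print("prefix:", sys.prefix)
```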

peanutfun commented 1 week ago

@sunt05 This looks like a mismatch between the versions of dask and pandas. As @emanuel-schmid already mentioned, you need to follow the Advanced Installation Instructions for running the script from #692. In step 4, replace

git checkout develop

with

git checkout calibrate-impact-functions

and continue with the instructions. Following them will not install Jupyter into the environment, so you can then choose to install JupyterLab or the more lightweight Jupyter with

mamba install -n climada_env jupyter

You should then be ready to execute the tutorial. In any case, make sure to select the CLIMADA Conda environment as the kernel when you run Jupyter.

Please report back if the problem persists after following these instructions. In that case, we might need to update the environment specifications.

sunt05 commented 1 week ago

Many thanks for the guidance above.

However, I now encounter a new error when running verification after installing climada. This should be addressed, as NumPy 2.0 has been formally released:

python -m unittest climada.engine.test.test_impact
Error message below:
```
Traceback (most recent call last):
  File "/Users/tingsun/micromamba/envs/climada_env/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/tingsun/micromamba/envs/climada_env/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/tingsun/micromamba/envs/climada_env/lib/python3.9/unittest/__main__.py", line 18, in <module>
    main(module=None)
  File "/Users/tingsun/micromamba/envs/climada_env/lib/python3.9/unittest/main.py", line 100, in __init__
    self.parseArgs(argv)
  File "/Users/tingsun/micromamba/envs/climada_env/lib/python3.9/unittest/main.py", line 147, in parseArgs
    self.createTests()
  File "/Users/tingsun/micromamba/envs/climada_env/lib/python3.9/unittest/main.py", line 158, in createTests
    self.test = self.testLoader.loadTestsFromNames(self.testNames,
  File "/Users/tingsun/micromamba/envs/climada_env/lib/python3.9/unittest/loader.py", line 220, in loadTestsFromNames
    suites = [self.loadTestsFromName(name, module) for name in names]
  File "/Users/tingsun/micromamba/envs/climada_env/lib/python3.9/unittest/loader.py", line 220, in <listcomp>
    suites = [self.loadTestsFromName(name, module) for name in names]
  File "/Users/tingsun/micromamba/envs/climada_env/lib/python3.9/unittest/loader.py", line 154, in loadTestsFromName
    module = __import__(module_name)
  File "/Users/tingsun/Dropbox (Personal)/Mac/Downloads/20240617-climada/climada_python/climada/__init__.py", line 24, in <module>
    from .util.config import CONFIG
  File "/Users/tingsun/Dropbox (Personal)/Mac/Downloads/20240617-climada/climada_python/climada/util/__init__.py", line 22, in <module>
    from pint import UnitRegistry
  File "/Users/tingsun/.local/lib/python3.9/site-packages/pint/__init__.py", line 28, in <module>
    from .formatting import formatter, register_unit_format
  File "/Users/tingsun/.local/lib/python3.9/site-packages/pint/formatting.py", line 17, in <module>
    from .babel_names import _babel_lengths, _babel_units
  File "/Users/tingsun/.local/lib/python3.9/site-packages/pint/babel_names.py", line 11, in <module>
    from .compat import HAS_BABEL
  File "/Users/tingsun/.local/lib/python3.9/site-packages/pint/compat.py", line 166, in <module>
    from xarray import DataArray, Dataset, Variable
  File "/Users/tingsun/.local/lib/python3.9/site-packages/xarray/__init__.py", line 1, in <module>
    from . import testing, tutorial
  File "/Users/tingsun/.local/lib/python3.9/site-packages/xarray/testing.py", line 9, in <module>
    from xarray.core import duck_array_ops, formatting, utils
  File "/Users/tingsun/.local/lib/python3.9/site-packages/xarray/core/duck_array_ops.py", line 35, in <module>
    from . import dask_array_ops, dtypes, nputils
  File "/Users/tingsun/.local/lib/python3.9/site-packages/xarray/core/dask_array_ops.py", line 3, in <module>
    from . import dtypes, nputils
  File "/Users/tingsun/.local/lib/python3.9/site-packages/xarray/core/dtypes.py", line 43, in <module>
    {np.bytes_, np.unicode_},  # numpy promotes to unicode
  File "/Users/tingsun/micromamba/envs/climada_env/lib/python3.9/site-packages/numpy/__init__.py", line 397, in __getattr__
    raise AttributeError(
AttributeError: `np.unicode_` was removed in the NumPy 2.0 release. Use `np.str_` instead.
```
emanuel-schmid commented 1 week ago

Many thanks, @sunt05! This is unexpected; normally numpy 1.x is being installed. I suppose you can work around it by running

mamba install numpy=1.26
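To confirm which numpy version the environment actually resolves before re-running the tests, a stdlib-only check (a sketch; it only assumes numpy is installed as a regular distribution):

```python
from importlib import metadata

# Check the installed numpy version without importing it. NumPy 2.0
# removed the np.unicode_ alias that this xarray release still uses, so
# any major version >= 2 here explains the AttributeError above.
try:
    numpy_version = metadata.version("numpy")
except metadata.PackageNotFoundError:
    numpy_version = None

print("numpy:", numpy_version)
if numpy_version is not None:
    numpy_major = int(numpy_version.split(".")[0])
    print("numpy 2.x present:", numpy_major >= 2)
```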

Meanwhile we have to investigate this and likely update the environment file. To speed this up, could you perhaps post the output of

mamba repoquery whoneeds numpy

here?

sunt05 commented 1 week ago

Please see below for the output of mamba repoquery whoneeds numpy:


Using local repodata...

Loaded current active prefix: "/Users/tingsun/micromamba/envs/climada_env"

No entries matching "numpy" found

numpy may not be installed. Try giving a channel with '-c,--channel' option for remote repoquery

From the above error message, it looks like the issue was caused by xarray.

sunt05 commented 1 week ago

Also, after downgrading numpy to 1.26, a new issue related to cartopy and shapely appeared when running the test python -m unittest climada.engine.test.test_impact:

Traceback (most recent call last):
  File "/Users/tingsun/micromamba/envs/climada_env/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/tingsun/micromamba/envs/climada_env/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/tingsun/micromamba/envs/climada_env/lib/python3.9/unittest/__main__.py", line 18, in <module>
    main(module=None)
  File "/Users/tingsun/micromamba/envs/climada_env/lib/python3.9/unittest/main.py", line 100, in __init__
    self.parseArgs(argv)
  File "/Users/tingsun/micromamba/envs/climada_env/lib/python3.9/unittest/main.py", line 147, in parseArgs
    self.createTests()
  File "/Users/tingsun/micromamba/envs/climada_env/lib/python3.9/unittest/main.py", line 158, in createTests
    self.test = self.testLoader.loadTestsFromNames(self.testNames,
  File "/Users/tingsun/micromamba/envs/climada_env/lib/python3.9/unittest/loader.py", line 220, in loadTestsFromNames
    suites = [self.loadTestsFromName(name, module) for name in names]
  File "/Users/tingsun/micromamba/envs/climada_env/lib/python3.9/unittest/loader.py", line 220, in <listcomp>
    suites = [self.loadTestsFromName(name, module) for name in names]
  File "/Users/tingsun/micromamba/envs/climada_env/lib/python3.9/unittest/loader.py", line 154, in loadTestsFromName
    module = __import__(module_name)
  File "/Users/tingsun/Dropbox (Personal)/Mac/Downloads/20240617-climada/climada_python/climada/__init__.py", line 24, in <module>
    from .util.config import CONFIG
  File "/Users/tingsun/Dropbox (Personal)/Mac/Downloads/20240617-climada/climada_python/climada/util/__init__.py", line 26, in <module>
    from .coordinates import *
  File "/Users/tingsun/Dropbox (Personal)/Mac/Downloads/20240617-climada/climada_python/climada/util/coordinates.py", line 32, in <module>
    from cartopy.io import shapereader
  File "/Users/tingsun/micromamba/envs/climada_env/lib/python3.9/site-packages/cartopy/__init__.py", line 106, in <module>
    import cartopy.crs  # noqa: E402  module-level imports
  File "/Users/tingsun/micromamba/envs/climada_env/lib/python3.9/site-packages/cartopy/crs.py", line 3020, in <module>
    Sinusoidal.MODIS = Sinusoidal(globe=Globe(ellipse=None,
  File "/Users/tingsun/micromamba/envs/climada_env/lib/python3.9/site-packages/cartopy/crs.py", line 2998, in __init__
    self._boundary = sgeom.LinearRing(points)
  File "/Users/tingsun/micromamba/envs/climada_env/lib/python3.9/site-packages/shapely/geometry/polygon.py", line 104, in __new__
    geom = shapely.linearrings(coordinates)
  File "/Users/tingsun/micromamba/envs/climada_env/lib/python3.9/site-packages/shapely/decorators.py", line 77, in wrapped
    return func(*args, **kwargs)
  File "/Users/tingsun/micromamba/envs/climada_env/lib/python3.9/site-packages/shapely/creation.py", line 171, in linearrings
    return lib.linearrings(coords, out=out, **kwargs)
RecursionError: maximum recursion depth exceeded while calling a Python object
emanuel-schmid commented 1 week ago

Er - sorry - but that doesn't look good. In particular, the output of mamba repoquery is inexplicable. I'm afraid something went wrong with the installation. At this point I'd suggest restarting from scratch. Do cd <path/to/workspace> and run

cd climada_python
git checkout calibrate-impact-functions
git pull origin calibrate-impact-functions
mamba env remove -n climada_env
mamba create -n climada_env python=3.9
mamba env update -n climada_env -f requirements/env_climada.yml
mamba activate climada_env
python -m pip install -e "./[dev]"

If you get an error for any of the commands, please post the command and error output here. If all goes smoothly, run the mamba repoquery whoneeds numpy command again and post the output.

sunt05 commented 1 week ago

Thanks for the updated instructions.

Please see below:

❯ mamba repoquery whoneeds numpy
Using local repodata...

Loaded current active prefix: "/Users/tingsun/micromamba/envs/climada_env"

 Name            Version  Build                    Depends     Channel   Subdir
────────────────────────────────────────────────────────────────────────────────
 bokeh           3.4.1    pyhd8ed1ab_0             conda-forge noarch   
 bottleneck      1.4.0    py39h37867e2_0           conda-forge osx-arm64
 cartopy         0.23.0   py39h998126f_1           conda-forge osx-arm64
 cfgrib          0.9.9.1  pyhd8ed1ab_2             conda-forge noarch   
 cftime          1.6.4    py39h161d348_0           conda-forge osx-arm64
 contourpy       1.2.1    py39h48c5dd5_0           conda-forge osx-arm64
 dask            2024.2.1 pyhd8ed1ab_0             conda-forge noarch   
 fiona           1.9.1    py39h9e5269e_0           conda-forge osx-arm64
 folium          0.17.0   pyhd8ed1ab_0             conda-forge noarch   
 gdal            3.6.2    py39h766d3fc_6           conda-forge osx-arm64
 h5py            3.8.0    nompi_py39hc9149d8_100   conda-forge osx-arm64
 mapclassify     2.6.1    pyhd8ed1ab_0             conda-forge noarch   
 matplotlib-base 3.8.4    py39h15359f4_2           conda-forge osx-arm64
 netcdf4         1.6.2    nompi_py39h8ded8ba_100   conda-forge osx-arm64
 numba           0.60.0   py39h2d4ef1e_0           conda-forge osx-arm64
 numexpr         2.10.0   py39h998126f_0           conda-forge osx-arm64
 pandas          2.1.4    py39hf8cecc8_0           conda-forge osx-arm64
 patsy           0.5.6    pyhd8ed1ab_0             conda-forge noarch   
 pyarrow         12.0.1   py39hf40061a_7_cpu       conda-forge osx-arm64
 pytables        3.7.0    py39h8abd629_3           conda-forge osx-arm64
 python-eccodes  1.5.1    py39h4d8bf0d_0           conda-forge osx-arm64
 rasterio        1.3.6    py39h157378c_0           conda-forge osx-arm64
 salib           1.5.0    pyhd8ed1ab_0             conda-forge noarch   
 scikit-learn    1.5.0    py39h3c33c8b_1           conda-forge osx-arm64
 scipy           1.13.1   py39h3d5391c_0           conda-forge osx-arm64
 seaborn-base    0.13.2   pyhd8ed1ab_2             conda-forge noarch   
 shapely         2.0.1    py39h472ea82_0           conda-forge osx-arm64
 snuggs          1.4.7    py_0                     conda-forge noarch   
 sparse          0.15.4   pyhd8ed1ab_0             conda-forge noarch   
 statsmodels     0.14.2   py39h161d348_0           conda-forge osx-arm64
 xarray          2024.6.0 pyhd8ed1ab_1             conda-forge noarch   
(climada_env) 
sunt05 commented 1 week ago

BTW, now the issue comes from pandas when running the test python -m unittest climada.engine.test.test_impact:

Traceback (most recent call last):
  File "/Users/tingsun/micromamba/envs/climada_env/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/tingsun/micromamba/envs/climada_env/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/tingsun/micromamba/envs/climada_env/lib/python3.9/unittest/__main__.py", line 18, in <module>
    main(module=None)
  File "/Users/tingsun/micromamba/envs/climada_env/lib/python3.9/unittest/main.py", line 100, in __init__
    self.parseArgs(argv)
  File "/Users/tingsun/micromamba/envs/climada_env/lib/python3.9/unittest/main.py", line 147, in parseArgs
    self.createTests()
  File "/Users/tingsun/micromamba/envs/climada_env/lib/python3.9/unittest/main.py", line 158, in createTests
    self.test = self.testLoader.loadTestsFromNames(self.testNames,
  File "/Users/tingsun/micromamba/envs/climada_env/lib/python3.9/unittest/loader.py", line 220, in loadTestsFromNames
    suites = [self.loadTestsFromName(name, module) for name in names]
  File "/Users/tingsun/micromamba/envs/climada_env/lib/python3.9/unittest/loader.py", line 220, in <listcomp>
    suites = [self.loadTestsFromName(name, module) for name in names]
  File "/Users/tingsun/micromamba/envs/climada_env/lib/python3.9/unittest/loader.py", line 154, in loadTestsFromName
    module = __import__(module_name)
  File "/Users/tingsun/Dropbox (Personal)/Mac/Downloads/20240617-climada/climada_python/climada/__init__.py", line 24, in <module>
    from .util.config import CONFIG
  File "/Users/tingsun/Dropbox (Personal)/Mac/Downloads/20240617-climada/climada_python/climada/util/__init__.py", line 26, in <module>
    from .coordinates import *
  File "/Users/tingsun/Dropbox (Personal)/Mac/Downloads/20240617-climada/climada_python/climada/util/coordinates.py", line 33, in <module>
    import dask.dataframe as dd
  File "/Users/tingsun/.local/lib/python3.9/site-packages/dask/dataframe/__init__.py", line 4, in <module>
    from dask.dataframe import backends, dispatch, rolling
  File "/Users/tingsun/.local/lib/python3.9/site-packages/dask/dataframe/backends.py", line 21, in <module>
    from dask.dataframe.core import DataFrame, Index, Scalar, Series, _Frame
  File "/Users/tingsun/.local/lib/python3.9/site-packages/dask/dataframe/core.py", line 35, in <module>
    from dask.dataframe import methods
  File "/Users/tingsun/.local/lib/python3.9/site-packages/dask/dataframe/methods.py", line 22, in <module>
    from dask.dataframe.utils import is_dataframe_like, is_index_like, is_series_like
  File "/Users/tingsun/.local/lib/python3.9/site-packages/dask/dataframe/utils.py", line 19, in <module>
    from dask.dataframe import (  # noqa: F401 register pandas extension types
  File "/Users/tingsun/.local/lib/python3.9/site-packages/dask/dataframe/_dtypes.py", line 3, in <module>
    from dask.dataframe.extensions import make_array_nonempty, make_scalar
  File "/Users/tingsun/.local/lib/python3.9/site-packages/dask/dataframe/extensions.py", line 6, in <module>
    from dask.dataframe.accessor import (
  File "/Users/tingsun/.local/lib/python3.9/site-packages/dask/dataframe/accessor.py", line 190, in <module>
    class StringAccessor(Accessor):
  File "/Users/tingsun/.local/lib/python3.9/site-packages/dask/dataframe/accessor.py", line 276, in StringAccessor
    pd.core.strings.StringMethods,
AttributeError: module 'pandas.core.strings' has no attribute 'StringMethods'
(climada_env) 

Also the info about pandas:

pandas                     2.1.4         py39hf8cecc8_0          conda-forge
peanutfun commented 1 week ago

@sunt05 This still does not look right. Note that the traceback shows that the dask module loaded by your interpreter is located in /Users/tingsun/.local/lib/python3.9/site-packages/dask/ (your default interpreter location, I presume), and not in the Mamba environment /Users/tingsun/micromamba/envs/climada_env.

It seems you are using Micromamba, and we have no experience with its particular behavior. Can you switch to Conda or Mamba?
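One way to confirm this shadowing from inside Python (a diagnostic sketch) is to inspect the user site-packages directory, which Python puts on sys.path alongside the environment unless user-site is disabled (for example by setting PYTHONNOUSERSITE=1 before launching):

```python
import site
import sys

# The traceback shows dask loaded from ~/.local/lib/python3.9/site-packages,
# i.e. the user site directory. List it and any sys.path entries that
# point into ~/.local; such entries can shadow the conda environment.
print("user site dir:", site.getusersitepackages())
print("user site enabled:", site.ENABLE_USER_SITE)
shadowing = [p for p in sys.path if ".local" in p]
print("user-site entries on sys.path:", shadowing)
```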

sunt05 commented 1 week ago

I see - thanks for pointing that out - will switch now and update you shortly.

sunt05 commented 1 week ago

Thanks for the above instructions - I just managed to install the package and run the entire impact notebook.

One suggestion for the installation instructions is to change the recommended Python version from 3.9 to 3.11, as the latter has better compatibility on ARM64 Macs. Python 3.9 has various issues because several packages lack ARM64 builds for legacy reasons.
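For reference, a quick way to confirm the interpreter version and architecture (a sketch):

```python
import platform
import sys

# Print the Python version and CPU architecture; on Apple Silicon a
# native build reports "arm64", while "x86_64" would indicate an
# emulated (Rosetta) interpreter pulling x86 packages.
print("python:", ".".join(map(str, sys.version_info[:3])))
print("machine:", platform.machine())
```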

I'll now close this issue and move forward with the review task.

emanuel-schmid commented 1 week ago

👍 Glad it worked out. Thanks for the suggestion. We wanted to do that eventually anyway; given your input, it should happen sooner rather than later.

peanutfun commented 1 week ago

@sunt05 Thanks for the suggestion. We are currently blocked from updating to Python 3.12, see #868 #870. But it might be worthwhile to immediately jump to Python 3.11 as the supported version. @emanuel-schmid, this might then also bring us better in line with the Euler cluster setup. We can discuss that in the next dev meeting.