Parameters with more than one index (e.g. region, region) cause a read error

willu47 commented 1 year ago

If a config contains a parameter with a duplicated index, such as

TradeRoute:
    indices: [REGION,REGION,FUEL,YEAR]
    type: param
    dtype: float
    default: 0

then an error is raised when reading in the corresponding CSV file:

NotImplementedError                       Traceback (most recent call last)
Cell In[2], line 24
     20 validate_config(config)
     22 read_strategy = ReadCsv(user_config=config)
---> 24 model, defaults = read_strategy.read(folder_path)
     25 logging.debug(model.keys())

File ~/miniconda3/envs/linopy/lib/python3.11/site-packages/otoole/read_strategies.py:209, in ReadCsv.read(self, filepath, **kwargs)
    207 if entity_type == "param":
    208     df = self._get_input_data(filepath, parameter, details, converter)
--> 209     narrow = self._check_parameter(df, details["indices"], parameter)
    210     if not narrow.empty:
    211         narrow_checked = check_datatypes(
    212             narrow, self.user_config, parameter
    213         )

File ~/miniconda3/envs/linopy/lib/python3.11/site-packages/otoole/read_strategies.py:91, in _ReadTabular._check_parameter(self, df, expected_headers, name)
     87         logger.warning("%s not in header of %s", column, name)
     89 logger.debug("Final all headers for %s: %s", name, all_headers)
---> 91 return narrow[all_headers].set_index(expected_headers)

File ~/miniconda3/envs/linopy/lib/python3.11/site-packages/pandas/util/_decorators.py:331, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
    325 if len(args) > num_allow_args:
    326     warnings.warn(
...
    420     values = sanitize_array(values, None)
    421 else:
    422     # i.e. must be a list

NotImplementedError: > 1 ndim Categorical are not supported at this time

I'm using pandas v1.5.3 and otoole v1.0
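One way to catch this before the CSV files are read is to scan the config for repeated index names. The snippet below is only a sketch under stated assumptions: it treats the config as a plain dictionary shaped like the TradeRoute entry above, and `find_duplicate_indices` and `ExampleParam` are hypothetical names, not part of otoole.

```python
from collections import Counter

# Hypothetical in-memory config mirroring the YAML entry in this report.
# "ExampleParam" is an invented entry with no repeated index, for contrast.
config = {
    "TradeRoute": {
        "indices": ["REGION", "REGION", "FUEL", "YEAR"],
        "type": "param",
        "dtype": "float",
        "default": 0,
    },
    "ExampleParam": {
        "indices": ["REGION", "FUEL", "YEAR"],
        "type": "param",
        "dtype": "float",
        "default": 0,
    },
}


def find_duplicate_indices(config):
    """Map each parameter name to the index names it lists more than once."""
    duplicates = {}
    for name, details in config.items():
        # Entries without an "indices" key are treated as having no indices
        counts = Counter(details.get("indices", []))
        repeated = [index for index, count in counts.items() if count > 1]
        if repeated:
            duplicates[name] = repeated
    return duplicates


duplicate_report = find_duplicate_indices(config)
```

Here `duplicate_report` flags only `TradeRoute`, so a caller could warn about it or handle it separately rather than failing deep inside pandas.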

trevorb1 commented 1 year ago

Also relates to issue #130

willu47 commented 1 year ago

To reproduce this:

import pandas as pd

data = [
    ['REGIONA', 'REGIONB', 2010, 1],
    ['REGIONA', 'REGIONB', 2020, 2],
    ['REGIONB', 'REGIONA', 2010, 2],
    ['REGIONB', 'REGIONA', 2020, 2],
]
df = pd.DataFrame(data, columns=['REGION', 'REGION', 'YEAR', 'VALUE'])

df.set_index(['REGION', 'REGION', 'YEAR'])

This raises the same error:

NotImplementedError                       Traceback (most recent call last)
/Users/wusher/repository/otoole/Categorical Index.ipynb Cell 25 in ()
----> 1 df.set_index(['REGION', 'REGION', 'YEAR'])

File ~/miniconda3/envs/otoole38/lib/python3.9/site-packages/pandas/util/_decorators.py:311, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
    305 if len(args) > num_allow_args:
    306     warnings.warn(
    307         msg.format(arguments=arguments),
    308         FutureWarning,
    309         stacklevel=stacklevel,
    310     )
--> 311 return func(*args, **kwargs)

File ~/miniconda3/envs/otoole38/lib/python3.9/site-packages/pandas/core/frame.py:5555, in DataFrame.set_index(self, keys, drop, append, inplace, verify_integrity)
   5547     if len(arrays[-1]) != len(self):
   5548         # check newest element against length of calling frame, since
   5549         # ensure_index_from_sequences would not raise for append=False.
   5550         raise ValueError(
   5551             f"Length mismatch: Expected {len(self)} rows, "
   5552             f"received array of length {len(arrays[-1])}"
   5553         )
-> 5555 index = ensure_index_from_sequences(arrays, names)
   5557 if verify_integrity and not index.is_unique:
   5558     duplicates = index[index.duplicated()].unique()
...
    417     values = sanitize_array(values, None)
    418 else:
    419     # i.e. must be a list

NotImplementedError: > 1 ndim Categorical are not supported at this time

We can check for duplicate columns using df.columns.is_unique.
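As a rough illustration of that check, the sketch below rebuilds a smaller version of the frame from the reproduction, tests `df.columns.is_unique`, and lists the repeated labels. The renaming step at the end is an assumption about one possible workaround (the name `REGION_2` is made up for this example) and is not something otoole currently does.

```python
import pandas as pd

data = [
    ["REGIONA", "REGIONB", 2010, 1],
    ["REGIONB", "REGIONA", 2010, 2],
]
df = pd.DataFrame(data, columns=["REGION", "REGION", "YEAR", "VALUE"])

# Falsy here, because the label REGION appears twice
columns_unique = df.columns.is_unique

# Collect the repeated labels so they can be reported to the user
duplicated_labels = df.columns[df.columns.duplicated()].tolist()

# Possible workaround (an assumption, not otoole behaviour): give every
# column a unique name before building the index
renamed = df.copy()
renamed.columns = ["REGION", "REGION_2", "YEAR", "VALUE"]
indexed = renamed.set_index(["REGION", "REGION_2", "YEAR"])
```

With unique column names `set_index` builds an ordinary three-level MultiIndex, at the cost of the second level no longer being called `REGION`.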

Some relevant reading material from the pandas docs:

Dealing with duplicate labels

OSeMOSYS / otoole, issue #153