ClimateImpactLab / metacsv

Tools for documentation-aware data reading, writing, and analysis
https://metacsv.readthedocs.io/en/latest/
MIT License

MetaCSV incompatible with pandas 1.0? #34

Closed jrising closed 3 years ago

jrising commented 3 years ago

I am trying to update my code that uses metacsv so that it works with later versions of pandas. I have a metacsv DataFrame, which I'll call df:

>>> df
<metacsv.core.containers.DataFrame (35525, 5)>
       year model scenario  iso         value
0      2010   low     SSP1  ARG  12108.176269
1      2015   low     SSP1  ARG  14444.632166
2      2020   low     SSP1  ARG  17277.942084
3      2025   low     SSP1  ARG  19868.468433
4      2030   low     SSP1  ARG  22289.947740
...     ...   ...      ...  ...           ...
35520  2080  high     SSP5  ZWE  39437.766931
35521  2085  high     SSP5  ZWE  48780.025690
35522  2090  high     SSP5  ZWE  59399.226110
35523  2095  high     SSP5  ZWE  71393.615155
35524  2100  high     SSP5  ZWE  84706.589006

[35525 rows x 5 columns]

Variables
    iso:       Country ISO3 [str]
    value:     GDP per capita [2005 PPP USD]
Attributes
    oneline:        GDP per capita (2005 PPP) (SSPs)
    version:        GCP-GDPPC-SSP.2018-08-14
    dependencies:   GDPPC-SSP.2016-02-15
    description:    Generated by socioeconomics/baselines/income_merged_noh...

If I try to access the model column, I get an error:

>>> df.model
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jrising/research/gcp/env/lib/python3.7/site-packages/pandas/core/generic.py", line 5464, in __getattr__
    return self[name]
  File "/home/jrising/research/gcp/env/lib/python3.7/site-packages/pandas/core/frame.py", line 2996, in __getitem__
    return self._get_item_cache(key)
  File "/home/jrising/research/gcp/env/lib/python3.7/site-packages/pandas/core/generic.py", line 3794, in _get_item_cache
    res = self._box_col_values(values, loc).__finalize__(self)
  File "/home/jrising/research/gcp/env/lib/python3.7/site-packages/pandas/core/generic.py", line 5433, in __finalize__
    self.attrs[name] = other.attrs[name]
  File "/home/jrising/research/gcp/env/lib/python3.7/site-packages/MetaCSV-0.1.1-py3.7.egg/metacsv/core/internals.py", line 107, in __getitem__
    return self._data[key]
KeyError: ('oneline', 'GDP per capita (2005 PPP) (SSPs)')
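
One plausible reading of the traceback: the key in the KeyError is a (name, value) tuple, which suggests that iterating over the metacsv attrs container yields pairs rather than bare keys, while pandas' `__finalize__` then uses each iterated item as a key in `other.attrs[name]`. The sketch below reproduces that pattern in isolation. `ItemIterContainer` and `copy_attrs` are hypothetical stand-ins written for illustration, not metacsv or pandas code, and the surrounding loop is an assumption, since the traceback shows only the assignment line.

```python
class ItemIterContainer:
    """Hypothetical stand-in for a metadata container (not metacsv's class)."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        # Same lookup as the last frame of the traceback.
        return self._data[key]

    def __iter__(self):
        # Assumption: iteration yields (key, value) pairs instead of keys.
        return iter(self._data.items())


def copy_attrs(source):
    """Copy metadata by iterating the source and indexing it with each item.

    The assignment mirrors the `self.attrs[name] = other.attrs[name]` line
    in the traceback; the enclosing loop is assumed.
    """
    target = {}
    for name in source:
        target[name] = source[name]
    return target


attrs = ItemIterContainer({"oneline": "GDP per capita (2005 PPP) (SSPs)"})

try:
    copy_attrs(attrs)
    error = None
except KeyError as exc:
    error = exc

print(repr(error))
```

If this reading is right, a container whose iteration yields only keys (as a dict does) would pass through the same loop without error.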

However, I can convert it to a plain pandas object and then access the column:

>>> df.to_pandas()
       year model scenario  iso         value
0      2010   low     SSP1  ARG  12108.176269
1      2015   low     SSP1  ARG  14444.632166
2      2020   low     SSP1  ARG  17277.942084
3      2025   low     SSP1  ARG  19868.468433
4      2030   low     SSP1  ARG  22289.947740
...     ...   ...      ...  ...           ...
35520  2080  high     SSP5  ZWE  39437.766931
35521  2085  high     SSP5  ZWE  48780.025690
35522  2090  high     SSP5  ZWE  59399.226110
35523  2095  high     SSP5  ZWE  71393.615155
35524  2100  high     SSP5  ZWE  84706.589006

[35525 rows x 5 columns]
>>> df.to_pandas().model
0         low
1         low
2         low
3         low
4         low
         ... 
35520    high
35521    high
35522    high
35523    high
35524    high
Name: model, Length: 35525, dtype: object

jrising commented 3 years ago

@brews This is the current barrier to updating pandas past 0.25.3.