When pandas.DataFrame.resample is used with groupby set to one or more integer columns and reset_index=True, a ValueError is raised: the integer columns already exist as columns of the resampled dataframe, so resetting the index cannot insert them again.
What I Did
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({
...: 'time': ['2010-01-01', '2010-01-02', '2010-01-03'],
...: 'str_id': ['a', 'b', 'c'],
...: 'int_id': [1, 2, 3],
...: 'value': [1, 2, 3]
...: })
In [3]: df['time'] = pd.to_datetime(df['time'])
In [4]: from mlblocks import MLBlock
In [5]: block = MLBlock('pandas.DataFrame.resample', rule='1D', on='time',
...: groupby=['str_id'], aggregation='mean', reset_index=True)
In [6]: block.produce(X=df)
Out[6]:
str_id time int_id value
0 a 2010-01-01 1 1
1 b 2010-01-02 2 2
2 c 2010-01-03 3 3
In [7]: block = MLBlock('pandas.DataFrame.resample', rule='1D', on='time',
...: groupby=['int_id'], aggregation='mean', reset_index=True)
In [8]: block.produce(X=df)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-8-2f66b517b674> in <module>
----> 1 block.produce(X=df)
...
1147 if not allow_duplicates and item in self.items:
1148 # Should this be a different kind of error??
-> 1149 raise ValueError('cannot insert {}, already exists'.format(item))
1150
1151 if not isinstance(loc, int):
ValueError: cannot insert int_id, already exists
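The collision can be illustrated with plain pandas, independently of MLBlocks. The sketch below is only an assumption about what the intermediate frame looks like: it hand-builds a frame in which int_id is both an index level and an aggregated column, which is the situation the traceback points at. The helper safe_reset_index is hypothetical and shows one possible workaround, not necessarily the fix the project should adopt.

```python
import pandas as pd

# Hand-built stand-in for the resampled frame (assumed shape, not actual
# MLBlocks output): 'int_id' is an index level AND an aggregated column.
index = pd.MultiIndex.from_arrays(
    [[1, 2, 3], pd.to_datetime(['2010-01-01', '2010-01-02', '2010-01-03'])],
    names=['int_id', 'time'],
)
resampled = pd.DataFrame(
    {'int_id': [1.0, 2.0, 3.0], 'value': [1.0, 2.0, 3.0]},
    index=index,
)

# reset_index() tries to insert the 'int_id' level as a new column, which
# collides with the existing 'int_id' column.
try:
    resampled.reset_index()
    error_message = None
except ValueError as error:
    error_message = str(error)


def safe_reset_index(frame):
    """Hypothetical helper: drop columns that duplicate an index level,
    then reset the index so each name appears only once."""
    duplicated = [name for name in frame.index.names if name in frame.columns]
    return frame.drop(columns=duplicated).reset_index()


fixed = safe_reset_index(resampled)
print(error_message)
print(fixed)
```

Dropping the duplicated columns keeps the original group keys from the index rather than their aggregated copies, which seems like the less surprising choice for an id column.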