jgunstone closed this issue 1 month ago.
Thanks for the report. I can reproduce.
I noticed some odd behaviour: adding a string, boolean or decimal-number column produces the correct output, but a dataframe made up only of integer columns (any number of them) triggers the reported bug.
Yeah, just to demonstrate your point, this works:
>>> import pandas as pd
>>> x = range(0, 5)
>>> y = [_**2 for _ in x]
>>> z = [_*1.2 for _ in x]
>>> df = pd.DataFrame({"x": x, "y": y, "z": z})
>>> from frictionless import Resource
>>> r = Resource(df)
>>> r.read_rows()
[{'x': 0, 'y': 0, 'z': Decimal('0.0')},
{'x': 1, 'y': 1, 'z': Decimal('1.2')},
{'x': 2, 'y': 4, 'z': Decimal('2.4')},
{'x': 3, 'y': 9, 'z': Decimal('3.5999999999999996')},
{'x': 4, 'y': 16, 'z': Decimal('4.8')}]
Some exploration notes:
The cells are np.int64(...) values, and create_value_reader (l34 of frictionless/fields/integer.py) does not properly handle the np.integer type, because isinstance(cell, int) returns False for them.
I have not looked into why adding another dtype to the dataframe solves the issue; it is probably triggering a conversion somewhere.
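To illustrate the isinstance point above, here is a minimal sketch. read_integer_cell is a hypothetical stand-in, not frictionless's actual create_value_reader; it only shows what accepting np.integer alongside int could look like, assuming the reader should return a plain Python int and reject booleans.

```python
import numpy as np

# np.int64 is not a subclass of the built-in int on Python 3,
# so a plain isinstance(cell, int) check misses it.
cell = np.int64(4)
print(isinstance(cell, int))         # False
print(isinstance(cell, np.integer))  # True


def read_integer_cell(cell):
    """Hypothetical reader (not frictionless code): return a Python int or None.

    Accepts both built-in ints and NumPy integer scalars.
    """
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(cell, bool):
        return None
    if isinstance(cell, (int, np.integer)):
        return int(cell)  # normalize np.int64 etc. to a plain Python int
    return None


print(read_integer_cell(np.int64(4)))  # 4
print(read_integer_cell(7))            # 7
print(read_integer_cell(True))         # None
```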
EDIT: further observations
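import pandas as pd

# Homogeneous integer series: elements stay NumPy scalars
df_int = pd.Series([1, 2])
print(type(df_int[0]))
# <class 'numpy.int64'>

# Mixed series: object dtype, elements are Python objects
df_mixed = pd.Series([1, "a"])
print(df_mixed.dtypes)
# object
print(type(df_mixed[0]))
# <class 'int'>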
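The same effect shows up at the row level, which is presumably what the pandas parser sees when it iterates over the dataframe. This sketch only inspects what DataFrame.iterrows() yields for an integer-only frame versus a frame with an added string column; it does not call frictionless.

```python
import numpy as np
import pandas as pd

# Integer-only frame: each row Series keeps the int64 dtype,
# so cells come out as NumPy scalars.
int_df = pd.DataFrame({"x": [0, 1, 2], "y": [0, 1, 4]})
_, int_row = next(int_df.iterrows())
print(int_row.dtype)       # int64
print(type(int_row["x"]))  # <class 'numpy.int64'>

# Adding a string column turns each row Series into an object-dtype Series.
mixed_df = pd.DataFrame({"x": [0, 1, 2], "s": ["a", "b", "c"]})
_, mixed_row = next(mixed_df.iterrows())
print(mixed_row.dtype)     # object
print(type(mixed_row["x"]))
```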
df.iterrows() returns each row as a pd.Series. Series of mixed types (e.g. ints and strings) have dtype('O'), the Python object type, and their values are coerced to Python types. Homogeneous series keep their NumPy types, which are not always instances of the corresponding Python types. This is in particular the case for np.int64 and np.True_/np.False_, whereas NumPy strings and floats are accepted as instances of Python's str and float.
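A quick check of which NumPy scalars pass an isinstance test against the built-in type, plus one possible normalization step. to_python_scalar is a hypothetical helper (not part of frictionless or pandas); it relies on NumPy's np.generic.item(), which returns the equivalent built-in Python object.

```python
import numpy as np

# Which NumPy scalars are instances of the matching built-in Python type?
checks = {
    "np.int64 vs int": isinstance(np.int64(1), int),
    "np.bool_ vs bool": isinstance(np.True_, bool),
    "np.float64 vs float": isinstance(np.float64(1.5), float),
    "np.str_ vs str": isinstance(np.str_("a"), str),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")


def to_python_scalar(value):
    """Hypothetical helper: convert a NumPy scalar to its built-in equivalent.

    Non-NumPy values are returned unchanged.
    """
    if isinstance(value, np.generic):
        return value.item()  # e.g. np.int64 -> int, np.bool_ -> bool
    return value


print(type(to_python_scalar(np.int64(3))))  # <class 'int'>
print(type(to_python_scalar(np.True_)))     # <class 'bool'>
```

Only the int64 and bool_ checks print False, matching the two types called out above.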
Following the docs: https://framework.frictionlessdata.io/docs/formats/pandas.html
To reproduce: