AlexDo1 closed this issue 3 years ago.
Yes. That is a good spot. Great!
Does switching to Numeric solve the issue?
I just switched to Numeric, and it does not solve the issue: the datatype is still Decimal.
Should we define the dtypes in reader.py like we did in importer.py?
Yes. As far as I know, Decimal is the only problem here; the DateTime should convert smoothly. I remember finding a conversion function in sqlalchemy or pandas at some point that does exactly this, but I can't recall which one.
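One pandas candidate for such a conversion, not necessarily the function meant above, is the coerce_float parameter of DataFrame.from_records (pd.read_sql accepts the same parameter). The sketch below uses invented rows. The parameter converts scalar decimal.Decimal values to float64. It works on individual cells, though, so values that arrive packed into one list per row, as a PostgreSQL ARRAY column typically delivers them, would presumably be left untouched.

```python
from decimal import Decimal

import pandas as pd

# Invented SQL-style result rows: scalar NUMERIC values arrive as decimal.Decimal.
rows = [(Decimal("0.52"), Decimal("1.10")), (Decimal("-1.25"), None)]

# Default: the Decimal objects are kept as-is in "object" columns.
as_objects = pd.DataFrame.from_records(rows, columns=["u", "v"])

# coerce_float=True converts non-string, non-numeric objects such as Decimal
# to float64; None becomes NaN.
as_floats = pd.DataFrame.from_records(rows, columns=["u", "v"], coerce_float=True)

print(as_objects.dtypes.tolist())
print(as_floats.dtypes.tolist())
```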
The problem is solved by adding the dtype parameter to pd.DataFrame() here:
https://github.com/VForWaTer/metacatalog/blob/15ee52f0e7b964e0b1df19132696cbe06a208bbe/metacatalog/ext/io/reader.py#L64
df = pd.DataFrame(data=raw, columns=col_names, dtype=np.float64, index=df_sql.index)
This way, the columns of the exported DataFrame are of type float64.
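The following sketch shows what the dtype argument changes, using invented values. The names raw, col_names and the timestamp index are hypothetical stand-ins for the variables in reader.py, and the sketch assumes raw holds one list of Decimal values per row.

```python
from decimal import Decimal

import numpy as np
import pandas as pd

# Hypothetical rows standing in for the values read from the database:
# every element arrives as a decimal.Decimal.
raw = [
    [Decimal("0.52"), Decimal("1.10")],
    [Decimal("-1.25"), Decimal("0.75")],
]
col_names = ["u", "v"]
index = pd.date_range("2018-01-01", periods=2, freq="30min")  # hypothetical timestamps

# Without dtype, pandas keeps the Decimal objects in "object" columns.
df_object = pd.DataFrame(data=raw, columns=col_names, index=index)

# With dtype=np.float64, every column is cast to float64 on construction.
df_float = pd.DataFrame(data=raw, columns=col_names, dtype=np.float64, index=index)

print(df_object.dtypes.tolist())
print(df_float.dtypes.tolist())
print(df_float["u"].min(), df_float["u"].max())
```

Because dtype takes a single type, the cast applies to every column, including integer-valued ones.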
Alright, then let's go with this. It will also convert integer-valued fields to float64, which takes more space than necessary in those cases, but that does not really matter. If we run into performance issues, we can come back to this issue.
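If memory ever does become a concern, one possible follow-up is to downcast integer-valued columns after the float64 cast using pd.to_numeric(..., downcast="integer"). This function picks the smallest integer dtype that can hold the values. This is a hypothetical sketch, not something decided in this thread, and the column name count and its values are invented.

```python
from decimal import Decimal

import numpy as np
import pandas as pd

# Hypothetical integer-valued field (e.g. a counter) that arrives as Decimal.
raw = [[Decimal("1")], [Decimal("2")], [Decimal("3")]]

# The reader fix casts every column to float64, integer fields included.
df = pd.DataFrame(data=raw, columns=["count"], dtype=np.float64)

# Downcast to the smallest integer dtype that holds the values. This only
# succeeds when every value is a whole number and none is NaN; otherwise
# the float64 column is returned unchanged.
compact = pd.to_numeric(df["count"], downcast="integer")

print(df["count"].dtype, df["count"].memory_usage(index=False))
print(compact.dtype, compact.memory_usage(index=False))
```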
I just exported eddy data from metacatalog and wanted to calculate some values, such as the minimum, to check the data:
edat['u'].min()
which leads to the following error: InvalidOperation: [<class 'decimal.InvalidOperation'>]
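One possible trigger for this message is a Decimal('NaN') in the column. This is an assumption and was not confirmed in the thread. In Python's decimal module, ordering comparisons such as < that involve a NaN signal InvalidOperation, and the default context traps that signal and raises it. The helper name compare_with_nan below is hypothetical:

```python
from decimal import Decimal, InvalidOperation


def compare_with_nan() -> str:
    """Return the text of the exception raised by an ordering comparison with Decimal('NaN')."""
    try:
        _ = Decimal("NaN") < Decimal("1")
    except InvalidOperation as exc:
        return f"{type(exc).__name__}: {exc}"
    return "no exception"


print(compare_with_nan())
```

With CPython's default C implementation of decimal, the printed text should have the same form as the error above.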
All values in the exported data frame are of type decimal.Decimal, and it seems that pandas cannot perform operations like .min() and .max() on this data type. I would have to convert the series to type float to calculate min and max. I don't think this is ideal, as metacatalog should work smoothly with pandas.
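The manual workaround described above can be sketched as follows. Here edat and its values are invented stand-ins for the exported data. Casting with astype(float) calls float() on each Decimal, and Decimal('NaN') becomes an ordinary float NaN, which min() and max() skip by default.

```python
from decimal import Decimal

import pandas as pd

# Hypothetical stand-in for the exported eddy data; the values are invented.
edat = pd.DataFrame({"u": [Decimal("0.52"), Decimal("-1.25"), Decimal("NaN"), Decimal("3.10")]})

# Decimal objects are stored in a generic "object" column.
print(edat["u"].dtype)  # object

# Manual workaround: cast to float. Decimal("NaN") becomes float NaN,
# which min() and max() skip by default.
u = edat["u"].astype(float)
print(u.min(), u.max())
```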
The dtypes for the imported data in metacatalog are defined here: https://github.com/VForWaTer/metacatalog/blob/15ee52f0e7b964e0b1df19132696cbe06a208bbe/metacatalog/ext/io/importer.py#L126-L131
Could we use a datatype like 'data': ARRAY(sa.NUMERIC) to solve this issue?
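For context on this proposal, SQLAlchemy's Numeric/NUMERIC type returns decimal.Decimal by default (asdecimal=True). That fits the report in this thread that switching to Numeric still yields Decimal values. The sketch below only inspects the flag. It assumes the column uses the PostgreSQL ARRAY type, and it does not verify whether metacatalog's reader actually applies SQLAlchemy's result processing, which would be needed for asdecimal=False to return floats.

```python
import sqlalchemy as sa
from sqlalchemy.dialects.postgresql import ARRAY

# ARRAY instantiates the item type class; NUMERIC defaults to asdecimal=True,
# so values are returned as decimal.Decimal.
default_type = ARRAY(sa.NUMERIC)

# Hypothetical alternative: ask SQLAlchemy to return Python floats instead.
float_type = ARRAY(sa.Numeric(asdecimal=False))

print(default_type.item_type.asdecimal, float_type.item_type.asdecimal)
```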