CorrelAid / pystatis

MIT License

Parse data types correctly #106

Open: bergnerjonas opened this issue 3 months ago

bergnerjonas commented 3 months ago

Currently, date columns are not parsed to a date type; they are read with the generic object dtype instead, e.g. the column "Stichtag" in table "44231-01-02-4". This could be implemented by explicitly converting dates in the read_csv call: https://github.com/CorrelAid/pystatis/blob/52900860430b3ec960cd100e299cfa99a38d4aef/src/pystatis/table.py#L95
This requires identifying the date columns and choosing the date format based on the requested language, since the date format in the API response changes with the language.
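A minimal sketch of the language-dependent part, assuming the table has already been read with pandas. The helper `parse_date_series` and the `DATE_FORMATS` mapping are hypothetical, and both format strings are assumptions that would have to be checked against real API responses in each language.

```python
import pandas as pd

# Hypothetical mapping from the requested language to the date format used in
# the API's CSV output. Both formats are assumptions, not verified behavior.
DATE_FORMATS = {
    "de": "%d.%m.%Y",  # assumed German style, e.g. "31.12.2020"
    "en": "%Y-%m-%d",  # assumed English style, e.g. "2020-12-31"
}


def parse_date_series(series: pd.Series, language: str = "de") -> pd.Series:
    """Convert a column of date strings to datetime64 using the language's format."""
    fmt = DATE_FORMATS[language]  # raises KeyError for unsupported languages
    # An explicit format avoids guessing and fails loudly if the data deviates.
    return pd.to_datetime(series, format=fmt)
```

Right after the read_csv call this would be used as `df["Stichtag"] = parse_date_series(df["Stichtag"], language)`. Passing an explicit `format` instead of letting pandas infer it keeps day-first German dates from being misread as month-first.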

pmayd commented 3 months ago

That is true for all columns, and it is one of the open issues/next steps: as soon as we have a reliable output format and cleaned data, we should try to return the correct data types. Value columns, for example, should probably always be numeric, and of course dates should be returned as dates.
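For the value columns, a possible sketch, assuming German-style decimal commas, no thousands separators, and placeholder symbols such as "-" or "." for missing values (all three are assumptions about the CSV output, not confirmed here). The helper `to_numeric_values` is hypothetical; entries that cannot be parsed become NaN instead of forcing the whole column to object.

```python
import pandas as pd


def to_numeric_values(series: pd.Series) -> pd.Series:
    """Convert a value column to floats; unparseable entries become NaN."""
    if pd.api.types.is_numeric_dtype(series):
        return series  # already numeric, nothing to do
    # Assumption: decimal commas and no thousands separators in the raw data.
    normalized = series.astype(str).str.replace(",", ".", regex=False)
    # Placeholder symbols (e.g. "-" or ".") cannot be parsed and are coerced to NaN.
    return pd.to_numeric(normalized, errors="coerce")
```

Coercing to NaN loses the distinction between the different placeholder meanings; if that matters, the raw column could be kept alongside the numeric one.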

For dates it should be really easy, as they can only appear in the time column, right? So we only need a very short dictionary or list of possible date column names and can parse those columns accordingly.
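The name-based lookup could look roughly like this. `DATE_COLUMNS` and `parse_date_columns` are hypothetical; only "Stichtag" comes from this issue, and its format is an assumption. Further names (and formats for other languages) would have to be collected from real tables.

```python
import pandas as pd

# Hypothetical dictionary of known date column names and their formats.
# Only "Stichtag" is taken from the issue; its format is an assumption.
DATE_COLUMNS = {
    "Stichtag": "%d.%m.%Y",
}


def parse_date_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Parse every known date column present in df; other columns are untouched."""
    df = df.copy()  # do not mutate the caller's DataFrame
    for col, fmt in DATE_COLUMNS.items():
        if col in df.columns:
            df[col] = pd.to_datetime(df[col], format=fmt)
    return df
```

Tables without any of the listed columns pass through unchanged, so the function can be applied to every result without checking its layout first.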

pmayd commented 2 months ago

I implemented date parsing for the time column, but we have a strange problem with the year column (Jahr): although it contains only numbers, its dtype is object for some reason. We should check this again before closing the issue.
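One way to narrow this down, offered as a guess at the cause rather than a diagnosis: pandas only infers an integer dtype for a CSV column if every entry parses as a number, so a single non-numeric entry leaves the whole column as object. The hypothetical helpers below list the offending entries and convert the column to pandas' nullable `Int64` dtype.

```python
import pandas as pd


def find_non_numeric(series: pd.Series) -> pd.Series:
    """Return the entries of series that cannot be parsed as numbers."""
    converted = pd.to_numeric(series, errors="coerce")
    # Entries that were present but became NaN after coercion are the culprits.
    return series[converted.isna() & series.notna()]


def to_year(series: pd.Series) -> pd.Series:
    """Convert a year column to the nullable integer dtype (missing -> <NA>)."""
    return pd.to_numeric(series, errors="coerce").astype("Int64")
```

Running `find_non_numeric(df["Jahr"])` on the affected table should show whether stray values are the reason; `Int64` is used instead of `int64` so that unparseable entries can become `<NA>` without turning the years into floats.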