coa-project / pycoa

pycoa Python source code
MIT License

govcy db : strange behaviour in cumul data #193

Closed tjbtjbtjb closed 12 months ago

tjbtjbtjb commented 1 year ago

See the graph below:

image

b0gz1b commented 1 year ago

The line https://github.com/coa-project/pycoa/blob/b6d98b7ef7ca8a07cc8bce89974cfbc46a9ecf8c/coa/covid19.py#L96 should be rename_dict = {'total deaths': 'tot_deaths'}, but the issue seems to be fixed in the ReadingDataModification branch.

b0gz1b commented 1 year ago

The issue is not fixed in ReadingDataModification, because of this line: https://github.com/coa-project/pycoa/blob/8617105db6b812d292aab28f800fd0c0147d391e/coa/dbparser.py#L1006. It is caused by infer_datetime_format=True, which tries to infer the format from the first date in the column. In this case the first date is 9/3/2020, so parsing defaults to month-first (MM-DD-YYYY) instead of the correct day-first DD-MM-YYYY. The suggested fix is to store each database's date format in the base metadata and use that format for the conversion. It would prevent the same bug on other databases.
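A minimal sketch of that metadata idea, to make the ambiguity concrete. The DATE_FORMATS dict and the parse_dates helper are hypothetical names used for illustration and are not part of pycoa. Only the "%d/%m/%Y" format for govcy comes from this thread.

```python
import pandas as pd

# Hypothetical per-database metadata (illustrative only, not pycoa's API).
# The govcy format is the one discussed in this issue.
DATE_FORMATS = {"govcy": "%d/%m/%Y"}


def parse_dates(db_name, column):
    """Parse a date column, using an explicit format when one is declared."""
    fmt = DATE_FORMATS.get(db_name)
    if fmt is not None:
        # Explicit format: no guessing between day-first and month-first.
        parsed = pd.to_datetime(column, errors="coerce", format=fmt)
    else:
        # No declared format: fall back to pandas' own parsing.
        parsed = pd.to_datetime(column, errors="coerce")
    return parsed.dt.date


raw = pd.Series(["9/3/2020", "10/3/2020", "13/3/2020"])
govcy_dates = parse_dates("govcy", raw)

# Without a format, an ambiguous string is read month-first by default.
ambiguous = pd.to_datetime("9/3/2020")

print(govcy_dates.tolist())
print(ambiguous.date())
```

With the explicit format, 9/3/2020 is read as 9 March 2020. Without it, pandas reads the same string as 3 September 2020, which is the kind of misparse described above.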

However, the simpler fix I have for now is just to replace that line with:

if self.db == "govcy":
    pandas_db['date'] = pd.to_datetime(pandas_db['date'], errors='coerce', format="%d/%m/%Y").dt.date
else:
    pandas_db['date'] = pd.to_datetime(pandas_db['date'], errors='coerce', infer_datetime_format=True).dt.date

odadoun commented 12 months ago

Fixed using Jules' trick.