ABI-Covid-19 / moh-data

Apache License 2.0

Changes to MoH server break things #5

Closed agarny closed 4 years ago

agarny commented 4 years ago

It looks like the MoH changed things on their server... Argh!

$ python
Python 3.7.7 (default, Mar 10 2020, 15:43:33)
[Clang 11.0.0 (clang-1100.0.33.17)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from moh_data.main import Basic
>>> run_data = Basic()
Traceback (most recent call last):
  File "/Users/Alan/virtualenv/moh-data/lib/python3.7/site-packages/pandas/core/tools/datetimes.py", line 432, in _convert_listlike_datetimes
    values, tz = conversion.datetime_to_datetime64(arg)
  File "pandas/_libs/tslibs/conversion.pyx", line 200, in pandas._libs.tslibs.conversion.datetime_to_datetime64
TypeError: Unrecognized value type: <class 'str'>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/Alan/virtualenv/moh-data/lib/python3.7/site-packages/moh_data/main.py", line 29, in __init__
    self._run()
  File "/Users/Alan/virtualenv/moh-data/lib/python3.7/site-packages/moh_data/main.py", line 34, in _run
    self._total_daily_confirmed = self._excel_file.get_daily_sum_confirmed()
  File "/Users/Alan/virtualenv/moh-data/lib/python3.7/site-packages/moh_data/core/collector.py", line 43, in get_daily_sum_confirmed
    self._confirmed_total = self._get_custom_sum(self._confirmed_sheet, 'Date of report', 'Daily confirmed cases')
  File "/Users/Alan/virtualenv/moh-data/lib/python3.7/site-packages/moh_data/core/collector.py", line 116, in _get_custom_sum
    total.index = pd.to_datetime(total.index, format="%d/%m/%Y")
  File "/Users/Alan/virtualenv/moh-data/lib/python3.7/site-packages/pandas/core/tools/datetimes.py", line 738, in to_datetime
    result = convert_listlike(arg, format)
  File "/Users/Alan/virtualenv/moh-data/lib/python3.7/site-packages/pandas/core/tools/datetimes.py", line 435, in _convert_listlike_datetimes
    raise e
  File "/Users/Alan/virtualenv/moh-data/lib/python3.7/site-packages/pandas/core/tools/datetimes.py", line 400, in _convert_listlike_datetimes
    arg, format, exact=exact, errors=errors
  File "pandas/_libs/tslibs/strptime.pyx", line 142, in pandas._libs.tslibs.strptime.array_strptime
ValueError: time data '2020-04-18 00:00:00' does not match format '%d/%m/%Y' (match)

mahyar-osn commented 4 years ago

Yes, I saw this just a little while ago. Gonna fix this now. Also, I will probably switch to a more stable data source (Johns Hopkins maybe) because it seems that the MoH keeps changing the format when they enter the data into the spreadsheet.

agarny commented 4 years ago

Yes, it is very annoying that they keep changing things like that. Re the JHU data, is it really 100% the same as the data from the MoH? I believe I have seen some minor discrepancies in the past...? Anyway, use whatever data stream you feel is best and works without having to tweak it every couple of days.

mahyar-osn commented 4 years ago

Not sure exactly about that. I have to investigate and see if there are discrepancies. Alternatively, once we are connected with Shaun Hendy, he will hopefully give us access to their ESR dashboard for a complete (and hopefully stable) data feed.

agarny commented 4 years ago

Ok, probably best to wait until you have had that meeting with Shaun. Until then, do you think it would take you long to fix the script? I was hoping to make use of the cumulative number of confirmed/probable cases in the SEIR model... No worries if you have more urgent things to do, that will "force" me to enjoy my lockdown weekend... :)

mahyar-osn commented 4 years ago

Sure. Just fixed it now. Will make a PR right away.
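
The fix itself is not shown in this thread. For context, here is a minimal sketch of one way the date parsing could be made tolerant of both formats seen in the traceback above ('%d/%m/%Y' and '2020-04-18 00:00:00'). The helper name parse_report_date and the CANDIDATE_FORMATS tuple are hypothetical and not part of moh_data; the third format in the tuple is an assumption, not something the spreadsheet is known to use.

```python
from datetime import datetime

# Hypothetical list of formats to try, in order. Only the first two come
# from the traceback in this issue; the third is an assumed variant.
CANDIDATE_FORMATS = ("%d/%m/%Y", "%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def parse_report_date(value):
    """Parse a 'Date of report' cell, trying each candidate format in turn."""
    # Cells that were already read as datetime objects are passed through.
    if isinstance(value, datetime):
        return value
    text = str(value).strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format did not match, try the next one
    # Fail loudly so that a new, unknown format is noticed straight away.
    raise ValueError(f"Unrecognised date format: {text!r}")


# Both spellings of the same day parse to the same datetime.
dates = [parse_report_date(v) for v in ["18/04/2020", "2020-04-18 00:00:00"]]
```

The resulting list of datetime objects could then replace the index that is currently built with pd.to_datetime(total.index, format="%d/%m/%Y") in _get_custom_sum, so that a change of format on the MoH side no longer raises a ValueError as long as the new format is in the list.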

agarny commented 4 years ago

Damn, here goes my enjoying-my-weekend. :) (Thanks though! :))

mahyar-osn commented 4 years ago

Oops, sorry for that! :))