IQDM / IQDM-Analytics

Code to Analyze Data Mining Results

Date parsing failure in widen_data causes GUI to crash #6

Closed cutright closed 3 years ago

cutright commented 3 years ago

A user is reporting the following error. Some SNC Patient (pre-2020) reports are swapping month and day.

2021-03-10 16:00:06,219 - iqdma - ERROR - Unhandled exception: Traceback (most recent call last):
  File "iqdma\main.py", line 552, in on_refresh
  File "iqdma\main.py", line 575, in import_csv
  File "iqdma\main.py", line 585, in update_report_data
  File "iqdma\stats.py", line 32, in __init__
  File "iqdma\importer.py", line 121, in __call__
  File "iqdma\utilities_dvha_stats.py", line 270, in widen_data
  File "iqdma\utilities_dvha_stats.py", line 357, in str_arr_to_date_arr
  File "iqdma\utilities_dvha_stats.py", line 352, in str_arr_to_date_arr
  File "dateutil\parser\_parser.py", line 1374, in parse
  File "dateutil\parser\_parser.py", line 652, in parse
dateutil.parser._parser.ParserError: String does not contain a date: 

Should update widen_data to handle flipped dates, but ideally we can detect the swapping from the report itself in IQDM-PDF.

cutright commented 3 years ago

Might move to this library: https://github.com/scrapinghub/dateparser

cutright commented 3 years ago

I think the solution is to have IQDM-PDF write file creation dates into its CSV output, then have IQDMA fall back to that date when date parsing fails (or maybe try a day-first parse before falling back). dateparser is definitely slower than python-dateutil.
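
A minimal sketch of that fallback, assuming python-dateutil. The name `parse_with_fallback` and its `fallback` argument (standing in for the file creation date that IQDM-PDF would write to its CSV) are hypothetical and not part of IQDMA.

```python
from datetime import datetime
from dateutil.parser import parse as date_parser


def parse_with_fallback(date_str, fallback, dayfirst=False):
    """Parse date_str with dateutil; return fallback if parsing fails.

    fallback is a hypothetical stand-in for the file creation date.
    """
    try:
        return date_parser(date_str, dayfirst=dayfirst)
    except (ValueError, OverflowError, TypeError):
        # dateutil's ParserError is a ValueError subclass (dateutil >= 2.8.1);
        # TypeError covers non-string input such as None.
        return fallback


created = datetime(2021, 3, 10)  # illustrative fallback value
print(parse_with_fallback('3/30/2020', created))
print(parse_with_fallback('', created))
```

Whether to pass `dayfirst=True` would still have to be decided per report type, since an ambiguous string parses without error either way.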

cutright commented 3 years ago

Flipping day and month does not cause the error above:

>>> from dateutil.parser import parse as date_parser
>>> date_parser('3/30/2020')
datetime.datetime(2020, 3, 30, 0, 0)
>>> date_parser('30/3/2020')
datetime.datetime(2020, 3, 30, 0, 0)
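
Both strings above are unambiguous because 30 cannot be a month. A short check of the ambiguous case, assuming python-dateutil; the sample string is illustrative and not taken from a report. It suggests a swapped day and month would yield a wrong date silently rather than raise.

```python
from datetime import datetime
from dateutil.parser import parse as date_parser

# Both fields could be a month, so the parse order decides the result.
month_first = date_parser('3/4/2020')               # default: month first
day_first = date_parser('3/4/2020', dayfirst=True)  # day first

print(month_first)
print(day_first)
```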

cutright commented 3 years ago

If not fixed in d7d6bcbbb472fffb59a522155a108cb6cdc9711d, appears to be fixed in e0f632f271a13102e437c182a9ac7ff86c2e72b5

cutright commented 3 years ago

Reproduced the issue with SNCPatient2020 results from IQDM-PDF. It occurs when the date_col is empty for a row of data. Since there is now a file creation timestamp, the code should default to it when no date data was parsed.
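
One way this could look, as a sketch only: `str_arr_to_date_arr_safe` and its `created` argument are hypothetical names, and the actual signature of `str_arr_to_date_arr` in IQDMA is not shown in this thread. It assumes one file creation timestamp is available per row.

```python
from datetime import datetime
from dateutil.parser import parse as date_parser


def str_arr_to_date_arr_safe(date_strs, created):
    """Convert date strings to datetimes, row by row.

    created holds one fallback datetime per row (assumed here to be the
    file creation timestamp column from the IQDM-PDF CSV).
    """
    dates = []
    for date_str, fallback in zip(date_strs, created):
        # An empty or whitespace-only cell is what triggered the crash.
        if not date_str or not date_str.strip():
            dates.append(fallback)
            continue
        try:
            dates.append(date_parser(date_str))
        except (ValueError, OverflowError):
            # Unparseable text: use the fallback too.
            dates.append(fallback)
    return dates


created = [datetime(2021, 3, 1), datetime(2021, 3, 2)]  # illustrative values
print(str_arr_to_date_arr_safe(['3/30/2020', ''], created))
```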