RSE-Sheffield / sheffield-covid-19

Scrape COVID-19 data from the University website

Error?: "invalid literal for int() with base 10: '1*'" #14

Closed muneerahp closed 3 years ago

muneerahp commented 3 years ago

Fairly sure this is a bug, but just in case, since this is the first time I'm running this:

Ran, and got:

Traceback (most recent call last):
  File "code/ingest.py", line 220, in <module>
    main()
  File "code/ingest.py", line 71, in main
    data = transform(validated)
  File "code/ingest.py", line 106, in transform
    out.extend(int(x) for x in row[1:])
  File "code/ingest.py", line 106, in <genexpr>
    out.extend(int(x) for x in row[1:])
ValueError: invalid literal for int() with base 10: '1*'

Checked the debugger, and it looks like "rows" in transform() includes ['Monday 28 September', '1*', '19*'] as one of its entries.
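For reference, the failure can be reproduced in isolation. The snippet below is only a sketch: it hard-codes the row quoted above and repeats the int() conversion shown in the traceback, and the variable names are assumptions, not code taken from ingest.py.

```python
# Row as seen in the debugger; the starred cells come from the website table.
row = ['Monday 28 September', '1*', '19*']

message = None
try:
    # Same conversion as the failing line in the traceback.
    counts = [int(x) for x in row[1:]]
except ValueError as err:
    # int() rejects the first starred cell, '1*'.
    message = str(err)

print(message)
```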

Checked the website, and the table this data appears to have been pulled from also has those stars in the values. Happy to add a bugfix (some kind of type check/conversion before applying any operations?) if this is confirmed to be an actual bug for everybody, and not some weird first-run issue for me.

drj11 commented 3 years ago

Thanks for reporting this.

As you can see from the Wayback Machine https://web.archive.org/web/20201006082650/https://www.sheffield.ac.uk/autumn-term-2020/covid-19-statistics a * marking a footnote has been added. sigh

I suggest extending validate() so that it specifically checks and removes a single final star in the cell string. It's good to be conservative, because we'd rather fail than accept something that was wrong.
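A minimal sketch of that conservative check, assuming each count cell should be plain ASCII digits optionally followed by one star. The helper name clean_cell and the regular expression are hypothetical and are not the actual validate() code; how it would be wired into validate() is left open.

```python
import re

# Hypothetical pattern: one or more ASCII digits, then at most one '*'.
_CELL = re.compile(r"([0-9]+)\*?")


def clean_cell(cell):
    """Return the digits of a count cell, dropping a single final star.

    Anything else (two stars, a star in the middle, no digits) raises
    ValueError, so unexpected page changes still fail loudly.
    """
    match = _CELL.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"unexpected cell value: {cell!r}")
    return match.group(1)


row = ['Monday 28 September', '1*', '19*']
cleaned = [row[0]] + [clean_cell(x) for x in row[1:]]
counts = [int(x) for x in cleaned[1:]]
```

Using fullmatch rather than str.rstrip('*') is deliberate here: rstrip would silently accept '1**', whereas this version rejects anything beyond a single trailing star.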

drj11 commented 3 years ago

closed by #15