ImperialCollegeLondon / safedata_validator

Python tools to validate and publish datasets using the safedata metadata format.
https://safedata-validator.readthedocs.io/
MIT License

`mypy` fixes + improved CI procedure #59

Closed jacobcook1995 closed 1 year ago

jacobcook1995 commented 1 year ago

This pull request contains a large number of fixes to stop mypy complaining.

With these fixes in place, mypy is also enabled as a pre-commit hook, and the CI procedure is changed so that the pre-commit hooks are run across the repo routinely. This should help us keep the code in line with mypy in future.

I've also added a step to calculate and upload code coverage. This includes a badge in the README.md, which currently doesn't display anything: it's linked to the develop branch (though maybe this should be master?), which at present doesn't have any code coverage uploaded.

codecov-commenter commented 1 year ago

Codecov Report

:exclamation: No coverage uploaded for pull request base (release/3.0.0@584e92d). The diff coverage is n/a.

@@               Coverage Diff                @@
##             release/3.0.0      #59   +/-   ##
================================================
  Coverage                 ?   67.64%           
================================================
  Files                    ?       12           
  Lines                    ?     3576           
  Branches                 ?        0           
================================================
  Hits                     ?     2419           
  Misses                   ?     1157           
  Partials                 ?        0           

jacobcook1995 commented 1 year ago

There seems to be a test failure in `test/test_resources::test_load_resources_by_arg`, but only for Python 3.11 on Windows.

jacobcook1995 commented 1 year ago

Hmmm, the problem seems to be that in the Python 3.11 + Windows case, `gbif_file` can be read as a csv file, whereas every other Python version and OS combination correctly raises a "Can't be read as .csv" error. No idea why that would happen though.

davidorme commented 1 year ago

Yeah - that Windows failure is odd. The testing here is a bit tricky - what happens when a csv reader tries to ingest data from a non-csv file? You can get blatantly odd byte values (Unicode errors), or csv can complain, but you could have a non-csv file starting with byte data that could conceivably be header text.

So I think what has happened is that in this test, the reader is supposed to encounter a binary file (the sqlite input) and fail to get text, but something about the Windows system (text codec?) reads it as text that doesn't contain the right headers. And then that emits the wrong log message.
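To illustrate the general point about codecs (not a diagnosis of this particular failure), here is a minimal sketch showing that the same binary bytes can be rejected by one text codec and accepted by another. The byte string is made up for the example: it only starts with the SQLite magic string, and `try_decode` is a hypothetical helper, not part of safedata_validator.

```python
from typing import Optional

# Illustrative bytes only: the SQLite magic string followed by a few
# arbitrary non-ASCII bytes standing in for binary page data.
binary_blob = b"SQLite format 3\x00" + bytes([0x10, 0x00, 0xFF, 0xFE, 0xC3, 0x28])


def try_decode(data: bytes, encoding: str) -> Optional[str]:
    """Return the decoded text, or None if the codec rejects the bytes."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# UTF-8 rejects the blob because 0xFF is never a valid UTF-8 byte ...
utf8_result = try_decode(binary_blob, "utf-8")

# ... while latin-1 maps every byte value to a character, so it "succeeds"
# and hands back text that simply has no sensible headers in it.
latin1_result = try_decode(binary_blob, "latin-1")

print(utf8_result is None)
print(latin1_result is not None)
```

If the default text encoding differs between platforms, the same file could therefore fall into either the "can't read text" branch or the "wrong headers" branch.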

I think the answer is probably to collapse the "can't read text" and "wrong headers" cases into "nope". That more accurately reflects the level of discrimination we can do!
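A rough sketch of what collapsing the two cases could look like is below. This is not the safedata_validator implementation: `is_expected_csv` and the header names in `REQUIRED_HEADERS` are hypothetical, and the explicit `encoding="utf-8"` is an assumption made so the example behaves the same on every platform.

```python
import csv
import os
import tempfile

# Hypothetical header names, chosen only for this example.
REQUIRED_HEADERS = frozenset({"taxon_id", "name"})


def is_expected_csv(path: str, required: frozenset = REQUIRED_HEADERS) -> bool:
    """Return True only if the file decodes as text and has the required headers.

    Decoding failures, csv parsing errors and missing headers all collapse
    into the same answer: False.
    """
    try:
        with open(path, newline="", encoding="utf-8") as handle:
            # An empty file yields an empty header list rather than an error.
            header = next(csv.reader(handle), [])
    except (UnicodeDecodeError, csv.Error):
        return False
    return required.issubset(header)


# Small demonstration with three temporary files.
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.csv")
    wrong = os.path.join(tmp, "wrong.csv")
    binary = os.path.join(tmp, "binary.sqlite")

    with open(good, "w", newline="", encoding="utf-8") as f:
        f.write("taxon_id,name\n1,Panthera leo\n")
    with open(wrong, "w", newline="", encoding="utf-8") as f:
        f.write("foo,bar\n1,2\n")
    with open(binary, "wb") as f:
        # Made-up binary content, not a real SQLite database.
        f.write(b"SQLite format 3\x00\xff\xfe")

    results = {
        "good": is_expected_csv(good),
        "wrong": is_expected_csv(wrong),
        "binary": is_expected_csv(binary),
    }

print(results)
```

With this shape, a test only needs to assert on a single "not a usable csv" outcome, so it no longer depends on which failure mode a given platform happens to hit.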