ImperialCollegeLondon / safedata_validator

Python tools to validate and publish datasets using the safedata metadata format.
https://safedata-validator.readthedocs.io/
MIT License
2 stars 4 forks source link

Fix bug for entirely empty chunks of rows #83

Closed jacobcook1995 closed 1 year ago

jacobcook1995 commented 1 year ago

This PR adds a check that the set of valid dates isn't empty before attempting to use this to update the temporal bounds. This comes from a weird edge where a chunk of rows contains entirely "NA" and so every single entry is ignored

I haven't added a test to demonstrate that this works, if you think of an appropriate one let me know.

davidorme commented 1 year ago

I guess a test would be to have an in-memory workbook with a couple of NA rows and then set the chunksize to 1 or 2 to keep it small?

codecov-commenter commented 1 year ago

Codecov Report

Merging #83 (760b6be) into release/3.0.0 (b5b021a) will increase coverage by 0.04%. The diff coverage is 100.00%.

@@                Coverage Diff                @@
##           release/3.0.0      #83      +/-   ##
=================================================
+ Coverage          69.00%   69.04%   +0.04%     
=================================================
  Files                 12       12              
  Lines               3639     3644       +5     
=================================================
+ Hits                2511     2516       +5     
  Misses              1128     1128              
Files Changed Coverage Δ
safedata_validator/field.py 93.52% <100.00%> (+0.04%) :arrow_up:

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

jacobcook1995 commented 1 year ago

Added a test for "NA" data entries for the DatetimeField. I had to change how the consistency checking is done as this was producing "Date or datetime data mixes ISO string and time formatted rows" errors when it shouldn't.

I also added a similar test + change to the consistency checking for the TimeField as this field uses a very similar logic.

Considered adding similar tests for the other fields but that felt like a real rabbit hole for an edge case.