ImperialCollegeLondon / safedata_validator

Python tools to validate and publish datasets using the safedata metadata format.
https://safedata-validator.readthedocs.io/
MIT License

Incorrect reporting of dataset field metadata with trailing empty fields #60

Closed davidorme closed 1 year ago

davidorme commented 1 year ago

Processing a worksheet creates EmptyField instances for columns when the worksheet max_rows extends beyond the data table. These trailing EmptyField instances are then used when validating data rows, to check for invalid content. If there is none, then the worksheet has simply read in extra empty fields, which it does sometimes 🤷.

However, at the moment, the number of rows and the row metadata written to the dataset metadata include these empty fields. These entries need to be removed or updated.

I think the best place to do this is in the DataWorksheet.to_dict method, rather than heavily updating the class itself: we only want the metadata to be accurate, and the internal representation of the worksheet is best left close to the actual source worksheet.
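
As a rough illustration of that approach, here is a minimal sketch in which the trailing empty fields are dropped only when the metadata is exported. The classes below are simplified, hypothetical stand-ins, not the actual safedata_validator classes: the attribute names (`fields`, `name`) and the output keys (`n_fields`, `field_name`) are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Field:
    """Hypothetical stand-in for a populated field (column) in a worksheet."""

    name: str

    def to_dict(self) -> dict:
        return {"field_name": self.name}


@dataclass
class EmptyField(Field):
    """Hypothetical stand-in for a column with no field metadata or data."""

    name: str = ""


@dataclass
class DataWorksheet:
    """Hypothetical stand-in that keeps every loaded column, empty or not."""

    name: str
    fields: list = field(default_factory=list)

    def to_dict(self) -> dict:
        # Work on a copy so the internal representation stays close to
        # the source worksheet.
        reported = list(self.fields)

        # Drop only the trailing run of EmptyField instances. An empty
        # field in the middle of the table is kept in the output.
        while reported and isinstance(reported[-1], EmptyField):
            reported.pop()

        return {
            "name": self.name,
            "n_fields": len(reported),
            "fields": [fld.to_dict() for fld in reported],
        }


worksheet = DataWorksheet(
    name="Data",
    fields=[Field("site"), EmptyField(), Field("count"), EmptyField(), EmptyField()],
)
metadata = worksheet.to_dict()
print(metadata["n_fields"], len(worksheet.fields))
```

With this shape, validation can still use every loaded column, because `worksheet.fields` is never modified. Only the exported dictionary omits the trailing empty fields.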