MarkPflug / Sylvan.Data.Excel

The fastest .NET library for reading Excel data files.
MIT License
243 stars 30 forks source link

[Help needed] Cell validation with error reporting, worksheet schema & code review #175

Closed sherman89 closed 5 months ago

sherman89 commented 5 months ago

First of all thank you @MarkPflug for this fantastic library ❤️

The main struggle I have is how to cleanly deal with validation and reporting which worksheet, column and cell has invalid data. I got this to work, but the code seems kinda ugly and I'm unable to get the built-in (preview) validation handler to work. Also for some reason the reader that is returned by Validate does not work.. I might need a good nights sleep at this point.

It would be nice to be able to validate each cell as it's being read, and then store the error with cell position, column name and worksheet name. Is this somehow easily possible? 🤔

There is also more complex validation involving checking for duplicate values, and checking that at least 1 of 2 cells has a value, but I only need it in a couple of columns in the whole document so perhaps I can do that after the reading is done as I'm doing right now in my test app... would appreciate some tips here as well though!

Another minor issue is that I could not use Schema.Parse with multiple worksheets unless I have a separate schema and reader per worksheet, but I just made a custom schema class and it works fine and allows me to do some custom logic like validating that column name is not empty, etc...

I've attached a repo with a test console application and test data and I was hoping for some advice on how to cleanly implement these: https://github.com/sherman89/SylvanExcelTest

PS: I have a requirement to support localized versions of the document (same structure though) so I'm mapping worksheets to internal worksheet names.

Thank you!

MarkPflug commented 5 months ago

I've sent a PR to your SylvanExcelTest repo that I think covers most of your usage questions. The DbDataReader.Validate API clearly could use some documentation, as it isn't very intuitive currently. At the very least, I think DataValidationContext could use an IsValid(int ordinal) method, to assist in custom validation handling. I'm going to close this issue. Let's continue the discussion on the PR in your repo.