FlineDev / CSVImporter

Import CSV files line by line with ease
MIT License

Reporting parsing errors #32

Closed JetForMe closed 6 years ago

JetForMe commented 6 years ago

Is there any way to skip a row or report a parsing error (aborting parsing) if a field can't be parsed? Can I return an optional mapped record?

Jeehut commented 6 years ago

Rows are skipped automatically if they don't comply with the structure defined in the RFC for CSV files. For example, if the first row has 8 values, then every other row has to have exactly the same number of values. If a row doesn't, it is simply ignored. The standard for CSV files doesn't allow rows to have differing numbers of fields, so such a source file is invalid, and CSVImporter currently doesn't support anything other than valid CSV files.
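To make the field-count rule above concrete, here is a minimal sketch. It is not CSVImporter's actual code; the function name `rowsMatchingFirstRow` and the sample data are hypothetical and only illustrate the rule that rows whose field count differs from the first row's are dropped.

```swift
// Illustration only, not the library's implementation:
// keep the rows whose field count matches that of the first row.
func rowsMatchingFirstRow(_ rows: [[String]]) -> [[String]] {
    guard let expectedCount = rows.first?.count else { return [] }
    return rows.filter { $0.count == expectedCount }
}

// Hypothetical, already-split rows; the third one is missing a field.
let parsedRows = [
    ["id", "name", "age"],
    ["1", "Ada", "36"],
    ["2", "Bob"],
    ["3", "Cy", "41"]
]

let keptRows = rowsMatchingFirstRow(parsedRows)
print("kept \(keptRows.count) of \(parsedRows.count) rows")
```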

But I'm open to a PR that adds a callback providing more info about whether parsing succeeded for individual rows, as long as it's well tested and documented.

JetForMe commented 6 years ago

In my case, I'm parsing the strings in the fields as dates, integers, floats, etc. If that parsing fails (say, due to a malformed entry in the field), I want to be able to return that error, and either have it skip the row, or abandon the import. Ideally I could include information about the line and position in the line where the failed field is.

Jeehut commented 6 years ago

In that case, just implement this yourself? I mean, it's your parsing that fails, so you should also catch the errors, no? Or am I misunderstanding something?

JetForMe commented 6 years ago

Yes, you're misunderstanding. I'm in the middle of the parse loop, in that first closure shown in your instructions. If something in there fails, there's no way to stop the import or skip the row. I must return a constructed object. I'm not even sure I can throw an exception out of it.

Jeehut commented 6 years ago

Yeah, but that's actually what I was thinking of: if you're not sure that every line maps to such an object, you could simply define your expected result type to be an Optional. And if you also want to report any errors, just use something like a Result type.

Since the type CSVImporter returns can be set to anything, you can define for yourself how gracefully missing or invalid data is handled. Only when the CSV file doesn't comply with the standard does CSVImporter not yet handle the line gracefully, i.e. give you the option to still recover some part of the data. For that, we would need an update, but what you have described so far should already be possible as described above. Just change your generic type. ;)
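A rough sketch of the Result idea, assuming Swift 5's built-in `Result`. The types `Measurement` and `FieldError` and the function `mapResultRow` are hypothetical names made up for this example. The mapper takes one row's `[String]` values, which is the shape of the mapper closure in the README, but here it is run over a plain array so the snippet stands alone without the library.

```swift
// Hypothetical record and error types, for illustration only.
struct Measurement: Equatable {
    let id: Int
    let value: Double
}

struct FieldError: Error, Equatable {
    let column: Int       // zero-based index of the field that failed
    let rawValue: String  // the text that could not be parsed
}

// One row in, one Result out: a bad field becomes a value the caller can inspect.
func mapResultRow(_ fields: [String]) -> Result<Measurement, FieldError> {
    guard fields.count == 2 else {
        return .failure(FieldError(column: fields.count, rawValue: ""))
    }
    guard let id = Int(fields[0]) else {
        return .failure(FieldError(column: 0, rawValue: fields[0]))
    }
    guard let value = Double(fields[1]) else {
        return .failure(FieldError(column: 1, rawValue: fields[1]))
    }
    return .success(Measurement(id: id, value: value))
}

// Stand-in for the rows an importer would hand to the mapper.
let sampleRows = [["1", "3.5"], ["x", "2.0"], ["3", "abc"]]
let results = sampleRows.map(mapResultRow)

// After the import, report failures with their row and column (one-based).
for (index, result) in results.enumerated() {
    if case .failure(let error) = result {
        print("row \(index + 1), column \(error.column + 1): cannot parse \"\(error.rawValue)\"")
    }
}
```

With this shape, the decision to skip bad rows or to discard the whole import is made after mapping, by checking whether any element is a `.failure`.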

ambujpunn commented 6 years ago

@JetForMe I agree with @Dschee here. I did the same thing. To guard against invalid entries, I just made my mapped type Optional. So instead of Element, you can give CSVImporter the generic type Element?. Simply put a guard at the beginning of the first closure and unwrap all the variables there. If anything fails, return nil, and nil will be added to the array of mapped objects.
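Roughly, the guard approach could look like the sketch below. The `Element` fields and the function name `mapOptionalRow` are assumptions made up for the example, and the mapper is applied to a plain array rather than through CSVImporter so that the snippet runs on its own.

```swift
// Hypothetical record type; the real Element would have your own fields.
struct Element: Equatable {
    let name: String
    let age: Int
}

// Mapper returning Element?: any field that fails to parse yields nil.
func mapOptionalRow(_ fields: [String]) -> Element? {
    guard fields.count == 2, let age = Int(fields[1]) else {
        return nil
    }
    return Element(name: fields[0], age: age)
}

// Stand-in for the rows an importer would hand to the mapper.
let inputRows = [["Ada", "36"], ["Bob", "n/a"], ["Cy", "41"]]

// The mapped array keeps a nil placeholder for each invalid row.
let mapped: [Element?] = inputRows.map(mapOptionalRow)

// Drop the nil entries to get only the valid records.
let valid = mapped.compactMap { $0 }
print("\(valid.count) valid, \(mapped.count - valid.count) invalid")
```

Keeping the nil entries in `mapped` preserves the row positions, so the index of each nil tells you which row was invalid.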

Jeehut commented 6 years ago

Closing this due to inactivity and an existing workaround. Feel free to open a PR if you want this to be part of CSVImporter.