Closed janetriley closed 11 months ago
The advantage of creating all category directories at the start is that it raises an exception immediately if there's an issue creating a directory, rather than throwing it in the midst of an export. That seems like a worthwhile tradeoff once you have a big corpus.
What do you think, @bbengfort, back it out?
I'm not thrilled with raising the ExportException for FileNotFound in the midst of the loop. It seems odd to single that one exception type out. I can:
What's your preference?
I'd say go ahead and create all the directories at the beginning - an empty directory is no big deal, and I think you're right that we don't want to throw an exception in the middle of the export.
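A minimal sketch of the "create everything up front" approach, in case it helps the discussion. The function name `create_category_dirs`, its arguments, and the `ExportError` class defined here are stand-ins for illustration, not Baleen's actual code:

```python
import os


class ExportError(Exception):
    """Hypothetical stand-in for the project's export exception."""


def create_category_dirs(root, categories):
    """Create one directory per category before any document is written.

    Any filesystem problem surfaces here, before the export loop starts.
    Returns a mapping of category name to directory path.
    """
    paths = {}
    for category in categories:
        path = os.path.join(root, category)
        try:
            # exist_ok=True makes re-running an export into the same root safe.
            os.makedirs(path, exist_ok=True)
        except OSError as e:
            raise ExportError(
                "could not create directory {!r}: {}".format(path, e)
            ) from e
        paths[category] = path
    return paths
```

The cost is an empty directory for any category that ends up with no documents, which seems acceptable.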
Which brings us to the exceptions that are thrown in the middle of the export. A couple of points:
Any exception that is not of type BaleenError should be considered a fatal developer error. I think that's why the FileNotFound exception is being caught and wrapped in ExportError, because any other type of IOError should simply cause the program to crash and the developers have to deal with it as a bug (which you expressed in your point about so many ways for I/O to go wrong).
At this point, though, Baleen should be mature enough to handle exceptions gracefully; we've been running it long enough and haven't observed too much craziness.
For now, I would say that we should catch and reraise any exception as an ExportError, since they are being handled in a specific way and require the errno and other fields.
In the future, we could collect all errors in exporter.errors and show them to the user after export (or at least classify non-fatal vs. fatal export exceptions).
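To make the points above concrete, here is a rough sketch of wrapping any exception in an ExportError and collecting the non-fatal ones. Everything in it is an assumption for illustration: the `Exporter` shape, the `write` and `export` method names, and the `fatal` flag are hypothetical, and the exception classes are local stand-ins rather than the real ones in Baleen.

```python
class BaleenError(Exception):
    """Hypothetical stand-in for the project's base exception."""


class ExportError(BaleenError):
    """Hypothetical stand-in; carries errno and a fatal flag (assumed fields)."""

    def __init__(self, message, errno=None, fatal=False):
        super().__init__(message)
        self.errno = errno
        self.fatal = fatal


class Exporter:
    def __init__(self):
        # Non-fatal errors gathered during export, to report afterwards.
        self.errors = []

    def write(self, path, content):
        """Write one document, re-raising any failure as ExportError."""
        try:
            with open(path, "w", encoding="utf-8") as f:
                f.write(content)
        except BaleenError:
            raise  # already one of ours, pass it through unchanged
        except Exception as e:
            # Keep the errno (if any) and chain the original exception.
            raise ExportError(str(e), errno=getattr(e, "errno", None)) from e

    def export(self, docs):
        """Export (path, content) pairs; return the number written."""
        exported = 0
        for path, content in docs:
            try:
                self.write(path, content)
                exported += 1
            except ExportError as e:
                if e.fatal:
                    raise
                self.errors.append(e)
        return exported
```

After `export` returns, `exporter.errors` holds everything that went wrong without aborting the run, and the original exception is still reachable through `__cause__` for debugging.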
Speaking of specific error handling and fatal vs. non-fatal - most of that happens at the logger level.