Closed mirkobunse closed 2 years ago
Thanks for approving!
On the other hand, I do not see a big problem with the photon-stream-reader being more forgiving.
Let me add one argument in favor of a "more forgiving" reader and against fixing empty lines in the file itself: yes, every empty line is valid JSON. But an empty line is not a valid photon_stream JSON item; it would cause later exceptions due to missing keys.
The JSONLines reader is currently stopping at the first empty line in a .jsonl file. Indeed, I encountered empty lines that ERNA (or fact-tools?) randomly added to an otherwise reproducible .jsonl file.
The issue stems from the check for an empty line, because readline() will produce an empty string not only at the end of the file, but also for any line that is empty. Replacing self.fin.readline() with self.fin.__next__() resolves the issue by producing an empty string only for lines that are indeed empty, and a StopIteration only at the actual end of the file. This PR will skip empty lines and only stop at the actual end of a JSONLines file.
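To illustrate the difference, here is a minimal sketch and not the actual photon_stream code. The names ForgivingJsonlReader and read_until_blank are hypothetical, and the sketch assumes that each line is stripped before the emptiness check. That assumption matters: a raw readline() returns "\n" for a blank line and "" only at the end of the file, so the two cases collide only after stripping.

```python
import io
import json


def read_until_blank(fin):
    """Failure mode (assumed): an empty stripped line is taken as end of file."""
    items = []
    while True:
        line = fin.readline().strip()
        if not line:
            # Reached at the real end of the file AND at any blank line.
            break
        items.append(json.loads(line))
    return items


class ForgivingJsonlReader:
    """Hypothetical reader that skips blank lines instead of stopping."""

    def __init__(self, fin):
        self.fin = fin

    def __iter__(self):
        return self

    def __next__(self):
        while True:
            # next() raises StopIteration only at the actual end of the file,
            # and that exception simply propagates to the caller.
            line = next(self.fin).strip()
            if line:
                return json.loads(line)
            # Blank line: skip it and read the next one.


text = '{"a": 1}\n\n{"a": 2}\n'
old_items = read_until_blank(io.StringIO(text))
new_items = list(ForgivingJsonlReader(io.StringIO(text)))
```

With the blank line in the middle of the input, read_until_blank stops after the first item, whereas ForgivingJsonlReader yields both items.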