fact-project / photon_stream

Explore the novel photon stream, based on the single photon extractor

Skip empty *.jsonl lines instead of stopping at the first empty line #91

Closed. mirkobunse closed this 2 years ago.

mirkobunse commented 3 years ago

The JSONLines reader currently stops at the first empty line in a .jsonl file. I have in fact encountered empty lines that ERNA (or fact-tools?) randomly inserted into an otherwise reproducible .jsonl file.

The issue stems from checking

line = self.fin.readline().strip().rstrip(',')
if not line:
    raise StopIteration

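because readline() returns an empty string at the end of the file, and for an empty line it returns '\n', which .strip() also turns into an empty string. The check therefore cannot distinguish an empty line from the end of the file.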
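To make the failure concrete, here is a minimal sketch that reproduces the quoted readline()-based check inside a reader class. The class name BuggyJsonLinesReader and the use of io.StringIO as a stand-in for the .jsonl file are assumptions for illustration; the actual photon_stream reader class may look different.

```python
import io
import json


class BuggyJsonLinesReader:
    """Hypothetical reader that reproduces the original readline()-based check."""

    def __init__(self, fin):
        self.fin = fin

    def __iter__(self):
        return self

    def __next__(self):
        line = self.fin.readline().strip().rstrip(',')
        if not line:
            # Hit at the end of the file ('') AND at any blank line ('\n' -> '').
            raise StopIteration
        return json.loads(line)


# An in-memory .jsonl "file" with an empty line between two items.
items = list(BuggyJsonLinesReader(io.StringIO('{"a": 1}\n\n{"a": 2}\n')))
print(items)  # only the item before the empty line is returned
```

The second item is silently dropped because iteration ends at the blank line, even though the file continues.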

Replacing self.fin.readline() with self.fin.__next__() (equivalently, next(self.fin)) resolves the issue: after stripping, it yields an empty string only for lines that are actually empty, and it raises StopIteration only at the actual end of the file. This PR therefore skips empty lines and stops only at the actual end of a JSONLines file.
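The fix can be sketched as follows, assuming the same hypothetical reader structure as above (the class name JsonLinesReader is illustrative, not the library's actual API). The loop skips lines that are empty after stripping, and the StopIteration raised by next(self.fin) at the real end of the file simply propagates out of __next__, ending iteration.

```python
import io
import json


class JsonLinesReader:
    """Hypothetical reader that skips empty lines instead of stopping at them."""

    def __init__(self, fin):
        self.fin = fin

    def __iter__(self):
        return self

    def __next__(self):
        while True:
            # next(self.fin) raises StopIteration only at the actual end of the
            # file; letting it propagate also ends iteration over this reader.
            line = next(self.fin).strip().rstrip(',')
            if line:  # empty lines are skipped rather than treated as the end
                return json.loads(line)


# Empty lines in the middle and at the end, plus a trailing comma on one item.
reader = JsonLinesReader(io.StringIO('{"a": 1}\n\n{"a": 2},\n\n'))
print([item["a"] for item in reader])
```

Raising StopIteration from inside __next__ is fine here because this is a plain iterator class; the PEP 479 restriction applies only to generator functions.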

mirkobunse commented 3 years ago

Thanks for approving!

On the other hand, I do not see a big problem in making the photon-stream-reader more forgiving.

Let me add one argument in favor of a "more forgiving" reader, as opposed to fixing empty lines in the files themselves: even if an empty line were accepted as valid JSON, it is not a valid photon_stream JSON item; passing it on would cause exceptions later on due to missing keys.