Leibniz-HBI / dabapush

Database pusher for social media data (initially Twitter) – pre-alpha version
https://pypi.org/project/dabapush/
MIT License

Large JSONL/NDJSON files cause an OOM error #13

Closed: pekasen closed this issue 2 years ago

pekasen commented 2 years ago

If `lines` is set to `True` for the Twacapic-Reader, the reader tries to read the entire file at once, causing an out-of-memory (OOM) error when the file is sufficiently large.
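For context, the failure mode looks roughly like this; a minimal sketch assuming the reader materializes the whole file before parsing (this is not the actual Twacapic-Reader code, and the file name is a placeholder):

import ujson
from pathlib import Path

# Hypothetical sketch of the bug: file.read() pulls the entire file
# into one string, so memory usage scales with the file size before
# a single record has been parsed.
with Path("large_file_to_open.dat").open("r", encoding="utf8") as file:
    records = [ujson.loads(line) for line in file.read().splitlines()]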

pekasen commented 2 years ago

The following snippet does the trick:

import ujson
from pathlib import Path

def read_records():
    # Stream the file line by line; the file object's iterator never
    # holds more than one line in memory at a time.
    with Path("large_file_to_open.dat").open("r", encoding="utf8") as file:
        for line in file:
            # Parse one JSON record per line (JSONL/NDJSON format).
            yield ujson.loads(line)

This reduces memory consumption from the size of the entire file to roughly the footprint of a single line.
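A quick way to confirm the streaming behavior, using the generator wrapper from the snippet above (read_records is an illustrative name, not part of dabapush's API):

# Consume the generator lazily: each record is parsed and discarded
# in turn, so memory stays constant regardless of file size.
count = sum(1 for _ in read_records())
print(f"streamed {count} records")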