The standard `json` package reads the entire file into memory to validate and parse its structure, which can consume a lot of memory for large files.
Research alternatives, including streaming JSON parsers or even custom parsing. The JSON we're reading is just a list of objects; we want to parse and validate each list item, but we could be less discriminating about the outer list if there's a more efficient alternative.
There is a nice Python package called ijson, built on the popular YAJL iterative JSON parser library. Using it would mean taking on a third-party dependency; the alternative is writing our own wrapper similar to ijson, which might require significant effort. Any ideas about implementing such a wrapper are welcome.
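For reference, here is roughly what the ijson route would look like for our case of a top-level array of objects. The `"item"` prefix selects each array element, so only one object is materialized at a time; `validate()` is a hypothetical stand-in for whatever per-item validation we do:

```python
import ijson

with open("items.json", "rb") as f:
    # Yields each element of the top-level array one at a time,
    # without loading the whole file into memory.
    for obj in ijson.items(f, "item"):
        validate(obj)  # hypothetical per-item validation hook
```

As for a stdlib-only wrapper: a minimal sketch, assuming the file contains a single top-level JSON array, could lean on `json.JSONDecoder.raw_decode`, which parses one value from the front of a string and reports where it stopped. Something like:

```python
import json

def iter_json_array(path, chunk_size=64 * 1024):
    """Yield objects from a file containing a top-level JSON array,
    reading it in chunks instead of all at once."""
    decoder = json.JSONDecoder()
    buf = ""
    with open(path, "r", encoding="utf-8") as f:
        # Read until we see the opening '[' of the array.
        while "[" not in buf:
            chunk = f.read(chunk_size)
            if not chunk:
                raise ValueError("input is not a JSON array")
            buf += chunk
        buf = buf[buf.index("[") + 1:]
        while True:
            # Skip whitespace and the comma between elements.
            buf = buf.lstrip().lstrip(",").lstrip()
            if buf.startswith("]"):
                return  # end of the array
            try:
                obj, end = decoder.raw_decode(buf)
            except json.JSONDecodeError:
                chunk = f.read(chunk_size)
                if not chunk:
                    raise  # truncated input
                buf += chunk  # element split across chunks; read more
                continue
            yield obj
            buf = buf[end:]
```

This is essentially what ijson does for us (with much more robustness around nesting, encodings, and error reporting), which is part of the effort estimate above.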
Does it make sense to move to an SQLite-like database in the future if the JSON files grow large? That way we wouldn't have to load everything into memory and could keep everything on disk.
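That could pair well with the streaming parse: stream items out of the JSON file and insert them into an on-disk SQLite database, so memory use stays bounded at one item. A minimal sketch using the stdlib `sqlite3` module, with an illustrative table name and schema (storing each item as a JSON blob):

```python
import json
import sqlite3

def load_into_sqlite(items, db_path="items.db"):
    """Store each parsed item as a JSON text blob in an on-disk SQLite
    table, so the full dataset never lives in memory at once."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, body TEXT)"
    )
    with conn:  # commits the whole batch on success
        conn.executemany(
            "INSERT INTO items (body) VALUES (?)",
            ((json.dumps(item),) for item in items),
        )
    conn.close()

# Usage: feed the streaming parser's output straight into SQLite, e.g.
# load_into_sqlite(iter_json_array("big.json"))
```

Whether we want a blob-per-item table like this or real columns per field is an open design question; the blob version keeps the migration trivial.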