Rstar1998 closed this issue 1 year ago.
@Rstar1998 please follow the advice given in the template: share what you've tried, ask more precise questions, hopefully also some example data, etc. With such a broad description there's little help you can get.
@rtobar I have updated my description. Let me know if any more info is needed.
Thanks @Rstar1998, that's much clearer now :-)
The problem is that you are creating a single list with all the results and then feeding it to MongoDB. That, not the ijson iteration itself, is what causes the excessive memory use. What you need to do is chunk the results of the ijson iteration and feed those chunks to MongoDB.
To answer your direct question: no, ijson doesn't offer chunking itself. The good news is that it doesn't really need to, as this is a simple and common task. You could, for example, use itertools.islice, which doesn't require much work. Something like the following (adapted from the "batched" recipe at https://docs.python.org/3/library/itertools.html#itertools-recipes):
from itertools import islice

items = ijson.items(f, "item")
while (batch := tuple(islice(items, n))):  # n is your desired batch size
    collection.insert_many(batch)          # insert each batch into MongoDB
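Putting that together with pymongo, a more complete sketch could look like the following; the database, collection and file names (mydb, mycoll, data.json) and the batch size are just placeholders for your actual setup:

from itertools import islice

import ijson
from pymongo import MongoClient

BATCH_SIZE = 10_000  # tune to your memory budget

client = MongoClient()                  # assumes a local MongoDB instance
collection = client["mydb"]["mycoll"]   # placeholder database/collection names

with open("data.json", "rb") as f:      # placeholder file name
    # ijson streams one object at a time from the top-level JSON array
    items = ijson.items(f, "item")
    while (batch := tuple(islice(items, BATCH_SIZE))):
        collection.insert_many(batch)   # only one batch is held in memory

This way memory usage stays proportional to the batch size rather than to the file size.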
@rtobar, thank you very much.
I need to read a huge JSON file and insert it into MongoDB. I want to read the JSON records in chunks of 1 million (or any other number). How do I achieve this using ijson?
So I have a 2 GB JSON file which I need to load into a MongoDB database using Python. I used the following piece of code:
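(Roughly like this; the file and collection names below are placeholders for my actual setup:)

import ijson
from pymongo import MongoClient

collection = MongoClient()["mydb"]["mycoll"]  # placeholder names

with open("data.json", "rb") as f:            # placeholder file name
    data = list(ijson.items(f, "item"))       # whole file materialised as one list
    collection.insert_many(data)              # one huge insert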
The problem is that this process takes a huge amount of time and memory, since the 2 GB file is read into a list and given to insert_many to load into MongoDB. Is it possible to load the file in chunks of 10,000 rows and insert them batch by batch? Like the sketch below:
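(Something along these lines; the manual batching loop and the names are just an illustration of what I mean:)

import ijson
from pymongo import MongoClient

collection = MongoClient()["mydb"]["mycoll"]  # placeholder names

batch = []
with open("data.json", "rb") as f:            # placeholder file name
    for record in ijson.items(f, "item"):
        batch.append(record)
        if len(batch) == 10_000:              # flush every 10,000 records
            collection.insert_many(batch)
            batch = []
if batch:                                     # insert any leftover records
    collection.insert_many(batch)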
Feel free to correct me if I am following the wrong approach, or let me know if there is any other solution by which I can solve my issue.
data sample