adam-mrozik closed this issue 2 years ago.
I am not sure what the performance of the planet = ijson.items(f, 'planet') line would be if the planet key were at the end of the file. Would it traverse the file one more time?
Indeed. You'd need to fetch the content twice, or at least multiplex it into the two different ijson.items calls, as each invocation of ijson.items fully consumes the given stream of data. Because of this you'll probably want to go with Solution 2.
[Solution 2] However, what if the top-level planet field is not at the beginning, but at the end?
Indeed, that would be the worst-case scenario. If you want this information to appear on each of the flat records you are storing in your database, then you'd have to accumulate all the records in memory and wait for planet to appear, so you can update the records and put them in the database. An alternative approach, if your database schema allows it, would be to write the city records without a planet, and issue an UPDATE to set their planet when you finally find it.
As you noticed, this is not a problem with ijson itself, but rather that your document isn't well suited for streaming. Still, if you can write planet-less records and then issue an update, you should be good to go.
Closing since this was answered a long time ago.
Hey,
I have a problem which I am trying to solve with ijson. Let's say I have this kind of file:
So, multiple countries, and each has multiple cities.
And I want to submit them to a database as flat rows, e.g.:
I see a few approaches, but each has its issues (mostly due to my unconventional data structure):
Solution 1:
This solution only helps partially: while each country is serialized separately, a single country can still have a lot of cities, which makes streaming less useful. Also, I am not sure what the performance of the planet = ijson.items(f, 'planet') line would be if the planet key were at the end of the file. Would it traverse the file one more time?
Solution 2: Parsing
In this solution, ijson traverses the file and I can easily pinpoint the place at which a new city row should be sent. However, what if the top-level planet field is not at the beginning, but at the end? In that case this solution does not work, because the parser will not see it until the end of the file. Ideally, in this example, keys would be ordered so that both countries and cities are at the bottom of their respective levels, but I do not see a way to guarantee that without knowing the key order in advance.