Open Minitour opened 11 months ago
Update:
I hacked something real quick by modifying common.py:
class Index:
    def __init__(self, initial_value=0):
        self._value = initial_value

    def increment(self):
        self._value += 1

    def decrement(self):
        self._value -= 1

    def __str__(self):
        return f'{self._value}'


@utils.coroutine
def parse_basecoro(target):
    path = []
    while True:
        event, value = yield
        if event == 'map_key':
            prefix = '.'.join(map(str, path[:-1]))
            path[-1] = value
        elif event == 'start_map':
            if path and (indx := path[-1]) and type(indx) == Index:
                indx.increment()
            prefix = '.'.join(map(str, path))
            path.append(None)
        elif event == 'end_map':
            path.pop()
            prefix = '.'.join(map(str, path))
        elif event == 'start_array':
            prefix = '.'.join(map(str, path))
            path.append(Index(0))
        elif event == 'end_array':
            path.pop()
            prefix = '.'.join(map(str, path))
        else:  # any scalar value
            prefix = '.'.join(map(str, path))
        target.send((prefix, event, value))
Although it is not the best solution, it certainly achieves what I am looking for. Please consider adding something similar, but in the meantime, I will be using patch to monkey-patch the library.
Hi @Minitour, thanks for taking an interest in improving ijson!
I think the idea is good in principle, but the suggested implementation is not going to fly. In particular:
- [...] items and kvitems calls, and that's an absolute no.
- [...] parser calls, and that's also an absolute no.
- [...] common.py applies the changes to all backends except yajl2_c, which is the default one (because it's 10x faster than the next one in the list).

If I implemented this, I'd do it at the items/kvitems level, where you could interpret the [n]s in the given prefix and match them to the nth appearance of item in the underlying path. Also, maybe instead of a.b.[0].c one could simply have a.b.0.c? The brackets seem unnecessary.
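One way to picture the suggestion above without touching parse or any backend is a thin wrapper that tracks array positions itself. The sketch below is only an illustration, not ijson code: indexed_prefixes is a hypothetical helper that consumes (event, value) pairs in the shape ijson.basic_parse yields and emits prefixes in the a.b.0.c form. The event list is hand-written so the snippet runs without the library.

```python
def indexed_prefixes(events):
    """Yield (prefix, event, value) with array positions instead of 'item'.

    `events` is any iterable of (event, value) pairs in the shape produced by
    ijson.basic_parse. This helper is hypothetical and not part of ijson.
    """
    # One entry per open container: a key (or None) for a map, an int for an array.
    path = []
    for event, value in events:
        if event == 'map_key':
            prefix = path[:-1]
            path[-1] = value
        elif event in ('end_map', 'end_array'):
            path.pop()
            prefix = list(path)
        else:
            # A new value starts (scalar, map or array). If it sits directly
            # inside an array, advance that array's position first.
            if path and isinstance(path[-1], int):
                path[-1] += 1
            prefix = list(path)
            if event == 'start_map':
                path.append(None)
            elif event == 'start_array':
                path.append(-1)  # becomes 0 when the first element starts
        yield '.'.join(map(str, prefix)), event, value


# Hand-written events for {"A": [{"C": "x"}, {"C": "y"}], "D": [1, 2]}
EVENTS = [
    ('start_map', None),
    ('map_key', 'A'),
    ('start_array', None),
    ('start_map', None), ('map_key', 'C'), ('string', 'x'), ('end_map', None),
    ('start_map', None), ('map_key', 'C'), ('string', 'y'), ('end_map', None),
    ('end_array', None),
    ('map_key', 'D'),
    ('start_array', None), ('integer', 1), ('integer', 2), ('end_array', None),
    ('end_map', None),
]

scalars = [(p, v) for p, e, v in indexed_prefixes(EVENTS) if e in ('string', 'integer')]
print(scalars)
```

Because the position advances whenever any value starts inside an array, scalar elements are numbered as well as maps. One limitation of the bracket-free form: a map key that happens to be "0" is indistinguishable from array position 0 in the dotted string.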
In any case, I'm in no hurry to implement this. Maybe if more people somehow upvote this I could give it some attention. It would also be an incentive if someone (you?) presented a modified version of items_basecoro that understood these numeric indices as indicated above, hopefully with tests -- then we could iterate into a final solution that covered all backends.
There's an RFC for specifying a path through a JSON object: RFC 6901 JSON Pointer https://www.rfc-editor.org/rfc/rfc6901
The JSON Pointer syntax for the example A.[0].B.[0].C above is /A/0/B/0/C.
It would be great to have support for that, but that would probably have to be via new functions.
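For reference, turning a sequence of keys and array positions into an RFC 6901 pointer is a small transformation. The sketch below is a standalone illustration; to_json_pointer is a hypothetical name, not an existing ijson function. It applies the two escapes the RFC defines ('~' becomes '~0', '/' becomes '~1').

```python
def to_json_pointer(keys):
    """Build an RFC 6901 JSON Pointer from a sequence of keys and array indices."""
    tokens = []
    for key in keys:
        token = str(key)
        # '~' must be escaped before '/', otherwise the '~' introduced by
        # the '/' -> '~1' replacement would itself be escaped again.
        token = token.replace('~', '~0').replace('/', '~1')
        tokens.append('/' + token)
    # An empty key sequence gives '', which in RFC 6901 refers to the whole document.
    return ''.join(tokens)


print(to_json_pointer(['A', 0, 'B', 0, 'C']))  # /A/0/B/0/C
print(to_json_pointer(['a/b', 'm~n']))         # /a~1b/m~0n
```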
I myself would prefer a different form of path info: simply a list of the keys needed to get from the root to the node in question, i.e.
["A", 0, "B", 0, "C"]
Is your feature request related to a problem? Please describe.
I am streaming values from a large JSON file into a dataframe, but I am unable to group related items together because the prefix does not say which array element a value belongs to.
Describe the solution you'd like
For example, instead of
A.item.B.item.C
which can be repeated many times, it would be great to have something like:
A.[0].B.[0].C
For example for the following object:
I would expect to see the following events:
A.[0].B.[0].C  string  Test-1
A.[0].B.[1].C  string  Test-2
A.[0].B.[2].C  string  Test-3
A.[1].B.[0].C  string  Test-4
A.[1].B.[1].C  string  Test-5
A.[1].B.[2].C  string  Test-6
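To make the expected listing concrete, here is a small non-streaming sketch that produces it from an already-loaded document. The input object is an assumption inferred from the listing (the original object is not shown), and leaf_events is a hypothetical helper used only to pin down the requested prefix format, not a proposal for how ijson should implement it.

```python
import json

# Assumed input, inferred from the listing above; the original object is not shown.
DOC = json.loads('''
{"A": [
  {"B": [{"C": "Test-1"}, {"C": "Test-2"}, {"C": "Test-3"}]},
  {"B": [{"C": "Test-4"}, {"C": "Test-5"}, {"C": "Test-6"}]}
]}
''')


def leaf_events(node, path=()):
    """Yield (prefix, 'string', value) for every string leaf, using [n] for array positions."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from leaf_events(child, path + (key,))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from leaf_events(child, path + (f'[{index}]',))
    elif isinstance(node, str):
        # Only string leaves are reported, matching the events listed above.
        yield '.'.join(path), 'string', node


for prefix, event, value in leaf_events(DOC):
    print(prefix, event, value)
```

With the position embedded in the prefix, all values sharing the same leading A.[n] can be grouped into one record, which is the grouping the request is after.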
Describe alternatives you've considered
N/A