adriank / ObjectPath

The agile query language for semi-structured data
http://objectpath.org
MIT License

Add support for custom dictionaries (including OrderedDict) #25

Closed piokuc closed 6 years ago

piokuc commented 10 years ago

By default, Python's json.load and json.loads create standard Python dictionaries, which don't preserve the order of keys, so for example the following snippet:

import json, collections
import objectpath

data = """ 
{   
    "g1" : { "a" : 1 },
    "g2" : { "a" : 2 },
    "g3" : { "a" : 3 } 
}   
""" 

tree = objectpath.Tree(json.loads(data))
print "Values of a: ", ",".join(str(x) for x in tree.execute("$..a"))

will print:

Values of a:  3,2,1

This is a problem for me because that order matters in my application: I would like to see "1,2,3" printed, as this is the order of the values of "a" in my JSON.

So I tried loading the data with the argument object_pairs_hook = collections.OrderedDict, which causes json.loads to produce order-preserving OrderedDicts:

tree = objectpath.Tree(json.loads(data, object_pairs_hook = collections.OrderedDict))

Unfortunately, if I change the example program like this I get:

Values of a:  
Traceback (most recent call last):
  File "object_path_bug.py", line 16, in <module>
    print "Values of a: ", ",".join(str(x) for x in tree.execute("$..a"))
  File "/var/tmp/SKL/lib/python2.7/site-packages/objectpath/core/interpreter.py", line 608, in execute
    ret=exe(tree)
  File "/var/tmp/SKL/lib/python2.7/site-packages/objectpath/core/interpreter.py", line 288, in exe 
    fst=flatten(exe(node[1]))
  File "/var/tmp/SKL/lib/python2.7/site-packages/objectpath/core/interpreter.py", line 246, in exe 
    return self.data
AttributeError: 'Tree' object has no attribute 'data'

Would it be possible to fix objectpath so it can work with JSON data decoded with object_pairs_hook = collections.OrderedDict, please?
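For reference, the decoding step can be checked on its own, without objectpath. The following is a minimal sketch that uses only the standard library; the variable names are illustrative.

```python
import json
from collections import OrderedDict

data = '{"g1": {"a": 1}, "g2": {"a": 2}, "g3": {"a": 3}}'

# object_pairs_hook receives each JSON object as a list of (key, value)
# pairs in document order, so OrderedDict keeps that order.
decoded = json.loads(data, object_pairs_hook=OrderedDict)

keys = list(decoded.keys())
values = [group["a"] for group in decoded.values()]
print(keys)    # ['g1', 'g2', 'g3']
print(values)  # [1, 2, 3]

# Nested objects are decoded with the same hook.
print(type(decoded["g1"]).__name__)  # OrderedDict
```

So the order is already correct after decoding; any reordering or failure happens later, inside the query step.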

adriank commented 9 years ago

Doesn't dict in Python 2.7 preserve order? It's an issue in Python 3, but all ordering tests are passing in Python 2.

Anyway, I will add support for other dict types at the nearest occasion. If you want a quick solution, change all occurrences of type(<smth>) is dict to type(<smth>) in (dict,collections.OrderedDict) in the interpreter.py file. Should work.
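To illustrate why an exact type comparison rejects OrderedDict, here is a small standalone sketch. It does not touch objectpath, and the name DICT_TYPES is hypothetical.

```python
from collections import OrderedDict

plain = {"a": 1}
ordered = OrderedDict([("a", 1)])

# An exact type comparison rejects subclasses of dict.
print(type(plain) is dict)      # True
print(type(ordered) is dict)    # False

# The workaround lists each accepted type explicitly.
DICT_TYPES = (dict, OrderedDict)    # hypothetical name, not from objectpath
print(type(ordered) in DICT_TYPES)  # True

# isinstance() also accepts OrderedDict, because it subclasses dict.
print(isinstance(ordered, dict))    # True
```

The explicit tuple has to be extended for every new dictionary type, whereas isinstance() covers any subclass of dict automatically.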

piokuc commented 9 years ago

Thanks for this. I have replaced all 'is dict' and 'is not dict' with 'in (dict,collections.OrderedDict)' and 'not in (dict,collections.OrderedDict)' in interpreter.py as you suggested, and also patched similar expressions, for example:

         def setData(self,obj):
-               if type(obj) in ITER_TYPES+[dict]:
+               if type(obj) in ITER_TYPES+[dict,collections.OrderedDict]:
                        self.data=obj

Now I don't get the exception as previously, but results of queries are wrong. For example this program:

import json, collections
import objectpath

data = """
{
    "g1" : { "a" : 1 },
    "g2" : { "a" : 2 },
    "g3" : { "a" : 3 }
}
"""

tree = objectpath.Tree(json.loads(data, object_pairs_hook = collections.OrderedDict))
print "Values of a: ", ",".join(str(x) for x in tree.execute("$..a"))

will print

Values of a:  

which means that tree.execute("$..a") returned an empty result set, whereas I would expect it to return something like [1,2,3].

BTW, it would be nice if any user-defined object conforming to Python's dictionary protocol were supported, so I think the final fix should use a better way of testing whether an object is dictionary-like.

adriank commented 9 years ago

Can you post the output of running the interpreter in debug mode here?

tree = objectpath.Tree(json.loads(data, object_pairs_hook = collections.OrderedDict),{"debug":True})
tree.execute("$..a")

piokuc commented 9 years ago

Here it is:

INFO@35 All strings will be cut to 100 chatacters.
START@44 Tree.execute
PARSE STAGE ('..', ('(root)', 'rs'), ('name', 'a'))
START@57 executing node '('..', ('(root)', 'rs'), ('name', 'a'))'
START@57 executing node '('(root)', 'rs')'
START@57 executing node '('name', 'a')'
DEBUG@295 .. finding all a in <generator object flatten at 0xa91910>
DEBUG@303 .. returning <itertools.chain object at 0xaa7d50>
END@610 Tree.execute with: '<itertools.chain object at 0xaa7d50>'

adriank commented 9 years ago

You probably need to add OrderedDict also here: https://github.com/adriank/ObjectPath/blob/master/objectpath/utils/__init__.py#L45

Greetings, Adrian Kalbarczyk

http://about.me/akalbarczyk

piokuc commented 9 years ago

Sorry about the delayed response, I was busy with other things...

I can confirm that after patching utils/__init__.py all works as expected - thank you!

-               elif typefrg is dict:
+               elif typefrg in (dict, collections.OrderedDict):

I just want to say that I think a proper fix should use something like

hasattr(d, '__getitem__')

instead of

type(d) in (dict,collections.OrderedDict)

It would be great if objectpath were able to work with any user-defined dictionary. It is advertised as an "equivalent of XPath for JSON", but I think it can be more general: the data it works with doesn't have to be decoded JSON. For example, I am currently dumping some binary files in an obscure format to JSON, then parsing the result with the standard Python json parser and passing it to objectpath. These binary files can be huge. I am planning to add Python bindings to the decoder of the binary format (a library written in C) and expose the file's content as a Python object conforming to the Python dictionary protocol. It would be great if objectpath could work with such objects as well as with dict or OrderedDict.
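As a rough sketch of what "conforming to the dictionary protocol" can look like, the standard library's Mapping abstract base class needs only __getitem__, __iter__ and __len__, and supplies keys(), items(), get() and the `in` operator on top of them. The LazyRecord class and its `loaders` argument below are hypothetical stand-ins for a real decoder binding.

```python
try:
    from collections.abc import Mapping  # Python 3.3+
except ImportError:
    from collections import Mapping      # Python 2.7


class LazyRecord(Mapping):
    """Read-only, dictionary-like view; values are computed on first access.

    `loaders` maps each key to a zero-argument callable. It stands in for
    a real decoder and is purely hypothetical.
    """

    def __init__(self, loaders):
        self._loaders = dict(loaders)
        self._cache = {}

    def __getitem__(self, key):
        if key not in self._cache:
            # Raises KeyError for unknown keys, as a mapping should.
            self._cache[key] = self._loaders[key]()
        return self._cache[key]

    def __iter__(self):
        return iter(self._loaders)

    def __len__(self):
        return len(self._loaders)


record = LazyRecord({"a": lambda: 1, "b": lambda: 2})

# keys(), get() and `in` come from the Mapping mixin methods.
print(sorted(record.keys()))        # ['a', 'b']
print(record.get("missing"))        # None
print("a" in record)                # True
print(isinstance(record, dict))     # False
print(isinstance(record, Mapping))  # True
```

An object like this is not a dict subclass, so a check based on type(obj) or isinstance(obj, dict) would reject it, while isinstance(obj, Mapping) accepts it.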

adriank commented 9 years ago

hasattr(d, '__getitem__')

I am thinking of implementing duck typing across OP. We'll see if it is going to work.

I can include tests based on your object conforming to the Python dictionary protocol in the test suite for v0.6 if you like.

piokuc commented 9 years ago

By a duck-typing style of type testing I understand an approach like this: try to use the object as a dictionary; it will throw an exception if it isn't one, and you can then test for other types in the exception handler. This is widely considered the preferred way of doing things in Python. However, I am not entirely sure this is actually simpler and safer. Consider this:

>>> class X: pass
...
>>> x = X()
>>> x['a']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: X instance has no attribute '__getitem__'
>>> x = []
>>> x['a']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list indices must be integers, not str
>>>

So, basically, you can see that if you try to treat an object as a dictionary, it can throw at least two types of exceptions, and who knows, maybe more. A dictionary-like object, on the other hand, will always have the __getitem__ method. Unfortunately, lists have that method, too... BTW, strings in Python behave like lists when you index them, but normally you'd want to distinguish them from lists. So it seems it's not that straightforward in Python, but, of course, there must be a clean and 'Pythonic' way to do it.
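One possible way to separate these cases, offered only as a sketch, is to test against the abstract base classes in the standard library instead of probing for __getitem__. The helper name kind() is hypothetical and is not part of objectpath.

```python
from collections import OrderedDict

try:
    from collections.abc import Mapping, Sequence  # Python 3.3+
except ImportError:
    from collections import Mapping, Sequence      # Python 2.7

try:
    string_types = (basestring,)  # Python 2
except NameError:
    string_types = (str, bytes)   # Python 3


def kind(obj):
    """Return 'mapping', 'string', 'sequence' or 'other' (hypothetical helper)."""
    if isinstance(obj, Mapping):
        return "mapping"
    # Strings are sequences too, so they must be checked before Sequence.
    if isinstance(obj, string_types):
        return "string"
    if isinstance(obj, Sequence):
        return "sequence"
    return "other"


class X(object):
    pass


print(kind({"a": 1}))               # mapping
print(kind(OrderedDict([("a", 1)])))  # mapping
print(kind([1, 2]))                 # sequence
print(kind("abc"))                  # string
print(kind(X()))                    # other
```

No exception handling is needed here, and the order of the checks is what keeps strings apart from lists.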

As far as the unit tests are concerned, I think it should be good enough to define a Python class with methods like __getitem__ (and possibly others, whatever is needed to mimic a dictionary, I'm not sure now what that should be) and see if objectpath can work with objects of that class.
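A test along those lines might look like the sketch below. The Record class and the find_all() function are hypothetical: find_all() is a simplified stand-in for a recursive query such as "$..a" and is not objectpath's implementation.

```python
from collections import OrderedDict

try:
    from collections.abc import Mapping  # Python 3.3+
except ImportError:
    from collections import Mapping      # Python 2.7


class Record(Mapping):
    """Minimal user-defined mapping used as a test fixture (hypothetical)."""

    def __init__(self, pairs):
        self._keys = [k for k, _ in pairs]
        self._values = dict(pairs)

    def __getitem__(self, key):
        return self._values[key]

    def __iter__(self):
        return iter(self._keys)

    def __len__(self):
        return len(self._keys)


def find_all(obj, name):
    """Yield every value stored under `name`, depth first, in iteration order."""
    if isinstance(obj, Mapping):
        for key in obj:
            value = obj[key]
            if key == name:
                yield value
            for found in find_all(value, name):
                yield found
    elif isinstance(obj, (list, tuple)):
        for item in obj:
            for found in find_all(item, name):
                yield found


# Mixes a user-defined mapping, an OrderedDict, a list and a plain dict.
tree = Record([
    ("g1", OrderedDict([("a", 1)])),
    ("g2", Record([("a", 2)])),
    ("g3", [{"a": 3}]),
])
print(list(find_all(tree, "a")))  # [1, 2, 3]
```

The same fixture could be passed to objectpath.Tree in a real test to check that the query result matches.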

Anyway, it'll be great if the next version of objectpath can work with OrderedDict and other user-defined dictionary objects. Thanks for the great product!

adriank commented 9 years ago

Yeah, that's why I'm using type(obj) all over the place.