Open xhh2a opened 9 years ago
what dpath version are you using?
Nevermind, I can replicate this behavior on latest master.
This has been broken for a long time, ever since values() was introduced, possibly even before. Really sorry about this one. I may not get around to a fix for this one tonight, but it's high on my list.
From a quick glance at the logic in the code yesterday, I think it is due to the way the code does in place filtering while it does the search so it fails on the top level branch of the dict before it hits a result where the filter would pass it. I was expecting a post dpath filter, not necessarily a bug. Although it didn't print anything so it could be something else.
Also I have a bug in the lambda should be .get() == 'correct', will test once I get into the office
So the typo doesn't change anything
TL;DR - Filtering doesn't work the way any of us assumes it does!
So, after sleeping on this and pulling my head out of my buttcheeks, this boils down to a lack of good documentation. In fact, the lack of good docs is so bad that even I misunderstood what's actually going on behind the scenes, and how these functions work.
First, you're correct, filters are pre, not post. They are executed as each level of the source dictionary is parsed. Second, dpath treats dictionaries and lists as trees. Filters execute on the leaf nodes, e.g., the individual dictionary values; they do not execute on dictionaries or lists, which are branches on the tree, not leaf nodes. Consider:
>>> import dpath.util
>>> a = { 'actions' : [ { 'type': 'correct' }, { 'type': 'incorrect' } ] }
>>> def f(x):
... print x
...
>>> dpath.util.values(a, 'actions/*',afilter=f)
[]
>>> dpath.util.values(a, '**',afilter=f)
correct
incorrect
[]
The first call to values failed (and printed nothing) because actions/* matches only one level deep (not multi-level), and the only thing there is 2 dictionary objects, neither of which are leaf nodes, so the filter isn't executed on them, hence nothing gets printed (and not returned).
This behavior is not exclusive to values() either, the filtering happens in search, so search has the same issue:
>>> dpath.util.search(a, 'actions/*', afilter=f)
{}
.. If we expand the search range to "actions/**", we will reach the leaf nodes we want to filter against, and our print works:
>>> dpath.util.values(a, '**',afilter=f)
correct
incorrect
[]
... if we change our filter to actually give us some results now (with a .get() lambda like you had), we see now that this is obviously going to fail, because we're calling .get() on a string:
>>> dpath.util.values(a, 'actions/**', afilter=(lambda x: x.get('type', None) == 'correct'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "dpath/util.py", line 117, in values
return [x[1] for x in dpath.util.search(obj, glob, yielded=True, separator=separator, afilter=afilter, dirs=dirs)]
File "dpath/util.py", line 144, in _search_yielded
val = dpath.path.get(obj, path, view=False, afilter=afilter)
File "dpath/path.py", line 276, in get
if (afilter and (not afilter(target))):
File "<stdin>", line 1, in <lambda>
AttributeError: 'str' object has no attribute 'get'
... but it works great if we check against a simple string comparison, because we're getting the leaf nodes:
>>> dpath.util.values(a, 'actions/**', afilter=(lambda x: x == 'correct'))
['correct']
>>> dpath.util.search(a, 'actions/**', afilter=(lambda x: x == 'correct'))
{'actions': [{'type': 'correct'}]}
So, while filtering does work exactly as it was designed, this implementation clearly defies the rule of least astonishment. Even though the docs clearly state how it works:
... Filtering functions receive every terminus node in a search - e.g., anything that is not a dict or a list, at the very end of the path. For each value, they return True to include that value in the result set, or False to exclude it.
... I think it's worth considering changing how this works, or supporting some kind of alternate filtering mechanisms.
RFC, @calebcase, OP?
In terms of alternative post-search, I am able to handle it using other methods. Thus it's not that big of a deal if a pre filter is kept. Is it less efficient than doing it while stepping through the tree? Yes, but it's not something that should result in significant issues for me in my use case. Honestly I can't think of too many use cases where I would want a pre search, but I can see where others may want it.
I suppose the documentation could definitely use some clarification on this. Changing the filter type might cause backwards incompatibility for users though so a post-search might be better to be added as a separate arg param.
Incidentally I need to remember to test unicode/ascii with filter.
What if we could say that all filter arguments must be subclasses of some base filter object, say dpath.Filter, and each Filter can implement one or more filtering steps, which would be called at various steps of the process? Then the user could just implement/override the steps they care about, and you could also make multi-step filters this way:
class MyFilter(dpath.Filter):
def pre_leaf(self, leaf):
"""pre-filter a leaf node"""
pass
def post_leaf(self, leaf):
"""post-filter a leaf node"""
pass
def pre_branch(self, branch)
"""pre-filter a branch node (list or dictionary)"""
pass
def post_branch(self, branch):
"""post-filter a branch node (list or dictionary)"""
pass
This would raise the question re: what to do with the existing filter functionality (being able to pass a lambda is awfully convenient), but I think this implementation would be significantly more clear.
This seems like an excellent thing to fix in 2.0, re-labeling.
Blocked by #136
I guess I will look into this sometime