cdgriffith / Box

Python dictionaries with advanced dot notation access
https://github.com/cdgriffith/Box/wiki
MIT License
flatten dot keys #152

Closed ipcoder closed 4 years ago

ipcoder commented 4 years ago

I have implemented support for dot keys by subclassing Box and have been using it for two years now. Among other things, it provides an alternative data model that essentially flattens the hierarchy, which is useful in certain cases. To fully support this model I also needed flat iteration over the data elements. I implemented that by adding an optional parameter to the keys, items, and values methods.

For example, for keys:

def keys(self, depth=False):
    ...

So that

box.keys(depth=True)
# ['a.b.c', 'a.b.d', 'a.x.y']

That makes the 'flat' model complete, and it can be implemented with minimal code and performance overhead.
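A minimal sketch of the flat iteration described above, written as a standalone generator over plain nested dicts rather than as the actual Box subclass (which is not shown in this thread). The name flat_keys and the sample data are illustrative assumptions.

```python
def flat_keys(data, prefix=""):
    """Yield dotted paths to every leaf value in a nested dict.

    Hypothetical helper for illustration; not part of Box.
    """
    for key, value in data.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict) and value:
            # Descend into non-empty sub-dicts; only leaves are yielded.
            yield from flat_keys(value, path)
        else:
            yield path


data = {"a": {"b": {"c": 1, "d": 2}, "x": {"y": 3}}}
print(list(flat_keys(data)))
# ['a.b.c', 'a.b.d', 'a.x.y']
```

Because it is a generator, nothing is materialized until the caller iterates, which keeps the overhead small for large structures.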

Do you think it could be generic enough to include in the package?

cdgriffith commented 4 years ago

Hi ipcoder,

The only reason I want to shy away from doing that is that in Box 5 I was planning to remove the overrides of keys, values and items completely, because I do not know of any way to provide proper dict_keys() and dict_items() equivalents, and there wasn't a need to override them at this time.

At the same time, I would like to have the option of providing depth for the keys. Is it possible to add it via a separate method, say dot_keys or something similar, and still have your method work?

ipcoder commented 4 years ago

Hi, I see no reason to prefer keys(depth=True) over dot_keys() (or perhaps flat_keys()). Looking forward to using it ;-)

cdgriffith commented 4 years ago

Hey @ipcoder, thinking about it some more, I do think keeping it in keys is the right thing to do, as it can return a dict_view under normal circumstances and will return a list only if additional options are specified.
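To illustrate the trade-off described here, the following is a rough sketch of a dict subclass whose keys() returns the normal view by default and a list only when an option is passed. DottedDict is a hypothetical class for illustration and is not Box's actual implementation.

```python
class DottedDict(dict):
    """Hypothetical dict subclass; not Box's actual implementation."""

    def keys(self, dotted=False):
        if not dotted:
            # Default call: the normal, live dict_keys view.
            return super().keys()
        # Option set: build a sorted list of dotted leaf paths.
        found = []
        for key, value in self.items():
            if isinstance(value, dict) and value:
                child = DottedDict(value)
                found.extend(f"{key}.{sub}" for sub in child.keys(dotted=True))
            else:
                found.append(str(key))
        return sorted(found)


d = DottedDict({"a": {"b": 1, "c": {"d": 2}}, "e": 3})
print(type(d.keys()).__name__)   # dict_keys
print(d.keys(dotted=True))       # ['a.b', 'a.c.d', 'e']
```

Code that relies on view semantics, such as set operations on keys(), keeps working with the default call; only callers that opt in receive a list.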

5.0.0a1 has the current changes that add dotted and flat options to keys.

In this version, dotted must be set to have keys returned in the new format. If set to True or -1, the keys of all children are returned.

from pprint import pprint
from box import Box

mov = Box(movie_data, box_dots=True)
pprint(mov.keys(dotted=True))

['movies',
 'movies.Robin Hood: Men in Tights',
 'movies.Robin Hood: Men in Tights.Director',
 'movies.Robin Hood: Men in Tights.Stars',
 'movies.Robin Hood: Men in Tights.Stars[0]',
 'movies.Robin Hood: Men in Tights.Stars[0].imdb',
 'movies.Robin Hood: Men in Tights.Stars[0].name',
 'movies.Robin Hood: Men in Tights.Stars[0].role',
 'movies.Robin Hood: Men in Tights.Stars[1]',
 'movies.Robin Hood: Men in Tights.Stars[1].imdb',
 'movies.Robin Hood: Men in Tights.Stars[1].name',
 'movies.Robin Hood: Men in Tights.Stars[1].role',
 'movies.Robin Hood: Men in Tights.Stars[2]',
 'movies.Robin Hood: Men in Tights.Stars[2].imdb',
 'movies.Robin Hood: Men in Tights.Stars[2].name',
 'movies.Robin Hood: Men in Tights.Stars[2].role',
 'movies.Robin Hood: Men in Tights.Stars[3]',
 'movies.Robin Hood: Men in Tights.Stars[3].imdb',
 'movies.Robin Hood: Men in Tights.Stars[3].name',
 'movies.Robin Hood: Men in Tights.Stars[3].role',
 'movies.Robin Hood: Men in Tights.imdb_stars',
 'movies.Robin Hood: Men in Tights.length',
 'movies.Robin Hood: Men in Tights.rating',
 'movies.Spaceballs',
 'movies.Spaceballs.Director',
 'movies.Spaceballs.Stars',
 'movies.Spaceballs.Stars[0]',
 'movies.Spaceballs.Stars[0].imdb',
 'movies.Spaceballs.Stars[0].name',
 'movies.Spaceballs.Stars[0].role',
 'movies.Spaceballs.Stars[1]',
 'movies.Spaceballs.Stars[1].imdb',
 'movies.Spaceballs.Stars[1].name',
 'movies.Spaceballs.Stars[1].role',
 'movies.Spaceballs.Stars[2]',
 'movies.Spaceballs.Stars[2].imdb',
 'movies.Spaceballs.Stars[2].name',
 'movies.Spaceballs.Stars[2].role',
 'movies.Spaceballs.imdb_stars',
 'movies.Spaceballs.length',
 'movies.Spaceballs.rating']

However, to make it flat, i.e. to return only the keys of the deepest values:

mov = Box(movie_data, box_dots=True)
pprint(mov.keys(dotted=True, flat=True))

['movies.Robin Hood: Men in Tights.Director',
 'movies.Robin Hood: Men in Tights.Stars[0].imdb',
 'movies.Robin Hood: Men in Tights.Stars[0].name',
 'movies.Robin Hood: Men in Tights.Stars[0].role',
 'movies.Robin Hood: Men in Tights.Stars[1].imdb',
 'movies.Robin Hood: Men in Tights.Stars[1].name',
 'movies.Robin Hood: Men in Tights.Stars[1].role',
 'movies.Robin Hood: Men in Tights.Stars[2].imdb',
 'movies.Robin Hood: Men in Tights.Stars[2].name',
 'movies.Robin Hood: Men in Tights.Stars[2].role',
 'movies.Robin Hood: Men in Tights.Stars[3].imdb',
 'movies.Robin Hood: Men in Tights.Stars[3].name',
 'movies.Robin Hood: Men in Tights.Stars[3].role',
 'movies.Robin Hood: Men in Tights.imdb_stars',
 'movies.Robin Hood: Men in Tights.length',
 'movies.Robin Hood: Men in Tights.rating',
 'movies.Spaceballs.Director',
 'movies.Spaceballs.Stars[0].imdb',
 'movies.Spaceballs.Stars[0].name',
 'movies.Spaceballs.Stars[0].role',
 'movies.Spaceballs.Stars[1].imdb',
 'movies.Spaceballs.Stars[1].name',
 'movies.Spaceballs.Stars[1].role',
 'movies.Spaceballs.Stars[2].imdb',
 'movies.Spaceballs.Stars[2].name',
 'movies.Spaceballs.Stars[2].role',
 'movies.Spaceballs.imdb_stars',
 'movies.Spaceballs.length',
 'movies.Spaceballs.rating']

This change also allows for a particular depth to be specified via dotted:

mov = Box(movie_data, box_dots=True)
pprint(mov.keys(dotted=1, flat=True))

['movies.Robin Hood: Men in Tights', 'movies.Spaceballs']
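The depth cut-off shown above can be approximated in plain Python. This is a sketch under assumptions, not the 5.0.0a1 code: the function name dotted_keys, the max_depth parameter, the choice to count each list index as one level, and the small stand-in for movie_data are all illustrative.

```python
def dotted_keys(data, max_depth=-1, prefix="", depth=0):
    """Return dotted paths, stopping max_depth levels below the top keys.

    Hypothetical helper. max_depth=-1 means no limit. Only end nodes are
    returned: real leaves, or containers sitting at the cut-off depth.
    """
    if isinstance(data, dict):
        children = [(f"{prefix}.{k}" if prefix else str(k), v)
                    for k, v in data.items()]
    else:
        # Lists use the [index] suffix seen in the output above.
        children = [(f"{prefix}[{i}]", v) for i, v in enumerate(data)]

    paths = []
    for path, value in children:
        is_container = isinstance(value, (dict, list)) and len(value) > 0
        if is_container and (max_depth < 0 or depth < max_depth):
            paths.extend(dotted_keys(value, max_depth, path, depth + 1))
        else:
            paths.append(path)
    return paths


# Small stand-in for movie_data, which is not shown in this thread.
sample = {"movies": {"Spaceballs": {
    "Director": "Mel Brooks",
    "Stars": [{"name": "Rick Moranis"}, {"name": "John Candy"}],
}}}

print(dotted_keys(sample, max_depth=1))
# ['movies.Spaceballs']
print(dotted_keys(sample))
# ['movies.Spaceballs.Director', 'movies.Spaceballs.Stars[0].name', 'movies.Spaceballs.Stars[1].name']
```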

Is this close to how you envisioned it working, and would it fit your use case?

Thanks again for the great suggestion!

ipcoder commented 4 years ago

Hi @cdgriffith, I think I understand your perspective: for you the main differentiator is the key format, dotted or the old one, and flat is then an additional control over how the dotted keys are produced, filtering out internal nodes. This logic leaves the case of dotted=False, flat=True ill-defined, right? After all, flat implies dotted.

Maybe we can shift the emphasis slightly by looking at it from the point of view of use cases:

  1. There is a use case of flat keys to all the actual data elements. In this case an additional dotted argument makes no sense, as there is no other way to have flat keys, right?
  2. Another use case (probably, since you are making an effort to support it) is to have the tree nodes, or a depth-limited cut of them, represented in dotted form. In this case I can hardly see how flat would be relevant.

So, if I am not missing something, those two arguments are not really independent; they look more like different modes of the same 'key-flattening or dotting' mechanism and, if so, can be implemented as different values of a single argument.

For example:

dotted=0 or False or None  # regular keys
dotted=True  # what we have called flat 
dotted=2       # a number - depth
dotted=-1      # a negative number - depth= infinity - all the tree
dotted='all'   # a possible alternative for -1

Obviously the name and the conventions can be improved, but I hope you understand my point.
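One way to read the single-argument proposal is as a small dispatch step that turns the dotted value into a mode and a depth. The sketch below is only an interpretation: resolve_dotted and the mode names "regular", "flat" and "tree" are hypothetical and not part of Box.

```python
import math


def resolve_dotted(dotted):
    """Map the proposed `dotted` values to a (mode, depth) pair.

    Hypothetical helper illustrating the proposal; not part of Box.
    """
    if dotted is None or dotted is False or dotted == 0:
        return ("regular", 0)
    if dotted is True:
        # bool is a subclass of int, so True must be handled before ints.
        return ("flat", math.inf)
    if dotted == "all":
        return ("tree", math.inf)
    if isinstance(dotted, int):
        # Negative numbers mean "no depth limit".
        return ("tree", math.inf if dotted < 0 else dotted)
    raise ValueError(f"unsupported dotted value: {dotted!r}")


for value in (None, True, 1, 2, -1, "all"):
    print(repr(value), resolve_dotted(value))
```

Note that True and 1 compare equal in Python yet select different modes here, so the order of the checks matters.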

cdgriffith commented 4 years ago

I can respect those use cases. I will go do some personal testing and think on it :)

My gut reaction is that I don't like behavior changing so much based on small changes to a single argument, and that it might be better to just always return a "flat" structure. Even if the keys are cut off at a certain depth, it is more practical to stick to end nodes (or the nodes at the depth reached).

cdgriffith commented 4 years ago

After thinking about this some more, I think having more options just makes it more confusing, and your first idea keeps it simple and easiest to use.

In 5.0.0a3 I have it so that keys(dotted=True) performs the behavior you initially suggested.

To test out:

pip install --upgrade python-box[all]==5.0.0a3

ipcoder commented 4 years ago

Hi, when testing 5.0.1 I noticed that box.items(dotted=True) is not supported. Don't you think items should follow keys in this respect?

cdgriffith commented 4 years ago

That is a very good point, and it shouldn't be hard to implement. Reopening to capture that.
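For reference, flat items can be sketched in the same spirit as flat keys: each leaf is paired with its dotted path. This is an illustration over plain dicts and lists, not Box's code; flat_items and the sample config are hypothetical names.

```python
def flat_items(data, prefix=""):
    """Yield (dotted_key, value) pairs for every leaf in nested dicts/lists.

    Hypothetical helper for illustration; not part of Box.
    """
    if isinstance(data, dict):
        pairs = ((f"{prefix}.{k}" if prefix else str(k), v)
                 for k, v in data.items())
    else:
        pairs = ((f"{prefix}[{i}]", v) for i, v in enumerate(data))
    for path, value in pairs:
        if isinstance(value, (dict, list)) and value:
            # Containers are expanded; only leaf values are yielded.
            yield from flat_items(value, path)
        else:
            yield path, value


config = {"db": {"host": "localhost", "ports": [5432, 5433]}, "debug": True}
print(dict(flat_items(config)))
# {'db.host': 'localhost', 'db.ports[0]': 5432, 'db.ports[1]': 5433, 'debug': True}
```

Deriving the keys from the same traversal, for example [k for k, _ in flat_items(config)], would keep keys and items consistent with each other.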