Erotemic / ubelt

A Python utility library with a stdlib like feel and extra batteries. Paths, Progress, Dicts, Downloads, Caching, Hashing: ubelt makes it easy!
Apache License 2.0
723 stars 43 forks source link

Dict regex #60

Closed endremborza closed 5 years ago

endremborza commented 5 years ago

Hello!

I quite like this package and your code. I also like regex, and I needed this, so I thought I might as well do a PR.

I might do some other regex stuff later, so if you like it, I can do PRs from time to time.

Erotemic commented 5 years ago

Hi, thanks for the interest in this package!

While I'm not opposed to adding new utilities, I'm trying to be very careful about preventing this from becoming a kitchen sink library (like its predecessor). As such I'm setting the bar for being included to be quite high.

This behavior might be more naturally expressed as

{k: v for k, v in dict_.items() if re.match(regexpr, k)}

But, I can see that this might be clunky. Its a pain to have to reference the key by name three times. Its the same reason why I like dict_subset(dict_, keys) / dict_isect(dict_, keys) over {k: dict_[k] for k in keys}: you only need to refer to the variables once to accomplish the task.

You could do a little better with something like

dict_subset(dict_, filter(re.compile(regexpr).match, dict_))

but again you would have to reference the dict twice, so that's not ideal either.

Even so, I'm not a huge fan of adding functions that are specific to regular expressions; I feel like that is a bit too narrow. I've tried to do stuff like that in the past, but it just turned out to be a mess. So ultimately I don't want to include anything targeted specifically towards regex.


However, there may be a middle ground. What you are essentially trying to do is filter a dictionary; you just happen to be using regular expressions to do that.

The existing map_keys and map_values are based of the builtin python map function. In my experience I've found them to highly reusable, but ubelt does not contain the analogous functions for filter or reduce.

I think I'd be much more likely to accept an implementation of filter_keys and filter_vals, and I think you could use them to address your regex use-case with minimal extra code.

filter_keys(re.compile(regexpr).match, dict_)

While this is 16 characters longer than you would get with your proposed implementation, I think its more in-line with the goals of this package.

endremborza commented 5 years ago

Well, it depends on whether you accept regex as a fundamental part of your toolbox.

i like filter_keys or filter_values but I think implementations of the reverse should be done with them as well, like you can do now for subset with diff. I'm not sure about naming it though, as I don't like drop that much.

One last suggestion, for coupling regex with ubelt, and shaving off a little of that 16 characters (which are over one third of the expression...) is that doing a type check, so that

dict_subset(_dict, re.compile(regexpr))

to work as the currently suggested dict_re_match does.

Erotemic commented 5 years ago

Closing due to lack of activity. If you want to add the filter_keys functionality please feel free to start a new pull request.