Allowed characters in keys

cdgriffith / Box

Python dictionaries with advanced dot notation access

https://github.com/cdgriffith/Box/wiki

MIT License

Allowed characters in keys #183

Closed (eevleevs closed this issue 2 years ago)

eevleevs commented 3 years ago

Would you please explain the rationale behind allowing only string.ascii_letters + string.digits + "_" in keys? I would like to use Greek letters, but end up with silent errors like:

>>> a = Box()
>>> a.σeq = 1
>>> a.µeq = 2
>>> a
<Box: {'σeq': 2}>

Is there anything wrong with adding more Unicode characters to the allowed set?
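For readers following along, here is a minimal sketch of what an ASCII-only allow-list check of the kind described above might look like. The names `ALLOWED` and `is_ascii_safe` are hypothetical and are not taken from Box's source; the sketch only shows that keys with Greek letters fail such a check.

```python
import string

# Hypothetical allow-list mirroring the rule quoted in the question;
# this is an illustration, not Box's actual code.
ALLOWED = set(string.ascii_letters + string.digits + "_")


def is_ascii_safe(key: str) -> bool:
    """Return True if every character of key is in the ASCII allow-list."""
    return all(ch in ALLOWED for ch in key)


# "\u03c3" is GREEK SMALL LETTER SIGMA, "\u00b5" is MICRO SIGN.
for key in ("seq", "\u03c3eq", "\u00b5eq"):
    print(repr(key), is_ascii_safe(key))
```

Both Greek-letter keys are rejected by this check, which would be consistent with them needing special handling before they can be used with dot notation.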

cdgriffith commented 3 years ago

Honestly, I never considered using non-ASCII characters, because I was operating under the false assumption that identifiers stuck to the conventional rule:

Identifiers can be a combination of lowercase letters (a to z) or uppercase letters (A to Z) or digits (0 to 9) or an underscore (_).

Always open to PRs to help improve it!

eevleevs commented 3 years ago

Looks like non-ASCII identifiers were introduced in Python 3.0 with PEP 3131. They are actually quite useful when writing math. The PEP also gives some details on how to implement them; I will try to make a proposal based on that.

cdgriffith commented 2 years ago

Finally got around to looking at this, and discovered the str.isidentifier method.
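As a rough illustration of that approach (not the code from the linked commit), the sketch below uses `str.isidentifier` together with `keyword.iskeyword` to decide whether a key could be written after a dot. The helper name `is_attribute_name` is hypothetical, and the keyword check is an added assumption rather than something stated in this thread.

```python
import keyword


def is_attribute_name(key) -> bool:
    """Hypothetical helper: True if key could be typed as `obj.key`."""
    return (
        isinstance(key, str)
        and key.isidentifier()           # accepts non-ASCII identifiers per PEP 3131
        and not keyword.iskeyword(key)   # reserved words cannot follow a dot
    )


# "\u03c3" is GREEK SMALL LETTER SIGMA, "\u00b5" is MICRO SIGN.
for key in ("seq", "\u03c3eq", "\u00b5eq", "2eq", "my key", "class"):
    print(repr(key), is_attribute_name(key))
```

Unlike an ASCII allow-list, this accepts both Greek-letter keys while still rejecting names that start with a digit, contain spaces, or are reserved words.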

While implementing that, I decided to optimize the code there a little and, given that function's importance, was able to make load times of large sets twice as fast!

https://github.com/cdgriffith/Box/commit/b3ba1fb7eccd746dd6d56d03f9d6fc036608451b#diff-ded35bb17832f133568fbb4f1627f47f7bf0357bd7770e7576a65dd77542c540R735

I plan to make Box 6.0 Cython-optimized, which speeds it up another 10x across the board. Thankfully these optimizations stack, so in the end it will be 20x faster 🎉

eevleevs commented 2 years ago

Sounds great! 🎉 Looking forward to it! 😃

cdgriffith commented 2 years ago

While testing this, I found out something "fun".

>>> a = Box()
>>> a.σeq = 1
>>> a.µeq = 2
>>> a
Box({'σeq': 1, 'μeq': 2})
>>> a == Box({'σeq': 1, 'µeq': 2})
False

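Python applies NFKC normalization to identifiers (variable and attribute names) behind the scenes while parsing, so the micro sign (code point 181) typed in source becomes the Greek letter mu (code point 956). For example: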

>>> µ = 1
>>> print(dir()[-1])
μ
>>> ord(dir()[-1])
956
>>> ord('µ')
181

So that's something to be very wary of. The implementation is discussed at https://www.python.org/dev/peps/pep-3131/#implementation
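The mismatch can be reproduced with the standard library alone. The sketch below is only an illustration under stated assumptions: `normalize_key` is a hypothetical helper, and plain dicts stand in for Box objects. It uses `exec` to bind a name through the parser (which normalizes it) and compares the result with a key given as a string literal (which is stored as typed).

```python
import unicodedata

MICRO_SIGN = "\u00b5"  # ord 181, what many keyboards produce
GREEK_MU = "\u03bc"    # ord 956, what the parser stores after NFKC


def normalize_key(key: str) -> str:
    """Hypothetical helper: apply the NFKC normalization used for identifiers."""
    return unicodedata.normalize("NFKC", key)


# Names bound by parsed source code go through identifier normalization.
dotted = {}
exec(MICRO_SIGN + "eq = 2", {}, dotted)

# Keys given as string literals are stored exactly as typed.
literal = {MICRO_SIGN + "eq": 2}

print([ord(ch) for ch in next(iter(dotted))][0])  # code point of the parsed name's first character
print(dotted == literal)                          # the two spellings differ
print({normalize_key(k): v for k, v in literal.items()} == dotted)
```

Normalizing string keys before comparison makes the two spellings agree; whether a library should do that automatically is a design decision this thread does not settle.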

cdgriffith commented 2 years ago

I am adding these features in 6.0. There is currently a release candidate that can be installed and tested with:

pip install python-box[all]==6.0.0rc2 --upgrade

6.0 is ushering in Cython speedups on supported platforms, so please let me know if you run into any issues!

eevleevs commented 2 years ago

All good so far, great job 👍

cdgriffith commented 2 years ago

Box 6 has been released with this added, thanks for opening the issue!