eevleevs closed this issue 2 years ago.
Honestly, I never considered non-ASCII characters, because I was operating under the false assumption that Python identifiers stuck to the conventional rules:
Identifiers can be any combination of lowercase letters (a to z), uppercase letters (A to Z), digits (0 to 9), and underscores (_).
Always open to PRs to help improve it!
Looks like non-ASCII identifiers were introduced in Python 3.0 with PEP 3131. They are actually quite useful when writing math. The PEP also gives some details on how to implement the check, so I will try to make a proposal based on that.
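As a small illustration of the math use case (a sketch, not code from Box), Python 3 source can use Greek letters directly as parameter names. The function normal_pdf is a hypothetical example:

```python
import math


def normal_pdf(x: float, μ: float = 0.0, σ: float = 1.0) -> float:
    """Normal probability density, written with Greek identifiers as allowed by PEP 3131."""
    z = (x - μ) / σ
    return math.exp(-0.5 * z * z) / (σ * math.sqrt(2 * math.pi))


print(normal_pdf(0.0))  # 1 / sqrt(2 * pi), roughly 0.3989
```

Keyword arguments work the same way, e.g. normal_pdf(1.0, μ=1.0, σ=2.0).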
Finally got around to looking at this, and discovered the built-in str.isidentifier method.
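For comparison, here is a minimal sketch of the two kinds of check: the ASCII-only whitelist quoted in the original question versus str.isidentifier. The helper names ascii_only_safe and unicode_safe are hypothetical, and excluding reserved words with keyword.iskeyword is an assumption about what "usable as an attribute" should mean, not necessarily Box's exact rule:

```python
import keyword
import string

# The ASCII-only whitelist quoted in the original question.
ASCII_ALLOWED = set(string.ascii_letters + string.digits + "_")


def ascii_only_safe(key: str) -> bool:
    """Hypothetical old-style check: ASCII letters, digits and "_" only, no leading digit."""
    return (
        bool(key)
        and not key[0].isdigit()
        and all(ch in ASCII_ALLOWED for ch in key)
        and not keyword.iskeyword(key)
    )


def unicode_safe(key: str) -> bool:
    """Hypothetical PEP 3131-style check: any valid identifier that is not a keyword."""
    return key.isidentifier() and not keyword.iskeyword(key)


for key in ["sigma", "σeq", "2fast", "class", "my key"]:
    print(f"{key!r:10} ascii={ascii_only_safe(key)!s:5} unicode={unicode_safe(key)}")
```

Both checks reject keywords, leading digits and spaces; they only disagree on non-ASCII letters such as σ.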
While implementing that, I also tried to optimize the surrounding code a little, and because that function is called so often, load times for large data sets became twice as fast!
For Box 6.0 I plan to add Cython optimizations, which speed it up another 10x across the board. Thankfully these optimizations stack, so in the end it should be about 20x faster 🎉
Sounds great! 🎉 Looking forward to it! 😃
While testing this, I found out something "fun":
>>> from box import Box
>>> a = Box()
>>> a.σeq = 1
>>> a.µeq = 2
>>> a
Box({'σeq': 1, 'μeq': 2})
>>> a == Box({'σeq': 1, 'µeq': 2})
False
Python applies NFKC normalization to identifiers (variable and attribute names) behind the scenes. For example:
>>> µ = 1
>>> print(dir()[-1])
μ
>>> ord(dir()[-1])
956
>>> ord('µ')
181
So that's something to be very wary of. The implementation is discussed in https://www.python.org/dev/peps/pep-3131/#implementation
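One possible way to handle this, sketched under the assumption that dictionary keys should match what attribute syntax produces: normalize string keys with unicodedata.normalize("NFKC", ...), the same normalization PEP 3131 specifies for identifiers. The helper normalize_key is hypothetical and not part of Box's API; SimpleNamespace stands in for any object whose attributes live in a plain dict:

```python
import unicodedata
from types import SimpleNamespace

MICRO_SIGN = "\u00b5"      # 'µ' (U+00B5), produced by many keyboard layouts
GREEK_SMALL_MU = "\u03bc"  # 'μ' (U+03BC), what Python stores in identifiers

# The parser NFKC-normalizes identifiers, so the source text "µ = 1" binds the name "μ".
code = compile(MICRO_SIGN + " = 1", "<demo>", "exec")
print(code.co_names == (GREEK_SMALL_MU,))  # True

# setattr()/getattr() with string names are NOT normalized, so a dict-backed object
# can hold a key that attribute syntax (ns.µeq -> "μeq") will never find.
ns = SimpleNamespace()
setattr(ns, MICRO_SIGN + "eq", 2)
print(hasattr(ns, GREEK_SMALL_MU + "eq"))  # False


def normalize_key(key):
    """Hypothetical helper: normalize string keys the way Python normalizes identifiers."""
    return unicodedata.normalize("NFKC", key) if isinstance(key, str) else key


data = {"σeq": 1, MICRO_SIGN + "eq": 2}
normalized = {normalize_key(k): v for k, v in data.items()}
print(normalized == {"σeq": 1, GREEK_SMALL_MU + "eq": 2})  # True
```

Non-string keys pass through unchanged, since only strings can become attribute names.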
These features are being added in 6.0. There is currently a release candidate that can be installed and tested with:
pip install "python-box[all]==6.0.0rc2" --upgrade
6.0 is ushering in Cython speedups on supported platforms, so please let me know if you run into any issues!
All good so far, great job 👍
Box 6 has been released with this added, thanks for opening the issue!
Would you please explain the rationale behind allowing only string.ascii_letters + string.digits + "_" in keys? I would like to use Greek letters and end up with silent errors like:
Is there anything wrong with adding more UTF-8 characters to allowed?