Jaymon / datatypes

My personal standard library
MIT License

Character class narrow unicode #10

Open Jaymon opened 4 years ago

Jaymon commented 4 years ago

I was seeing some interesting behavior on a Python 2 build that only had UCS-2 (narrow) Unicode support:

$ python
Python 2.7.18 (default, Sep  1 2020, 16:08:16)
>>> s = u'\uD859\uDFCC'
>>> s
u'\U000267cc'
>>> u'\uD859\uDFCC'.encode("UTF-32").decode("UTF-32")
u'\U000267cc'

It was taking the UTF-16 surrogate pair (\uD859 and \uDFCC) and treating it as the single code point U+267CC, shown with the UTF-32 style escape (\U000267cc), behind the scenes. I have methods like repr_string and repr_bytes, and I might want to add UTF-8 (bytes), UTF-16 (the \u values) and UTF-32 (the \U values) methods so you can get more information about a character. To see how all of these fit together, you can use fileformat.info; these are some pages I had open:

search:
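
For reference, the surrogate pair to code point mapping described above is plain arithmetic defined by UTF-16. Here is a minimal sketch of it; the function names `combine_surrogates` and `split_surrogates` are hypothetical and not part of datatypes:

```python
def combine_surrogates(high, low):
    """Combine a UTF-16 surrogate pair into a single code point.

    Hypothetical helper for illustration only.
    """
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    # Each surrogate carries 10 bits of the code point's offset from 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def split_surrogates(codepoint):
    """Split a code point above U+FFFF into its (high, low) surrogates.

    Hypothetical helper for illustration only.
    """
    if not (0x10000 <= codepoint <= 0x10FFFF):
        raise ValueError("code point is not in a supplementary plane")
    offset = codepoint - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


print(hex(combine_surrogates(0xD859, 0xDFCC)))  # 0x267cc
```

Going the other way, `split_surrogates(0x267CC)` gives back `(0xD859, 0xDFCC)`, which are the two \u values from the session above.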

Jaymon commented 3 years ago

I had this open in a text file:

u'\uD859\uDFCC'.encode("UTF-32").decode("UTF-32") = u'\U000267cc'

u'\U000267cc'.encode("UTF-32").decode("UTF-32") = u'\U000267cc'
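
A rough Python 3 take on the same idea, in case it helps. Python 3 refuses to encode surrogates to UTF-32 by default, so this sketch round-trips through UTF-16 with the `surrogatepass` error handler instead. `normalize_surrogates` and `describe` are made-up names for illustration, not existing datatypes methods, and the output format is just one possible choice:

```python
def normalize_surrogates(text):
    """Merge UTF-16 surrogate pairs in text into real code points.

    Hypothetical helper. A lone (unpaired) surrogate will raise
    UnicodeDecodeError on the decode step.
    """
    return text.encode("utf-16", "surrogatepass").decode("utf-16")


def describe(char):
    """Return UTF-8, UTF-16 and UTF-32 views of one character.

    Hypothetical helper, assuming char is a single character or a
    single surrogate pair.
    """
    char = normalize_surrogates(char)
    utf16 = char.encode("utf-16-be")
    # Walk the big-endian bytes two at a time to get the 16-bit code units.
    units = [
        (utf16[i] << 8) | utf16[i + 1]
        for i in range(0, len(utf16), 2)
    ]
    return {
        "utf-8": " ".join("{:02X}".format(b) for b in char.encode("utf-8")),
        "utf-16": "".join("\\u{:04X}".format(u) for u in units),
        "utf-32": "\\U{:08X}".format(ord(char)),
    }


for name, value in describe("\ud859\udfcc").items():
    print(name, value)
```

Both spellings from my notes, the surrogate pair and the \U escape, come out of `normalize_surrogates` as the same one-character string.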