adafruit / circuitpython

CircuitPython - a Python implementation for teaching coding with microcontrollers
https://circuitpython.org
MIT License

str.format(): calculates string length based on bytes instead of chars #8171

Open bablokb opened 1 year ago

bablokb commented 1 year ago

CircuitPython version

Adafruit CircuitPython 8.2.0 on 2023-07-05; Raspberry Pi Pico with rp2040

Code/REPL

>>> 
paste mode; Ctrl-C to cancel, Ctrl-D to finish
=== b_label = "Bat V"
=== b_value = "4.86"
=== t_label = "T/AHT °C"
=== t_value = "23.6"
=== "{label:<8.8}:{value:>4.4}".format(label=b_label,value=b_value)
=== "{label:<8.8}:{value:>4.4}".format(label=t_label,value=t_value) 
'Bat V   :4.86'
'T/AHT °:23.6'
>>>

Behavior

n.a.

Description

The format specifier {label:<8.8} sets a minimum field width of 8 and a precision of 8: shorter strings are padded to 8 characters and longer strings are truncated to 8 characters (the < also left-justifies the string, but that is not relevant here).

t_label in the example above has a length of 8 characters but a UTF-8 byte length of 9 (the degree sign ° is a single character encoded as two bytes). Because the precision is apparently applied to bytes rather than characters, t_label is truncated to 'T/AHT °' even though it fits into the field.
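To make the byte-vs-character mismatch concrete, the snippet below runs in desktop CPython. byte_based_field() is a hypothetical helper that only emulates the reported behaviour (padding and truncating by UTF-8 bytes); it is not the actual CircuitPython/MicroPython formatting code, but it reproduces the 'T/AHT °' output shown above.

```python
t_label = "T/AHT °C"

# Length in characters vs. length of the UTF-8 encoding
print(len(t_label))                  # 8
print(len(t_label.encode("utf-8")))  # 9, because "°" encodes as b"\xc2\xb0"


def byte_based_field(s, width):
    """Hypothetical emulation of the reported bug: pad/truncate by UTF-8 bytes."""
    data = s.encode("utf-8")[:width]    # precision counted in bytes
    data += b" " * (width - len(data))  # width counted in bytes
    # "ignore" drops a multi-byte character if the cut lands inside it
    return data.decode("utf-8", "ignore")


print(repr(byte_based_field(t_label, 8)))  # 'T/AHT °'  (the "C" is lost)
print(repr("{:<8.8}".format(t_label)))     # 'T/AHT °C' (character-based)
```

Note that the emulated result also gets no padding: its 8 bytes already "fill" the field, which matches the missing space in the CircuitPython output.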

Additional information

Standard Python (CPython) gets this right:

Python 3.9.15 (main, Oct 28 2022, 17:28:38) [GCC] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> b_label = "Bat V"
>>> b_value = "4.86"
>>> t_label = "T/AHT °C"
>>> t_value = "23.6"
>>> "{label:<8.8}:{value:>4.4}".format(label=b_label,value=b_value)
'Bat V   :4.86'
>>> "{label:<8.8}:{value:>4.4}".format(label=t_label,value=t_value)
'T/AHT °C:23.6'
>>>
tannewt commented 1 year ago

Would you mind testing the latest MicroPython? We inherit most of the relevant code for this issue from them and we're working to update CP to newer MP. Thanks!

bablokb commented 1 year ago

The current rp2 MicroPython build gives me:

=== b_label = "Bat V"
=== b_value = "4.86"
=== t_label = "T/AHT °C"
=== t_value = "23.6"
=== "{label:<8.8}:{value:>4.4}".format(label=b_label,value=b_value)
=== "{label:<8.8}:{value:>4.4}".format(label=t_label,value=t_value)
=== 
'Bat V   :4.86'
'T/AHT \xb0:23.6'
>>> 

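\xb0 is how MicroPython's repr escapes ° (code point U+00B0, which is also its Latin-1 or "extended ASCII" value). The trailing C is still missing, so this bug appears to be present in MicroPython as well.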
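Until the formatting code is fixed, one possible workaround is to pad and truncate by hand instead of relying on width/precision in the format spec. This is a minimal sketch with hypothetical helpers fit() and fit_right(); it assumes that len() and slicing on str count characters rather than bytes on the board (as in CPython and in MicroPython builds with Unicode string support). It has only been reasoned about, not tested on a Pico.

```python
def fit(s, width):
    """Left-justify s in a field of `width` characters, truncating if longer."""
    s = s[:width]                       # character-based truncation
    return s + " " * (width - len(s))   # character-based padding


def fit_right(s, width):
    """Right-justify s in a field of `width` characters, truncating if longer."""
    s = s[:width]
    return " " * (width - len(s)) + s


line = fit("T/AHT °C", 8) + ":" + fit_right("23.6", 4)
print(line)  # T/AHT °C:23.6
```

Plain string concatenation avoids str.ljust()/str.rjust(), which may not be available on every MicroPython-based port.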