esp-rs / espflash

Serial flasher utility for Espressif SoCs and modules based on esptool.py
Apache License 2.0
478 stars 117 forks source link

Weird println! issue #457

Closed manio closed 1 year ago

manio commented 1 year ago

Hi, During my work on the project i accidentally added polish national UTF-8 character to the debug message. After some time I saw that once upon a time the text line is "mangled".

I was able to reproduce the problem like this: cargo generate https://github.com/esp-rs/esp-template Then I added the following code as the main loop:

    let mut delay = Delay::new(&clocks);
    loop {
        println!("Hello world! with UTF: wysyłam");
        delay.delay_ms(500u32);
    }

The above national character is: <ł> 322, Hex 0142, Oct 502, Digr l/

Board: ESP32

And the result is: image

Do you have any idea what is going on? :)

bjoernQ commented 1 year ago

Indeed, weird and even weirder I can reproduce this!

The really weird thing here is that esp-println just calls the ROM function (on ESP32).

I can imagine that the problem isn't esp-println but espflash - e.g. hterm on Windows seems to receive everything alright (besides it apparently can only display ASCII)

image

So maybe you could try another serial monitor and let us know if it changes anything

manio commented 1 year ago

This is the change I've made:

diff --git a/espflash/src/cli/monitor/mod.rs b/espflash/src/cli/monitor/mod.rs
index 28f2905..587da65 100644
--- a/espflash/src/cli/monitor/mod.rs
+++ b/espflash/src/cli/monitor/mod.rs
@@ -115,6 +115,7 @@ pub fn monitor(
         }?;

         if read_count > 0 {
+            println!("\nDEBUG: {:02X?}", &buff[0..read_count]);
             handle_serial(&mut ctx, &buff[0..read_count], &mut stdout);
         }

And this is the result (I've formatted it because there was a plenty of spaces in-between):

DEBUG: [48]
H
DEBUG: [65, 6C, 6C, 6F]
ello
DEBUG: [20, 77, 6F, 72, 6C, 64, 21, 20, 77, 69, 74, 68]
world! with
DEBUG: [20, 55, 54, 46, 3A, 20, 77, 79, 73, 79, C5]
UTF: wysy�
DEBUG: [82, 61, 6D, 0A]
�am

DEBUG: [48]
H
DEBUG: [65, 6C, 6C, 6F, 20, 77, 6F, 72, 6C, 64]
ello world
DEBUG: [21, 20, 77, 69, 74, 68, 20, 55, 54, 46, 3A]
! with UTF:
DEBUG: [20, 77, 79, 73, 79, C5, 82, 61, 6D, 0A]
wysyłam

DEBUG: [48]
H
DEBUG: [65, 6C, 6C]
ell
DEBUG: [6F, 20, 77, 6F, 72, 6C, 64, 21, 20, 77, 69, 74]
o world! wit
DEBUG: [68, 20, 55, 54, 46, 3A, 20, 77, 79, 73, 79, C5]
h UTF: wysy�
DEBUG: [82, 61, 6D, 0A]
�am

DEBUG: [48]
H
DEBUG: [65, 6C, 6C, 6F, 20, 77, 6F, 72, 6C]
ello worl
DEBUG: [64, 21, 20, 77, 69, 74, 68, 20, 55, 54, 46, 3A]
d! with UTF:
DEBUG: [20, 77, 79, 73, 79, C5, 82, 61, 6D, 0A]
wysyłam

DEBUG: [48]
H
DEBUG: [65, 6C, 6C]
ell
DEBUG: [6F, 20, 77, 6F, 72, 6C, 64, 21, 20, 77, 69, 74]
o world! wit
DEBUG: [68, 20, 55, 54, 46, 3A, 20, 77, 79, 73, 79]
h UTF: wysy
DEBUG: [C5, 82, 61, 6D, 0A]
łam

As you can see - the bytes itself are identical but you surely also see the problem... The handle_serial is calling String::from_utf8_lossy, but the problem is the buffering - sometimes it got the buffer "cut" in the wrong place - in the middle of the two-byte character (79 C5 in this case).

Cc: @jessebraham, @SergioGasquez

MabezDev commented 1 year ago

This is because serial ports are not utf8 aware and just transfer a byte at a time. To solve this we should handle the Utf8Error instead of using from_utf8_lossy. If we handle the error we use valid_up_to to find where to "cut" the buffer.

bugadani commented 1 year ago

I'll try and play with this today. The solution is probably a bit more nuanced than the suggestion above, as it needs to not die on mid-buffer errors as well.

manio commented 1 year ago

@bugadani Thanks, tested your change, and so far looking great (no more problems with UTF-8)