berkowski / tokio-serial

A serial port implementation for tokio
MIT License
205 stars 67 forks source link

Serial data corrupted durning heavy writes #60

Closed brentalanmiller closed 1 year ago

brentalanmiller commented 1 year ago

Using tokio-serial 5.4.4 and rust 1.65.0 on both MacOS and Linux I'm seeing corrupted data during times of heavy writes. When I run the following small demo application:

use tokio::io::AsyncWriteExt;
use tokio_serial::{SerialPort, SerialPortBuilderExt, FlowControl};

const TTY_A: &str = "/dev/ttyUSB0";

#[tokio::main]
async fn main() -> tokio_serial::Result<()> {
    let mut tty_a = tokio_serial::new(TTY_A, 115200).open_native_async()?;
    tty_a.set_flow_control(FlowControl::Hardware)?;
    println!("{:?}", tty_a.flow_control()?);

    let mut buffer = [0u8; 95];

    for i in 0..buffer.len() {
        buffer[i] = 32 + i as u8;
    }

    let writer_h = tokio::spawn(async move {
        let mut counter = 0u64;

        loop {
            let message = format!("loop iteration: {}, {}\n", counter, std::str::from_utf8(&buffer).unwrap());
            tty_a.write(message.as_bytes()).await.unwrap();
            counter += 1;
        }
    });

    writer_h.await.unwrap();

    Ok(())
}

And then open picocom in a terminal with the command picocom -b 115200 --imap lfcrlf -f h /dev/ttyUSB1

I'll frequently see output like this:

loop iteration: 78314,  !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
loop iteration: 78315,  !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
loop iteration: 78316,  !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
loop iteration: 78317,  !"#$%&'()*+,-./0123456789:loop iteration: 78318,  !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
loop iteration: 78319,  !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
loop iteration: 78320,  !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
loop iteration: 78321,  !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
loop iteration: 78322,  !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
loop iteration: 78323,  !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~

I've got two FTDI cables connected together for the purpose of this test. I put together a similar program in python and don't see the same corruption so I don't think its a cable / pc issue.

jadamcrain commented 1 year ago

The interleaved output between iterations 78317 -> 78319 is pretty odd. That doesn't look like data corruption, but more like some kind of thread race condition... like if you were printing from multiple threads to std out.

Just to eliminate hardware/corruption as the source of the issue, have you tried using virtual serial ports?

You can create a virtual serial port pair using socat:

> socat -d -d pty,raw,echo=0 pty,raw,echo=0
2022/12/21 19:01:14 socat[254] N PTY is /dev/pts/1
2022/12/21 19:01:14 socat[254] N PTY is /dev/pts/2
2022/12/21 19:01:14 socat[254] N starting data transfer loop with FDs [5,5] and [7,7]

I'll try this tomorrow with your app and see if I can replicate any of this.

brentalanmiller commented 1 year ago

Hi, thanks for the suggestion. I tried using the virtual serial ports using socat and see the same result.

Your comment about the threaded race condition got me thinking, so I also tried wrapping my main with #[tokio::main(flavor = "current_thread")] instead of just #[tokio::main] and also see the same issue.

jadamcrain commented 1 year ago

Yeah, I got the same result using socat too.

It's also less of an "interleaving" as it is a truncation (data loss) in loop iteration: 78317.

I don't have time this week to take more detailed look, but I'm definitely curious to figure this out.

codepainters commented 1 year ago

I believe it's incorrect to assume that tokio::io::AsyncWriteExt::write() writes all bytes at once, it returns the number of bytes actually written - just like plain, old blocking write() syscall does.

I was able to reproduce the problem above and solve it replacing write() with write_all().

jadamcrain commented 1 year ago

Can't believe that I didn't notice that earlier. You're absolutely right.

@brentalanmiller: I think this should close this issue.

brentalanmiller commented 1 year ago

Yeah, thanks for the clarification, it's right there as plain as day in the docs:

This function will attempt to write the entire contents of buf, but the entire write might not succeed, or the write may also generate an error. A call to write represents at most one attempt to write to any wrapped object.

Calls to write are not guaranteed to block waiting for data to be written, and a write which would otherwise block can be indicated through an Err variant.

DanielJoyce commented 1 year ago

I was just thinking of this issue and suspecting the problem was not checking how many bytes were written...