Add `bytes_to_read` and `try_read` support

zh522130 commented 2 months ago

Thank you for creating this crate, which makes using serial ports across platforms much more convenient. However, I have encountered an issue.

When receiving data via port.read on the Windows platform, the transmitted data frame may be read in multiple parts. For example, if I send [55, 0, 0, 0, 0, 0, 0, 0], the received data might be [5], [0, 0, 0, 0, 0, 0, 0]. I want the data sent in one transmission to be received in a single read as much as possible. I hope to achieve this by first receiving one byte of data, then querying how much data is left, and combining the remaining data with the first byte. Alternatively, this might also be achieved using a method similar to try_read to merge the data.

de-vri-es commented 2 months ago

Hey!

In general, read() will return whatever data is available as soon as possible. This could be less then one message, but it could also be more then one message!

You can use read_exact() if you know exactly how much data you want to read.

For example:

let payload_len = [0];
serial_port.read_exact(&mut payload_len)?;
let payload_len: usize = payload_len[0].into();

let mut payload = vec![0; payload_len];
serial_port.read_exact(&mut payload)?;

If you want a single buffer to hold the entire message (with length and data) you can do this:

let mut buffer = vec![0];
serial_port.read_exact(&mut buffer)?;
let payload_len: usize = buffer[0].into();
buffer.resize(1 + payload_len, 0);
serial_port.read_exact(&buffer[1..])?;

Note that read_exact() may perform multiple read() calls internally to satisfy the request. As such, this is not an optimization of the number of syscalls. It's just an easy API to achieve the desired result.

zh522130 commented 2 months ago

Thank you for your response. I'm not sure about the exact length, but I just hope that the data frame sent by the sender is not split. Currently, under Linux, I have tested that the data will not be split (not too long), but under Windows, the data will be split.

You can use read_exact() if you know exactly how much data you want to read.

For example:
let payload_len = [0];
serial_port.read_exact(&mut payload_len)?;
let payload_len: usize = payload_len[0].into();

let mut payload = vec![0; payload_len];
serial_port.read_exact(&mut payload)?;
If you want a single buffer to hold the entire message (with length and data) you can do this:
let mut buffer = vec![0];
serial_port.read_exact(&mut buffer)?;
let payload_len: usize = buffer[0].into();
buffer.resize(1 + payload_len, 0);
serial_port.read_exact(&buffer[1..])?;
Note that read_exact() may perform multiple read() calls internally to satisfy the request. As such, this is not an optimization of the number of syscalls. It's just an easy API to achieve the desired result.

Does the code you provided require me to send the length information in the first byte from the sender? If so, this code may not be suitable for me.

de-vri-es commented 2 months ago

You can never assume the data is not split. It may not happen in your tests on Linux, but there are no guarantees from the OS.

You need to add some data framing. There are two common ways to do that

Send the length of the message first, then the message data. This is the easiest method because on the receiving side you can easily prepare a buffer of the correct size to receive the entire message data.
Use a delimiter to signal the end of a message. With serial lines, another delimiter is often used for the start of a message too, so that you can more easily detect when you're reading noise from an idle line.

I find this more annoying to work with, because you need to ensure that the delimiter doesn't occur in the data. And on the receiving side you never know how much data to read until you find the delimiter. When you finally found the delimiter, you may already have read data belonging to the next message too.

This is still commonly used though. For example, many protocols work by writing one line of ASCII for a message, terminated with a newline.

If you have start/end delimiter framing, you can wrap the SerialPort in a tokio::io::BufReader and use tokio::io::AsyncBufReadExt::read_until or tokio::io::AsyncBufReadExt::read_until.

It is also possible to frame messages based on the line going silent. But it's somewhat fragile compared to the other two methods. If you do want to do that, you can simply call read() in a loop to append to a buffer until it reports a TimedOut.

zh522130 commented 2 months ago

I understand your point. I'm not using this library to create my own communication protocol, but rather to develop a small tool for serial port logging. The data sent is not necessarily a string; for each data frame, a timestamp will be recorded. I believe this is a Windows issue, as I have tested several Windows libraries and they all exhibit this problem. For the serial-rs library, I employed the previous solution: read one byte first, then check how many bytes remain. That's why I made this feature request.

de-vri-es commented 2 months ago

I see. But in order to log a timestamp per frame you need to know what a frame is.

The solution of reading one byte and then checking how much data is available is basically just doing two reads and hoping that by the time the first read finished, the second one has the remaining data for a full frame.

But there is no guarantee that at this point the OS buffer has all data for the first message. Maybe even worse, the OS buffer might also contain data belonging two frames at this point.

In short, you have to make an assumption about framing. And it sounds like you want to use timeout based framing. Then I would suggest to implement something like read_until_silent():

use serial2_tokio::SerialPort;

async fn read_until_silent(serial_port: &mut SerialPort, timeout: std::time::Duration) -> std::io::Result<Vec<u8>> {
    let mut buffer = Vec::new();
    let mut received = 0;
    loop {
        buffer.resize(buffer.len() + 64, 0);
        match tokio::time::timeout(timeout, serial_port.read(&mut buffer[received..])).await {
          Err(_elapsed) => break,
          Ok(read_result) => received += read_result?,
        }
    }

    buffer.truncate(received);
    Ok(buffer)
}

zh522130 commented 2 months ago

But there is no guarantee that at this point the OS buffer has all data for the first message. Maybe even worse, the OS buffer might also contain data belonging two frames at this point.

Yes, reading in two steps does not guarantee complete safety; sometimes, the second read still does not yield any data, and at other times, data can become concatenated. For me, concatenated packets are not a problem as long as the two frames of data are sufficiently close in time and can be considered as a single packet.

In short, you have to make an assumption about framing. And it sounds like you want to use timeout based framing. Then I would suggest to implement something like read_until_silent():

I have considered the probabilistic timeout approach, but it does have an impact on real-time performance, so I did not adopt this method.

de-vri-es commented 2 months ago

I don't really know what your use case is, or what you consider to be real-time performance, but you will never get sub-millisecond precision in timing when reading data from a serial port on a regular computer. Certainly not with tokio in the middle.

But if you make my last example a bit more complicated, you can record the timestamp right after the first read() and right after the last read() that didn't time-out. Then you get a pretty good picture of the start and end-time of the received data.

And you can tweak the value of the timeout based on how long you want the line to remain silent to terminate a frame.

I really don't think you can do better than that with a serial port. You can use serial2 directly without tokio to get slightly more accurate timing, maybe.

Anyway, I'm not opposed to adding functions to query the size of the OS rx/tx buffer, as long as it works cross platform. I might get round to that at some point, but feel free to open a PR to get it done sooner. However, I really don't think it tells you anything useful. You can not predict the behaviour of read(), and having such functions will not change that.

zh522130 commented 2 months ago

The so-called real-time is just a relative concept. Using a timeout means that I have to wait for the timeout to end for each piece of data. I just think that if it's possible to check, there's no need to wait for the timeout every time. Currently, I am unable to provide a PR as I have only been working with Rust for a short period. Nevertheless, thank you for your response.

de-vri-es commented 2 months ago

Ah, you mean you want to show it live to the user?

Then you don't have to worry I think. If you use a timeout of around 10 to 50 ms nobody will be able to see.

And you could even show the result of every read immediately, but also indicate somehow that the application is still waiting for more data until the timeout expires (although with a short enough timeout I really don't think anyone can see). You can update your UI after each read().

zh522130 commented 2 months ago

While recording, it might also be necessary to perform additional data analysis through scripts. The scripts may also need to respond to the received data. What I want to create is a serial port logging tool that also supports serial data interaction through scripts.

de-vri-es commented 2 months ago

Should be fine, you can run the scripts when all data for a frame is in.

de-vri-es / serial2-tokio-rs

Add `bytes_to_read` and `try_read` support #12