aws / s2n-quic

An implementation of the IETF QUIC protocol
https://crates.io/crates/s2n-quic
Apache License 2.0
1.15k stars 120 forks

High latency when sending a big protobuf message #1371

Open fy2462 opened 2 years ago

fy2462 commented 2 years ago

Hi guys:

I have a problem that is worrying me. I send protobuf bytes with s2n-quic, but I cannot parse the proto on the receiving side when the payload is too large.

Is there a good solution for encoding and decoding frame data?

Thanks all of you.

camshaft commented 2 years ago

Application-level framing isn't really in scope of s2n-quic.

If you're sending multiple protobuf messages on a single stream, I would recommend looking at tokio_util::codec to implement a solution. All streams in s2n-quic implement the required traits to easily integrate with that crate.

You can also open a stream per message and buffer in the application until the stream is finished and decode the message at the end.

fy2462 commented 2 years ago

Thanks, @camshaft. I am encoding and decoding with tokio_util now. Another problem: I call open_bidirectional_stream on the client side and accept_bidirectional_stream on the server side,

and then transfer the video stream over that single stream from client to server. But I see high latency, and the latency grows over time. How can I resolve this? Do I need to call open_bidirectional_stream for every frame? I want to keep the frames relatively ordered.

Do you have any ideas for this case?

fy2462 commented 2 years ago

Do I need to set the builder limits on both sides? Do you have recommended limit settings for my case?

camshaft commented 2 years ago

Without seeing the code, I can only speculate about the source of the slowdown. Did you compile in release mode? What network are you testing on? Is the sender sending fast enough? Is the receiver reading fast enough? Did you try profiling the endpoints? Are you limited by CPU?

fy2462 commented 2 years ago

Thanks for your response @camshaft, and sorry for giving so few clues:

I have tried both debug and release builds, and the experience is almost the same. Transmission is fine when the network latency is < 20 ms, so I don't think it is a send or receive problem. There is no CPU limit.

But when I add latency with sudo tc qdisc add dev wlp0s20f3 root netem delay 40ms, I can see the latency to the server keep climbing. The data bitrate is 2.5 Mbps.

I set a timestamp in the client data; the server returns the timestamp to the client, and the client prints the time difference.

The client log output:

[WARN] 2022-06-24T11:32:32.491 to_server_rtt: 344ms, 
[WARN] 2022-06-24T11:32:35.062 to_server_rtt: 903ms, 
[WARN] 2022-06-24T11:32:37.647 to_server_rtt: 1486ms, 
[WARN] 2022-06-24T11:32:40.174 to_server_rtt: 2014ms, 
[WARN] 2022-06-24T11:32:42.790 to_server_rtt: 2610ms, 
[WARN] 2022-06-24T11:32:45.322 to_server_rtt: 3148ms, 
[WARN] 2022-06-24T11:32:48.090 to_server_rtt: 3894ms, 
[WARN] 2022-06-24T11:32:50.841 to_server_rtt: 4649ms, 
[WARN] 2022-06-24T11:32:53.431 to_server_rtt: 5245ms, 
[WARN] 2022-06-24T11:32:56.034 to_server_rtt: 5829ms, 
[WARN] 2022-06-24T11:32:58.809 to_server_rtt: 6595ms, 
[WARN] 2022-06-24T11:33:01.468 to_server_rtt: 7247ms, 

The ping output on the client-side:

64 bytes from 10.10.82.240: icmp_seq=5 ttl=63 time=44.8 ms
64 bytes from 10.10.82.240: icmp_seq=6 ttl=63 time=43.3 ms
64 bytes from 10.10.82.240: icmp_seq=7 ttl=63 time=49.6 ms
64 bytes from 10.10.82.240: icmp_seq=8 ttl=63 time=43.2 ms
64 bytes from 10.10.82.240: icmp_seq=9 ttl=63 time=46.5 ms
64 bytes from 10.10.82.240: icmp_seq=10 ttl=63 time=44.9 ms
64 bytes from 10.10.82.240: icmp_seq=11 ttl=63 time=42.9 ms

I set some limits on both sides, but it still doesn't help.

const ACK_LATENCY: u64 = 0;
const MAX_HANDSHAKE_DURATION: u64 = 3;
const MAX_ACK_RANGES: u8 = 100;
const CONN_TIMEOUT: u64 = 5;
const KEEP_ALIVE_PERIOD: u64 = 2;

Client code:


pub async fn new_for_client_conn(
        server_addr: SocketAddr,
        local_addr: SocketAddr,
    ) -> ResultType<BidirectionalStream> {
        let io = IoBuilder::default()
            .with_receive_address(local_addr)?
            .build()?;

        // Each with_* call consumes the builder and returns a new one, so the
        // calls must be chained (the original code discarded the results).
        let limits = Limits::new()
            .with_max_ack_delay(Duration::from_millis(ACK_LATENCY))
            .expect("set max ack delay failed")
            .with_max_ack_ranges(MAX_ACK_RANGES)
            .expect("set max ack ranges failed")
            .with_max_handshake_duration(Duration::from_secs(MAX_HANDSHAKE_DURATION))
            .expect("set max handshake duration failed")
            .with_max_idle_timeout(Duration::from_secs(CONN_TIMEOUT))
            .expect("set max idle timeout failed")
            .with_max_keep_alive_period(Duration::from_secs(KEEP_ALIVE_PERIOD))
            .expect("set max keep alive period failed");

        let client = Client::builder()
            .with_tls(Path::new(CERT.cert_pom))?
            .with_limits(limits)?
            .with_io(io)?
            .start()
            .unwrap();

        let connect = Connect::new(server_addr).with_server_name("localhost");
        let mut connection = client.connect(connect).await?;
        connection.keep_alive(true)?;

        let stream = connection.open_bidirectional_stream().await?;
        Ok(stream)
    }

Server code:


 pub fn new_server(bind_addr: SocketAddr) -> ResultType<Server> {
        let io = IoBuilder::default()
            .with_receive_address(bind_addr)?
            .build()?;
        // As on the client, chain the builder calls so no result is discarded.
        let limits = Limits::new()
            .with_max_ack_delay(Duration::from_millis(ACK_LATENCY))
            .expect("set max ack delay failed")
            .with_max_ack_ranges(MAX_ACK_RANGES)
            .expect("set max ack ranges failed")
            .with_max_handshake_duration(Duration::from_secs(MAX_HANDSHAKE_DURATION))
            .expect("set max handshake duration failed")
            .with_max_idle_timeout(Duration::from_secs(CONN_TIMEOUT))
            .expect("set max idle timeout failed")
            .with_max_keep_alive_period(Duration::from_secs(KEEP_ALIVE_PERIOD))
            .expect("set max keep alive period failed");

        let server = Server::builder()
            .with_tls((Path::new(CERT.cert_pom), Path::new(CERT.key_pom)))?
            .with_limits(limits)?
            .with_io(io)?
            .start()
            .unwrap();
        Ok(server)

    // Some(mut new_conn) = server.accept() => {
    //     let client_addr = new_conn.remote_addr()?;
    //     tokio::spawn(async move {
    //         while let Ok(Some(mut stream)) = new_conn.accept_bidirectional_stream().await {
    //             tokio::spawn(async move {
    //                 // Echo loop: `while let` (rather than a bare `let` inside
    //                 // `loop`) ends cleanly when the stream is closed.
    //                 while let Ok(Some(data)) = stream.receive().await {
    //                     // .....
    //                     stream.send(data).await.expect("stream should be open");
    //                 }
    //             });
    //         }
    //     });
    // }
    //}

PS: What am I missing in my code? Do I need to change my limits? And what monitoring tool can analyze s2n-quic latency data?

fy2462 commented 2 years ago

Any ideas about this case, guys? @camshaft @WesleyRosenblum @toidiu Thanks.

camshaft commented 2 years ago

The code you shared isn't reproducible, so I'm not sure what issue you're seeing. If you make a repo with the minimal amount of code it takes to reproduce this, we can take a look when we get time.

fy2462 commented 2 years ago

@camshaft Got it, I will create a test repo to reproduce it. Thanks.

fy2462 commented 2 years ago

@camshaft Sorry for the late response; work has been busy. I have written a demo to reproduce the case.

https://github.com/fy2462/s2n_quic_latency_reproducer

You can get the details by following the README.

I don't know why the code shows such high latency. Please tell me the root cause if you have time to test it; thanks very much.

I look forward to your response. Thanks again.

camshaft commented 2 years ago

Thanks for creating the repo. We will add it to our list of things to do and follow up after investigating it.

In the meantime, I believe you're running into buffer bloat. Basically, if you're producing data faster than the network can carry it, your stream buffers will grow and grow until you hit the buffer limits, and then s2n-quic will start applying backpressure on the sender. If you want to avoid this behavior, you can call flush on the stream after you send a message; this ensures the peer receives the data before you send the next one. It's possible that this isn't what is happening, but looking at the initial results, that's my hunch.

fy2462 commented 2 years ago

Thanks a lot, @colmmacc. The bitrate is only < 5 MBps in my demo. Why is the latency low and stable when I use a TCP connection?

More questions:

  1. Does flushing the buffer for every frame of data introduce more latency?
  2. Which limits can I set to keep the communication latency stable and low? I can try them in my code.
  3. What tools can I use to monitor production and consumption capacity on both sides with s2n-quic?

fy2462 commented 2 years ago

Hi @camshaft, is there any progress on testing my code? Thanks.

goatgoose commented 2 years ago

Hi @fy2462,

Looking into this is still something we'd like to do, but we haven't gotten to it yet. After we investigate this further we'll let you know.

Thanks!

fy2462 commented 2 years ago

Got it, thanks @goatgoose. BTW, I have set some s2n-quic limits in my code; you can find them here, but they don't help.

fy2462 commented 2 years ago

Hi @goatgoose, have you had time to look into this? I would really like to know the cause. Thank you very much again.

fy2462 commented 2 years ago

@goatgoose @camshaft Hi guys, s2n-quic is a great project with an easy-to-use API, and I would very much like to use it in my production project. But issue response and triage in the community feels a little slow; I understand you may be busy with other work, but it is not good for the project's growth. Anyway, I hope s2n-quic will be the top QUIC project. Thanks for your work; looking forward to your response.

camshaft commented 2 years ago

As path latency increases, your buffering limits also need to increase. You can try to apply the same limits as the perf server:

https://github.com/aws/s2n-quic/blob/e5ae19ff115a09d29a8b041728393df85f0b69ea/quic/s2n-quic-qns/src/perf.rs#L115-L125
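Rather than copying the perf server's very large windows wholesale, the flow-control window can be sized from the bandwidth-delay product. This is a hedged sketch, not s2n-quic API: the function name and the policy (2x headroom, 64 KiB floor) are illustrative choices, with the result intended for a setter like the Limits data-window configuration used in the linked perf.rs.

```rust
use std::time::Duration;

/// Size a flow-control window from the bandwidth-delay product:
/// window >= bandwidth * RTT, doubled for headroom, with a small floor.
/// The exact policy (2x, 64 KiB floor) is an illustrative choice.
fn data_window_for(bandwidth_bits_per_sec: u64, rtt: Duration) -> u64 {
    // BDP in bytes = (bits/s * RTT in seconds) / 8.
    let bdp_bytes = bandwidth_bits_per_sec * rtt.as_millis() as u64 / 1000 / 8;
    (bdp_bytes * 2).max(64 * 1024)
}
```

For the reported 2.5 Mbps stream over a ~100 ms round trip, the BDP is about 31 KB, so the 64 KiB floor applies; even a 100 Mbps stream over a 400 ms path needs only about 10 MB, far below the 32-bit receive-window limit mentioned later in this thread.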

fy2462 commented 2 years ago

@camshaft Thanks very much, I will try it.

fy2462 commented 2 years ago

@camshaft I set the limits following the example, but a panic happened when the client connected to the server: thread 'tokio-runtime-worker' panicked at 'Receive window must not exceed 32bit range'.

https://github.com/fy2462/s2n_quic_latency_reproducer/blob/a61bdb56a81c20aa983265a38898df2ddf47973d/hbb_common/src/quic.rs#L172

camshaft commented 2 years ago

Then don't make the value so high. There's no way you're going to get 5000 Gbps over a network with a 400 ms RTT.

fy2462 commented 2 years ago

@camshaft Got it, thanks. So I think this comment should say Mbits/s: https://github.com/aws/s2n-quic/blob/e5ae19ff115a09d29a8b041728393df85f0b69ea/quic/s2n-quic-qns/src/perf.rs#L102

fy2462 commented 2 years ago

@camshaft Thanks for taking the time to respond. I tried changing the limits as shown here: https://github.com/fy2462/s2n_quic_latency_reproducer/blob/2dac5a513afdadf65ab0ea9ed740019f1abbbc07/hbb_common/src/quic.rs#L170

But the demo still performs badly after I mock the network latency. Could you help by testing the demo on your own PC and finding out why the performance is bad? Skip it if you don't have time; thanks again.

PS: I mock the latency with sudo tc qdisc add dev eth0 root netem delay 50ms on the demo client side.

rrichardson commented 2 years ago

@fy2462 - What is your bandwidth/latency as measured by iperf (https://github.com/esnet/iperf) when you have that netem delay set?

netem delay 50ms adds a delay to every packet going out of that interface.
I am going to assume that the path MTU of that interface is the default 1500 bytes. This means it will take about 118 packets to deliver a 177 KB buffer (the size of image.png).

I'm not sure where the netem filter actually applies its delays, but if it were serializing the packets and delaying each one by 50 ms, it would take 5.9 seconds to complete the delivery of that 177 KB image.png. In that case, your transfer rate would be about 240 Kbps (30 KB/s), assuming everything else was instantaneous.

In short, I don't think the latency you're seeing comes from s2n-quic itself. If you want to test the overall bandwidth of your server, I recommend spinning up a couple hundred clients in parallel; then you could get an idea of the server's overall throughput, even over a laggy network.
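The worst-case arithmetic above (every packet serialized behind the netem delay) can be sketched as a small helper; the function name is mine and the numbers are the ones from this comment.

```rust
/// Worst-case transfer time if every MTU-sized packet were serialized
/// behind the netem per-packet delay, per the back-of-envelope above.
fn worst_case_transfer_secs(bytes: u64, mtu: u64, delay_ms: u64) -> f64 {
    let packets = (bytes + mtu - 1) / mtu; // ceiling division
    packets as f64 * delay_ms as f64 / 1000.0
}
```

177 KB at a 1500-byte MTU is 118 packets; at 50 ms per packet that is 5.9 s. In practice netem pipelines packets through a shared delay line, so this is an upper bound, not a prediction.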

rrichardson commented 1 year ago

@fy2462 - Have you run into any other troubles with performance?