Hyper example fails with "Transport endpoint is not connected" error

DataDog / glommio

Glommio is a thread-per-core crate that makes writing highly parallel asynchronous applications in a thread-per-core architecture easier for rustaceans.

Other

3.09k stars 163 forks source link

Hyper example fails with "Transport endpoint is not connected" error #386

Open nadenf opened 3 years ago

nadenf commented 3 years ago

Running the Hyper example using a basic ApacheBench test reveals the following error when run with 100 concurrent requests for 1000 total requests:

thread 'unnamed-1' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 110, kind: TimedOut, message: "Connection timed out" }', src/main.rs:63:18
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Any { .. }', src/main.rs:67:19

And compared to Actix-Web it is around 10x slower.

fakeshadow commented 3 years ago

I don't know about the error but from what I see the example is single threaded and actix-web would use all of your cpu cores by default so they are not really comparable.

bryandmc commented 3 years ago

Hey, @nadenf can you provide the other relevant pieces of code such as the actix example, the ApacheBench code and/or at least an explanation of what it does? All that said there's a high likelihood @fakeshadow is correct that the example is single threaded so you would need more executors to be able to compete with something that's using all cores. Also, I think it's fair to say the hyper example is somewhat of a quick hack that (like any code) could have bugs that are causing issues.. So it would be nice to get to the bottom of that.

As for the error.. it's hard to say to be honest. The error could be caused by a legitimate timeout but it might be worth looking into that as well. Does this error happen every time you attempt it or is it spurious?

nadenf commented 3 years ago

Everything below was for 250 concurrent clients using Siege with a simple e.g. /hello -> "world" response example.

With this Thruster example = 148,744 rps.
Using your example as is = 11,462 rps.
Changing it to 8 max_connections = 130,412 rps.
Rewriting the code to be based off Fujita's GRPC work = 178,370 rps.

Maybe it's worth updating the example and submitting to Techempower because it looks like it could be the fastest Rust web server.

nadenf commented 3 years ago

This is the code that I used:

fn main() {
    let mut handles = Vec::new();
    for i in 0..num_cpus::get() {
        println!("Spwaning .. {} ", i);
        let h = std::thread::spawn(move || {
            let ex = LocalExecutorBuilder::new().pin_to_cpu(i).make().unwrap();
            ex.run(async move {
                hyper_compat::serve_http(([0, 0, 0, 0], 8000), hyper_demo, 10)
                    .await
                    .unwrap();
            });
        });
        handles.push(h);
    }
    for h in handles {
        let _ = h.join();
    }
}

glommer commented 3 years ago

I am happy to merge a patch that moves the example to use many threads. Even better if it was based on the new LocalExecutorPool API that launches N identical executors.

ChillFish8 commented 3 years ago

As an add on to this and playing around with it, when running with a more reasonable 1048 concurrent connections the hyper example as mentioned above and setting the concurrent connections to be 1048 per worker (just as a general number) I run into:

thread '<unnamed>' panicked at 'Stream from 127.0.0.1:8000 failed with error hyper::Error(Shutdown, Os { code: 107, kind: NotConnected, message: "Transport endpoint is not connected" })', examples/hyper.rs:91:29
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Which leads me to believe the example isn't handling disconnects correctly?

glommer commented 3 years ago

I just want to set expectations correctly that this example is probably not handling a lot of things correctly. It's sole goal at the time was to demonstrate how you can use hyper with a ?Send executor. That's not to say I don't welcome improvements, though.

nadenf commented 3 years ago

@glommer .. Do you have any examples of what is not being handled correctly ?

I am working on a PR and new to Rust so it will be a good learning exercise 😀

glommer commented 3 years ago

I don't have any particular examples. I am not saying there is anything wrong, just setting expectations that it's not like we put a lot of effort in this example, so I wouldn't be surprised if y'all find issues (like the one above with @ChillFish8 )

ChillFish8 commented 3 years ago

The aforementioned issue is a little confusing because from what I can tell the code is fine correctly handling the sockets, just the stream is being shut down before hyper can flush the stream which probably requires more digging to work out why / what's causing it. But in general, I'm not sure if this is a massive issue in reality considering as you say, it's an example repo.

nadenf commented 3 years ago

@ChillFish8 .. The PR changes the behaviour from panic to eprintln when there is a connection issue.

Unfortunately Hyper doesn't expose the kind of error so it's difficult to only panic on serious errors.