Closed: moxian closed this issue 2 years ago.
std::thread::sleep blocks the entire thread, so no other request is served by that thread. So eventually you block all threads.
You cannot execute a blocking synchronous operation on an async thread.
"you can not execute blocking synchronous operation on async thread"
Would that mean reading a file to serve it from a handler is not allowed, as that is a synchronous blocking operation?
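For reference, here is a minimal sketch of how blocking work such as a file read can be kept off the async worker threads using web::block (this assumes a recent actix-web, 4.x; the handler and file name are hypothetical):

use actix_web::{get, web, HttpResponse, Responder};

#[get("/report")]
async fn report() -> impl Responder {
    // Offload the blocking read to actix's blocking thread pool so the async
    // worker thread stays free to serve other requests in the meantime.
    let contents = web::block(|| std::fs::read_to_string("big_report.txt"))
        .await // Result<Result<String, std::io::Error>, BlockingError>
        .unwrap()
        .unwrap();
    HttpResponse::Ok().body(contents)
}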
I don't find this explanation compelling.
Yes, std::thread::sleep blocks the entire thread, but it only does so until the sleep is over. Once the specified duration has passed, the thread should become unblocked and ready to serve further requests.
In my case, however, the application hangs forever.
I've just restarted the binary, refreshed the page, refreshed it again, and then pressed F5 5 times in quick succession. It's been at least a minute since then, the page is still "loading", and trying to open it in a new tab yields no results. The console has:
PS D:\work\trash\repros\tera-hang> cargo run
Compiling tera-hang v0.1.0 (D:\work\trash\repros\tera-hang)
warning: field is never used: `slow_stuff`
--> src\main.rs:21:5
|
21 | slow_stuff: SlowStuff,
| ^^^^^^^^^^^^^^^^^^^^^
|
= note: #[warn(dead_code)] on by default
Finished dev [unoptimized + debuginfo] target(s) in 29.53s
Running `target\debug\tera-hang.exe`
SystemTime { intervals: 132059921846500000 } got request to: /
got unit: 0x257a89
made a response
SystemTime { intervals: 132059921866920000 } got request to: /
got unit: 0x257a89
made a response
SystemTime { intervals: 132059921886330000 } got request to: /
got unit: 0x257a89
made a response
The application is not stuck inside my handler, but somewhere inside actix instead.
Could you port your example to 1.0? 0.7 is in maintenance mode.
Here is what seems to be a 1.0 port:
use actix_web::{HttpRequest, HttpResponse};
use std::sync::{Arc, Mutex};

struct Unit;

struct SlowStuff {}

impl SlowStuff {
    pub fn new() -> SlowStuff {
        // Some delay here seems crucial, since I couldn't reproduce without it.
        // It does not need to be thread::sleep, but can be
        // e.g. reading and parsing a big file.
        std::thread::sleep(std::time::Duration::from_secs(2));
        SlowStuff {}
    }
}

struct State {
    unit: Arc<Mutex<Unit>>,
    // wrapping the slow stuff into Arc<Mutex<>> can be done but doesn't matter
    slow_stuff: SlowStuff,
}

fn index(req: HttpRequest, data: actix_web::web::Data<State>) -> HttpResponse {
    // let data = req.app_data::<State>().unwrap();
    println!(
        "{:?} got request to: {:?}",
        std::time::SystemTime::now(),
        req.uri()
    );
    {
        // Taking the lock is moderately important.
        // I think I've reproduced this several times without taking the
        // lock, but it's very inconsistent.
        let unit: &Unit = &data.unit.lock().unwrap();
        // This delay is important; couldn't reproduce without it.
        // It does not need to be thread::sleep, but can be
        // e.g. reading and parsing a big file.
        std::thread::sleep(std::time::Duration::from_millis(1000));
        println!("got unit: {:?}", unit as *const Unit);
    }
    let res = format!("hello");
    let resp = HttpResponse::Ok().body(res);
    println!("made a response");
    resp
}

fn main() {
    actix_web::HttpServer::new(move || {
        actix_web::App::new()
            .data(State {
                unit: Arc::new(Mutex::new(Unit)),
                slow_stuff: SlowStuff::new(),
            })
            .service(actix_web::web::resource("/").to(index))
    })
    // worker count does not matter, but I assume 1 is easier to debug?
    .workers(1)
    .bind("127.0.0.1:8080")
    .unwrap()
    .run()
    .unwrap();
}
but I cannot reproduce the issue with that.
Without understanding what the root cause of the issue is in 0.7, I feel a little uneasy not knowing whether it was actually fixed in 1.0 or just hidden (and now needs a slightly tweaked example to resurface). In other words, I would be happy to see an investigation into what's going on, but not necessarily a fix.
That said, closing the issue now is probably also totally reasonable.
Hello, I ported my real code to actix 1.0, but unfortunately hit the same issue...
Here's the reproduction case; the only difference from the above that was needed is an extra function that takes a reference to the mutex-guarded contents:
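(The repro code itself is not reproduced below; purely as a hypothetical sketch of the shape being described, with made-up names, the extra function might look like this:)

// Hypothetical illustration only: a helper that borrows the mutex-guarded
// value and does slow, blocking work while the lock is held.
fn inspect_unit(unit: &Unit) {
    std::thread::sleep(std::time::Duration::from_millis(1000));
    println!("got unit: {:?}", unit as *const Unit);
}

// and is called from the handler roughly as:
// let guard = data.unit.lock().unwrap();
// inspect_unit(&guard);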
P.s.: A data point: if there are several HttpServers in the actix system, then only the one bound to this particular address becomes unresponsive. The rest continue working fine.
I tried running the example under a logger and got the following:
Finished dev [unoptimized + debuginfo] target(s) in 6.00s
Running `target\debug\main1.exe`
2019-06-27 05:53:36,219 INFO  [actix_server::builder] Starting 1 workers
2019-06-27 05:53:36,220 INFO  [actix_server::builder] Starting server on 127.0.0.1:8080
2019-06-27 05:53:38,223 ERROR [actix_server::accept] Can not resume socket accept process: The parameter is incorrect. (os error 87)
SystemTime { intervals: 132060884182248000 } got request to: /
made a response
SystemTime { intervals: 132060884203728000 } got request to: /
made a response
So this is the same root cause as in https://github.com/actix/actix-web/issues/905
I couldn't reproduce it with Debug or a lower logging level so far, but I'll update the bug if I manage to.
P.s.: I mentioned that my go-to reproduction step here is to just press F5 several times in quick succession, but when I encountered this in real code, I was refreshing the page as one normally would, with no urgency, while the server was otherwise totally idle.
P.p.s: Managed to get Debug level:
Caught the thing with Trace logging on: listener-trace-1561618348.log. This is a real application, so a bunch of unrelated stuff might be going on. There are, unfortunately, two HTTP servers; the one affected is on port 8081.
Edit: another one: listener-trace-1561619274.log. This time there's an Info line when entering and exiting the handler. The socket gets lost way before getting there. I believe (but am surprisingly unsure) that the content did get served to the browser, and it's only the subsequent requests that got stalled.
I don't have access to Windows, and I cannot reproduce this on OSX.
Is there anything I can help here with? I don't understand actix enough to debug this on my own, but would be happy to add whatever diagnostics needed.
I have delved into this a bit.
The callstack is the following:
actix_server::accept::Accept::process_cmd
https://github.com/actix/actix-net/blob/server-v0.5.1/actix-server/src/accept.rs#L303
mio::poll::Poll::register
https://github.com/tokio-rs/mio/blob/v0.6.19/src/poll.rs#L790
mio::sys::windows::selector::ReadyBinding::register_socket
https://github.com/tokio-rs/mio/blob/v0.6.19/src/sys/windows/selector.rs#L361
mio::sys::windows::selector::Binding::register_socket
https://github.com/tokio-rs/mio/blob/v0.6.19/src/sys/windows/selector.rs#L200
miow::iocp::CompletionPort::add_socket
https://github.com/alexcrichton/miow/blob/0.2.1/src/iocp.rs#L74
miow::iocp::CompletionPort::_add
https://github.com/alexcrichton/miow/blob/0.2.1/src/iocp.rs#L80
winapi::CreateIoCompletionPort
This is not exposed in the docs for some reason, but there is this interesting comment inside mio::register:
/*
* Undefined behavior:
* - Reusing a token with a different `Evented` without deregistering
* (or closing) the original `Evented`.
*/
Following this all the way down to the CreateIoCompletionPort MSDN entry, we see it mention, in passing: "This completion key should be unique for each file handle, and it accompanies the file handle throughout the internal completion queuing process." And, evidently, violating this requirement is treated as passing an incorrect parameter, which is fair, I guess.
When I add a diagnostic print, I see that CreateIoCompletionPort is indeed called with the same non-null parameters multiple times. And this makes sense, since in the above trace logs you can see (the number in brackets is said to be the thread number):
06:52:28 [TRACE] (6) mio::sys::windows::selector: [C:\Users\moxian\.cargo\registry\src\github.com-1ecc6299db9ec823\mio-0.6.19\src\sys\windows\selector.rs:359] register Token(100) Readable
<...>
06:52:28 [TRACE] (12) mio::sys::windows::selector: [C:\Users\moxian\.cargo\registry\src\github.com-1ecc6299db9ec823\mio-0.6.19\src\sys\windows\selector.rs:359] register Token(100) Readable
<...>
06:52:30 [TRACE] (12) mio::sys::windows::selector: [C:\Users\moxian\.cargo\registry\src\github.com-1ecc6299db9ec823\mio-0.6.19\src\sys\windows\selector.rs:359] register Token(100) Readable
06:52:30 [ERROR] Can not resume socket accept process: The parameter is incorrect. (os error 87)
<...>
Why is the token the same? I'm not sure... I tried reading the actix-server code but didn't get far, unfortunately.
Hope this helps!..
(This is probably not needed, but just in case: another Trace log with a tiny bit more logging (and slightly different line numbers as a result): listener-1561709109-trace.log )
@moxian thanks for the investigation, that is helpful. I'll try to come up with some fixes.
I ran into this on Windows 10 with actix-web 2.0, but unfortunately I did not have debug logging enabled and now I can't replicate it. @moxian do you have any more information about this bug apart from what is mentioned in this issue? I'm happy to help troubleshoot in any way :)
No, sorry, I didn't use actix much since filing this report. This issue captures all I know about the problem.
I was able to see this issue using React's useEffect without passing the [] dependency array to the effect. I have a Rust/Actix app; in my routes:
.route(
"/api/v1/users{_:/?}",
web::post().to(users_controller::index),
)
All my user controller does is call reqwest (this is some sort of HTTP proxy call):
.post("http://someotherwebservice:3333/api/legacy_users")
.json(&request_body)
.send()
.await
.expect("error")
.json()
.await
.expect("error");
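A minimal sketch of what such a proxy handler could look like as a whole (hypothetical; it assumes actix-web 2 async handlers, a reqwest 0.10 async client, and serde_json for the pass-through body):

use actix_web::{web, HttpResponse, Responder};

// Hypothetical shape of the proxying controller described above.
pub async fn index(request_body: web::Json<serde_json::Value>) -> impl Responder {
    let upstream: serde_json::Value = reqwest::Client::new()
        .post("http://someotherwebservice:3333/api/legacy_users")
        .json(&request_body.into_inner())
        .send()
        .await
        .expect("error")
        .json()
        .await
        .expect("error");
    HttpResponse::Ok().json(upstream)
}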
When I call this from the React client, it responds successfully the first time with data, but the second time it completely hangs Actix:
React.useEffect(() => {
    fetch('http://localhost_actix_port/api/v1/users');
});
But when adding [] to the React.useEffect, it doesn't block the backend. This is strange behavior, because basically any app could mount a DDoS attack against any backend that proxies HTTP requests in this way:
React.useEffect(() => {
    fetch('http://localhost_actix_port/api/v1/users');
}, []); // <- this avoids the infinite loop
actix-cors = "0.2.0"
actix-web = "2"
actix-service = "1.0.5"
actix-rt = "1"
reqwest = { version = "0.10", features = ["blocking", "json"] }
I can create an example if necessary with that part of the app.
After digging into this issue a little more, my issue is actually different. It seems I was sending a burst of requests to actix-web from the frontend (React); after a certain number of requests, say more than 50 per second, actix was locking up internally and all subsequent requests also failed. I was able to resolve my issue by using this:
Add this dependency to your Cargo.toml:
actix-ratelimit = "0.2.1"
use std::time::{Duration};
use actix_ratelimit::{RateLimiter, MemoryStore, MemoryStoreActor};
add this middleware:
.wrap(
RateLimiter::new(
MemoryStoreActor::from(store.clone()).start())
.with_interval(Duration::from_secs(1))
.with_max_requests(5))
Idk if this helps someone else with the same issue.
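For context, here is a minimal sketch of how the pieces above fit together (assuming actix-web 2 with actix-rt 1 and actix-ratelimit 0.2; the store has to be created once, before HttpServer::new, and cloned into each app factory):

use std::time::Duration;

use actix_ratelimit::{MemoryStore, MemoryStoreActor, RateLimiter};
use actix_web::{App, HttpServer};

#[actix_rt::main]
async fn main() -> std::io::Result<()> {
    // One shared in-memory store for all workers.
    let store = MemoryStore::new();
    HttpServer::new(move || {
        App::new().wrap(
            // Allow at most 5 requests per client per 1-second interval.
            RateLimiter::new(MemoryStoreActor::from(store.clone()).start())
                .with_interval(Duration::from_secs(1))
                .with_max_requests(5),
        )
        // ...routes go here...
    })
    .bind("127.0.0.1:8080")?
    .run()
    .await
}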
Oh my God, same issue with me this month... I was desperate to find out the cause 😂
let me try this solution.
It seems this didn't work for me; because I hadn't passed a dependency array into my useEffect hooks, I opted for a refactor instead.
Closing as stale. If there are more updates, please open a new issue in the context of newer actix-web versions.
Having the same issue.
actix-web = "4.3.1"
rand = "0.8"
tokio = { version = "1.26", features = ["time"] }
use actix_web::{get, App, HttpResponse, HttpServer, Responder};
use rand::{thread_rng, Rng};

#[get("/small")]
async fn small() -> impl Responder {
    let mut rng = thread_rng();
    let rand: f64 = rng.gen();
    if rand >= 0.6 {
        tokio::time::sleep(tokio::time::Duration::from_millis(100)).await;
    }
    HttpResponse::Ok().body(String::from("Hello world!"))
}

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    HttpServer::new(|| {
        App::new()
            .service(small)
    })
    .bind(("0.0.0.0", 8080))?
    .run()
    .await
}
I removed the random and sleep part, and it still hangs indefinitely.
I then removed the tokio-specific dependency from Cargo.toml, downgraded to actix 3, and ran cargo clean.
Here is the new Cargo.toml:
[package]
name = "actix-test"
version = "0.1.0"
edition = "2021"
publish = ["cargo"]
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
actix-web = "3"
use actix_web::{get, App, HttpResponse, HttpServer, Responder};

#[get("/small")]
async fn small() -> impl Responder {
    HttpResponse::Ok().body(String::from("Hello world!"))
}

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    HttpServer::new(|| {
        App::new()
            .service(small)
    })
    .bind(("127.0.0.1", 8080))?
    .run()
    .await
}
The first response returns fine. The second one gets ECONNRESET.
I removed the request body, which was application/json, and sure enough, that fixed the issue.
I should just be pitching the request body, as I don't specify that I want it in my GET endpoint. The total size of the payload is 170kb. A possible default is 256kb, so I reduced the size a bit so that two requests should fit and not overflow that limit. But two at 120kb still blew up on the second one. Even at 76kb I had the issue. I went down to an absolutely tiny payload and no longer got the issue.
What do I need to do to get the framework to accept these payloads?
If you send a payload, it should be read, or else the HTTP/1.1 connection will get stuck.
use actix_web::web::Bytes;

#[get("/small")]
async fn small(_body: Bytes) -> impl Responder {
    ...
}
The code (for actix-web-0.7; I didn't try reproducing on 1.0 yet):

Go to http://localhost:8080/ and see the page showing fine. If you refresh the page several times (it might need several refreshes in quick succession? unsure... I just tap F5 real quick), the page gets into an eternal "loading" state, and further attempts to access the endpoint (also from other browser tabs) never get a response.
Stacktrace as reported by ProcessExplorer in the frozen state (identical for both the main thread, and the two others):
Windows 7, Firefox browser.
P.s.: I cannot reproduce it on Linux, so this is probably Windows-specific.