actix / actix-web

Actix Web is a powerful, pragmatic, and extremely fast web framework for Rust.
https://actix.rs
Apache License 2.0

Connections hang in CLOSE-WAIT state #3182

Open Yerkwell opened 1 year ago

Yerkwell commented 1 year ago

If a POST request with a large body receives an error response (413, 404, or any other), the connection is not closed but hangs in the CLOSE-WAIT state.

Expected Behavior

After the error response is sent, the connection is closed.

Current Behavior

The connection hangs in the CLOSE-WAIT state with unread bytes in its Recv-Q.

Possible Solution

Since this only happens with requests that have a large body, I suspect it is related to the bytes that have not yet been read from the socket when the error occurs. The server should make sure these bytes are read before closing the socket.
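The proposed fix can be illustrated at the socket level. The sketch below is plain Python, not actix-web code, and `drain_and_close` with its `max_bytes` cap and `timeout` are hypothetical names and defaults chosen for illustration. It reads and discards whatever is left of the request body, bounded so a client cannot keep the server reading forever, and then always closes the socket. Closing is what moves a socket out of CLOSE-WAIT; draining first matters because on Linux, closing a socket that still has unread data in its receive buffer makes the kernel send a TCP reset instead of a normal FIN.

```python
import socket


def drain_and_close(sock: socket.socket, max_bytes: int = 1 << 20, timeout: float = 1.0) -> int:
    """Read and discard unread request bytes, then close the socket.

    Hypothetical helper: stops at EOF, after `max_bytes`, or when no data
    arrives within `timeout` seconds. Returns the number of bytes discarded.
    """
    drained = 0
    sock.settimeout(timeout)
    try:
        while drained < max_bytes:
            chunk = sock.recv(min(65536, max_bytes - drained))
            if not chunk:  # EOF: the peer has finished sending (FIN received)
                break
            drained += len(chunk)
    except OSError:
        pass  # timeout or connection reset: stop draining, but still close below
    finally:
        sock.close()  # always close, so the socket cannot linger in CLOSE-WAIT
    return drained
```

The `max_bytes` cap is the design choice that keeps this safe against a client streaming an endless body; past the cap the server simply closes and accepts that the peer may see a reset.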

Steps to Reproduce (for bugs)

  1. Create a simple server:

```rust
use actix_web::{App, HttpServer};

#[actix_web::main]
pub async fn main() {
    // No routes are registered, so every request is answered with 404 Not Found.
    HttpServer::new(|| App::new())
        .bind("127.0.0.1:5556")
        .unwrap()
        .run()
        .await
        .unwrap();
}
```

  2. Make a request with a large body that ends in an error response (404 in this case):

```python
import requests

# ~1 MB JSON body; the server has no /data route, so it responds with 404.
requests.post('http://127.0.0.1:5556/data', json={"data": "a"*1000000})
```

  3. Check the sockets: the connections to the server are stuck in CLOSE-WAIT with a lot of bytes in Recv-Q:

```
$ ss -n4t | grep 5556
FIN-WAIT-2  0       0            127.0.0.1:59240        127.0.0.1:5556
CLOSE-WAIT  787381  0            127.0.0.1:5556         127.0.0.1:59240
```
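To track the leak over many requests, the `ss` output from step 3 can be parsed instead of read by eye. This is a minimal sketch; `close_wait_sockets` is a hypothetical helper, and it assumes the column layout that `ss -n4t` prints (State, Recv-Q, Send-Q, local address:port, peer address:port, optional process column).

```python
def close_wait_sockets(ss_output: str, port: int) -> list[tuple[str, int]]:
    """Return (peer address, Recv-Q bytes) for each CLOSE-WAIT socket on local `port`.

    Assumes the column layout of `ss -n4t`:
    State  Recv-Q  Send-Q  Local-Address:Port  Peer-Address:Port  [Process]
    """
    stuck = []
    for line in ss_output.splitlines():
        fields = line.split()
        # Skip the header, blank lines, and sockets in any other state.
        if len(fields) < 5 or fields[0] != "CLOSE-WAIT":
            continue
        recv_q, local, peer = int(fields[1]), fields[3], fields[4]
        # Compare only the port, i.e. the text after the last ':'.
        if local.rsplit(":", 1)[-1] == str(port):
            stuck.append((peer, recv_q))
    return stuck
```

On Linux the input can be captured with `subprocess.run(["ss", "-n4t"], capture_output=True, text=True).stdout`; calling the helper after each oversized request shows whether the number of stuck sockets keeps growing.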

Context

We have an application that uses actix-web as its HTTP API server. We discovered that at some point it starts leaking memory and connections. We also found that another application sends huge files to our app and keeps retrying because it gets a 413 error. Further investigation showed that these things are connected: every 413 error leaves a connection in the CLOSE-WAIT state. Later we managed to reproduce the problem with other errors (such as the 404 in the minimal example).

Your Environment

coolacid commented 3 months ago

This appears to be specifically related to keep-alive connections.

The following doesn't cause this issue:

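```python
import requests

s = requests.Session()
# Ask the server to close the connection after the response instead of keeping it alive.
s.headers['Connection'] = 'close'
s.post('http://127.0.0.1:5556/data', json={"data": "a"*1000000})
```

gschulze commented 3 months ago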

I can confirm this behavior. The number of connections hanging in the CLOSE-WAIT state increases every time a too-large payload is sent. I'd very much appreciate a fix, as we are having huge issues in production right now. I'd also be willing to contribute, but I'm having difficulty finding my way around the codebase. My guess is that something has to change in `actix-http/src/h1/dispatcher.rs`.

Remby commented 3 months ago

I am experiencing a similar issue. After deploying my website, it becomes inaccessible after running for some time. Upon investigation, I discovered a large number of connections stuck in the CLOSE-WAIT state. Could you please advise on how to resolve this problem? Thank you for your assistance.

gschulze commented 3 months ago

I did some systematic testing to narrow down the problem by varying the following parameters:

On the client side, I used the Python requests library with the following setup:

```python
import requests

session = requests.Session()
session.headers["Connection"] = "close"  # or "keep-alive"
session.post(...)
```
| Server Keep-Alive | Client Keep-Alive | Payload | Observation |
|---|---|---|---|
| disabled | disabled | normal | Server connection remains open in TIME_WAIT state for 30 seconds |
| disabled | disabled | too large | No open connections after request has finished |
| disabled | enabled | normal | Server connection remains open in TIME_WAIT state for 30 seconds |
| disabled | enabled | too large | No open connections after request has finished |
| enabled | disabled | normal | Server connection remains open in TIME_WAIT state for 30 seconds |
| enabled | disabled | too large | No open connections after request has finished |
| enabled | enabled | normal | Server connection remains open in TIME_WAIT state for 30 seconds |
| enabled | enabled | too large | Server connection remains open in CLOSE_WAIT state for 1 minute; client connection remains open in FIN_WAIT2 state for 1 minute |
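The Client Keep-Alive column corresponds to the `Connection` header the client sends (requests sends `Connection: keep-alive` by default). The hypothetical helper `build_post` below is a stdlib-only Python sketch that builds the raw HTTP/1.1 request bytes for either column value, so the difference on the wire is visible. It approximates, but is not byte-for-byte identical to, what requests emits; requests also adds headers such as `User-Agent` and `Accept-Encoding`.

```python
import json


def build_post(path: str, payload: dict, keep_alive: bool, host: str = "127.0.0.1:5556") -> bytes:
    """Build a raw HTTP/1.1 POST carrying `payload` as a JSON body.

    keep_alive=True  -> "Connection: keep-alive" (Client Keep-Alive: enabled)
    keep_alive=False -> "Connection: close"      (Client Keep-Alive: disabled)
    """
    body = json.dumps(payload).encode("utf-8")
    head = "\r\n".join([
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        "Content-Type: application/json",
        f"Content-Length: {len(body)}",
        f"Connection: {'keep-alive' if keep_alive else 'close'}",
    ])
    # A blank line (CRLF CRLF) separates the headers from the body.
    return head.encode("ascii") + b"\r\n\r\n" + body
```

Only the `Connection` line differs between the two client columns. Per the table, with server keep-alive enabled and a too-large payload, that single header decides whether the server socket ends up in CLOSE_WAIT.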
anilaltuner commented 1 month ago

Hey everyone,

Has this problem been resolved? I'm seeing this issue when using async HTTP libraries in Python.