Yerkwell commented 1 year ago

If a POST request with a large body gets an error response (413, 404, or any other), the connection doesn't close but hangs in the CLOSE-WAIT state.

Expected Behavior

After the error response is sent, the connection is closed successfully.

Current Behavior

The connection hangs in the CLOSE-WAIT state with unread bytes in Recv-Q.

Possible Solution

As it only happens to requests with a large body, I guess it has something to do with the bytes that hadn't been read from the socket at the time the error happened. So we need to make sure these bytes are read before closing the socket.
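To illustrate that idea, here is a minimal sketch using Python's standard `socket` module rather than actix-web's internals: it reads and discards whatever the peer has already sent, up to a cap, and only then closes the socket. The helper name `drain_and_close` and its `max_bytes` and `timeout` parameters are hypothetical, and this is not a claim about how actix-web does or should implement it.

```python
import socket


def drain_and_close(sock: socket.socket, max_bytes: int = 1 << 20, timeout: float = 1.0) -> int:
    """Discard pending input (up to max_bytes), then close the socket.

    Returns the number of bytes discarded. Hypothetical helper for illustration.
    """
    drained = 0
    sock.settimeout(timeout)
    try:
        while drained < max_bytes:
            chunk = sock.recv(min(65536, max_bytes - drained))
            if not chunk:  # peer sent FIN: nothing more to read
                break
            drained += len(chunk)
    except (socket.timeout, ConnectionError):
        pass  # peer went quiet or reset the connection; close anyway
    finally:
        sock.close()
    return drained
```

The cap and the timeout matter: draining without a limit would let a client keep the server busy with an arbitrarily large or slow body.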

Steps to Reproduce (for bugs)

1. Creating a simple server
```rust
use actix_web::{App, HttpServer};

#[actix_web::main]
pub async fn main() {
    // No routes are registered, so every request is answered with 404.
    HttpServer::new(move || App::new())
        .bind("127.0.0.1:5556")
        .unwrap()
        .run()
        .await
        .unwrap();
}
```

2. Making a request with a large body that will end in an error (404 in this case)
```python
import requests
requests.post('http://127.0.0.1:5556/data', json={"data": "a"*1000000})
```

3. Checking sockets: we can see the connection to our server in CLOSE-WAIT state with a lot of bytes in Recv-Q

```
$ ss -n4t | grep 5556
FIN-WAIT-2  0       0            127.0.0.1:59240        127.0.0.1:5556
CLOSE-WAIT  787381  0            127.0.0.1:5556         127.0.0.1:59240
```
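To track this over repeated requests, the `ss` output can be tallied by state. The sketch below is one way to do that, assuming lines in the `State Recv-Q Send-Q Local:Port Peer:Port` layout shown in this report; the function name `count_states` is hypothetical.

```python
from collections import Counter


def count_states(ss_output: str, port: int) -> Counter:
    """Count TCP states for sockets whose local address uses `port`.

    Expects lines shaped like `ss -n4t` output:
    State Recv-Q Send-Q Local:Port Peer:Port
    """
    states = Counter()
    for line in ss_output.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or incomplete lines
        state, local = fields[0], fields[3]
        # Compare only the port part of the local address.
        if local.rsplit(":", 1)[-1] == str(port):
            states[state] += 1
    return states


sample = """\
FIN-WAIT-2  0       0            127.0.0.1:59240        127.0.0.1:5556
CLOSE-WAIT  787381  0            127.0.0.1:5556         127.0.0.1:59240
"""
print(count_states(sample, 5556))  # Counter({'CLOSE-WAIT': 1})
```

On a live machine the text could come from `subprocess.run(["ss", "-n4t"], capture_output=True, text=True).stdout`; a CLOSE-WAIT count that grows with every failed request would match the behavior described here.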

Context

We have an application that uses actix-web as an HTTP API server. We've discovered that at some point it starts leaking memory and connections. We've also found that another application is sending huge files to our app and keeps retrying as it gets a 413 error. Further investigation showed that these things are connected and every 413 error leaves a connection in CLOSE-WAIT state. Later we managed to reproduce this problem with other errors (like 404 in the minimal example).

Your Environment

Rust Version: 1.70.0
Actix Web Version: 4.4.0

coolacid commented 3 months ago

This appears to be specifically related to keep-alive connections.

The following doesn't cause this issue:

```python
import requests
s = requests.Session()
s.headers['Connection'] = 'close'
s.post('http://127.0.0.1:5556/data', json={"data": "a"*1000000})
```

gschulze commented 3 months ago

I can confirm this behavior. The number of connections that hang in the CLOSE-WAIT state increases every time a too-large payload has been sent. I'd very much appreciate a fix, as we are having huge issues in production right now. I'd also be willing to contribute, but I'm currently having difficulties finding my way around the codebase. My guess is something has to change in actix-http/src/h1/dispatcher.rs.

Remby commented 3 months ago

I am experiencing a similar issue. After deploying my website, it becomes inaccessible after running for some time. Upon investigation, I discovered a large number of connections stuck in the CLOSE-WAIT state. Could you please advise on how to resolve this problem? Thank you for your assistance.

gschulze commented 3 months ago

I did some systematic testing to narrow down the problem, by modifying the following parameters:

- Server Keep-Alive: whether HTTP keep-alive is enabled server-side via HttpServer::keep_alive(...).
- Client Keep-Alive: whether the client sends the header Connection: keep-alive or Connection: close.
- Payload: whether a normal or a too-large payload was sent, where too large means above 256 KB.

On the client-side, I used the Python requests library with the following setup:

```python
import requests

session = requests.Session()
# Set to "close" or "keep-alive", depending on the test case
session.headers["Connection"] = "close"
session.post(...)
```

| Server Keep-Alive | Client Keep-Alive | Payload | Observation |
| --- | --- | --- | --- |
| disabled | disabled | normal | Server connection remains open in TIME_WAIT state for 30 seconds |
| disabled | disabled | too large | No open connections after request has finished |
| disabled | enabled | normal | Server connection remains open in TIME_WAIT state for 30 seconds |
| disabled | enabled | too large | No open connections after request has finished |
| enabled | disabled | normal | Server connection remains open in TIME_WAIT state for 30 seconds |
| enabled | disabled | too large | No open connections after request has finished |
| enabled | enabled | normal | Server connection remains open in TIME_WAIT state for 30 seconds |
| enabled | enabled | too large | Server connection remains open in CLOSE_WAIT state for 1 minute; client connection remains open in FIN_WAIT2 state for 1 minute |

anilaltuner commented 1 month ago

Hey everyone,

Has this problem been resolved? I'm running into this issue when I use async HTTP libraries in Python.

actix / actix-web

Connections hang in CLOSE-WAIT state #3182