actix / actix-web

Actix Web is a powerful, pragmatic, and extremely fast web framework for Rust.
https://actix.rs
Apache License 2.0

actix-web returns 400 bad request for http requests emitted by many user agents #3102

Open lovasoa opened 1 year ago

lovasoa commented 1 year ago

Hello, and first, thank you for this great library!

Recently, I published a blog post titled I’m sorry I forked you. In the title, the second character is a curly apostrophe (U+2019 Right Single Quotation Mark).

I shared it online and started getting hits from a lot of different browsers. A significant portion of those hits (I don't know from which browsers exactly) did not percent-encode the apostrophe (as %E2%80%99), but included it directly, as raw UTF-8 bytes, in the HTTP request.
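
For reference, here is a minimal sketch of the encoding those clients skipped. It uses only the standard library, and `encode_non_ascii` is a hypothetical helper name, not an actix-web API. It escapes non-ASCII bytes only and leaves every ASCII character untouched, so it is not a full URL encoder.

```rust
/// Percent-encode every non-ASCII byte of a request target.
/// Hypothetical helper for illustration; ASCII bytes (including
/// reserved characters such as spaces) are copied through unchanged.
fn encode_non_ascii(target: &str) -> String {
    let mut out = String::with_capacity(target.len());
    for &byte in target.as_bytes() {
        if byte.is_ascii() {
            out.push(byte as char);
        } else {
            // Each byte of the UTF-8 sequence becomes one %XX escape.
            out.push_str(&format!("%{:02X}", byte));
        }
    }
    out
}

fn main() {
    // U+2019 is encoded in UTF-8 as the three bytes E2 80 99.
    println!("{}", encode_non_ascii("/\u{2019}")); // prints "/%E2%80%99"
}
```

A client that applies this kind of escaping sends a pure-ASCII request line; the clients described above send the three raw bytes instead.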

There are two layers between the web and my actix service:

But when the request got to actix-web, it failed to parse it and returned a 400 without even invoking my code. The very confusing error message I got was: [ERROR actix_http::h1::dispatcher] stream error: Request parse error: Invalid Header provided (confusing because the message did not state what the problem was exactly, and said it came from the headers instead of the request URL).

See: https://en.wikipedia.org/wiki/Internationalized_Resource_Identifier

Expected Behavior

Since clients in the real world emit HTTP requests with raw Unicode characters, I think actix-web should accept them and just invoke the user code with the Unicode query string.

And when it encounters a real issue with the query string, it should say it comes from the query string, not from the headers, and give more details than just Request parse error.

Current Behavior

actix-web logs [ERROR actix_http::h1::dispatcher] stream error: Request parse error: Invalid Header provided

and returns an HTTP 400 Bad Request response to the client.

Steps to Reproduce (for bugs)

```rust
#[actix_web::main]
async fn main() -> std::io::Result<()> {
    actix_web::HttpServer::new(|| actix_web::App::new())
        .bind(("127.0.0.1", 8080))?
        .run()
        .await
}
```

```
❯ curl -v 'localhost:8080/’'
*   Trying 127.0.0.1:8080...
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /’ HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.81.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 400 Bad Request
< content-length: 0
< connection: close
< date: Sun, 13 Aug 2023 20:01:26 GMT
< 
* Closing connection 0
```

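
To make explicit what goes over the wire in the curl run above, the sketch below rebuilds a similar request by hand and locates the first non-ASCII byte. It uses only the standard library and does not open a connection. `raw_request` and `first_non_ascii` are hypothetical names for illustration, and the header set is trimmed to `Host` only.

```rust
/// Build the raw bytes of a minimal HTTP/1.1 GET request.
/// The target is copied verbatim, as curl does, with no percent-encoding.
fn raw_request(target: &str) -> Vec<u8> {
    format!("GET {} HTTP/1.1\r\nHost: localhost:8080\r\n\r\n", target).into_bytes()
}

/// Return the offset of the first byte outside the ASCII range, if any.
fn first_non_ascii(bytes: &[u8]) -> Option<usize> {
    bytes.iter().position(|b| !b.is_ascii())
}

fn main() {
    let request = raw_request("/\u{2019}");
    match first_non_ascii(&request) {
        Some(offset) => println!(
            "non-ASCII byte 0x{:02X} at offset {}",
            request[offset], offset
        ),
        None => println!("request is pure ASCII"),
    }
}
```

With the curly apostrophe in the path, the first non-ASCII byte (0xE2) sits in the request line, right after `GET /`, and not in any header, which is why the "Invalid Header" wording in the log is misleading.
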
lovasoa commented 1 year ago

I dug in the logs, and here is a list of some user agents that sent the requests with raw unicode chars:

```
    1 DuckDuckGo/5 (com.duckduckgo.mobile.android; Android API 28)
    1 Embed PHP library
    1 Hatena::Fetcher/0.01 (master) Furl/3.13
    1 Mediatoolkitbot (complaints@mediatoolkit.com)
    1 Mozilla/5.0 (Android 13; Mobile; rv:109.0) Gecko/116.0 Firefox/116.0
    1 Mozilla/5.0 (compatible; heritrix/3.3.0-SNAPSHOT-20150302-2206 +http://127.0.0.1)
    1 Mozilla/5.0 (compatible;PetalBot;+https://webmaster.petalsearch.com/site/petalbot)
    1 Mozilla/5.0 (compatible; Qwantify-dev/1.0; +https://help.qwant.com/bot/)
    1 Mozilla/5.0 (compatible; SemrushBot; +http://www.semrush.com/bot.html)
    1 Mozilla/5.0 (compatible; Yeti/1.1; +https://naver.me/spd)
    1 Mozilla/5.0 (iPhone; CPU iPhone OS 16_1_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.1 Mobile/15E148 DuckDuckGo/7 Safari/605.1.15
    1 Mozilla/5.0 (iPhone; CPU iPhone OS 16_5_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.5 Mobile/15E148 DuckDuckGo/7 Safari/605.1.15
    1 Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Mobile/15E148 DuckDuckGo/7 Safari/605.1.15
    1 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.1.25 (KHTML, like Gecko) Version/8.0 Safari/600.1.25
    1 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36
    1 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.1 Safari/605.1.15 (Applebot/0.1; +http://www.apple.com/go/applebot)
    1 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36
    1 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36
    1 Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/116.0
    1 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36
    1 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36
    1 Mozilla/5.0 (Windows; U; Windows NT 5.1; ja; rv:1.8.0.9) Gecko/20061206 Firefox/53.0
    1 Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:94.0) Gecko/20100101 Firefox/95.0
    1 Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36
    1 Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9b3pre) Gecko/2008010415 Firefox/52.7.0
    1 Mozilla/5.0 (X11; U; Linux i686; fr; rv:1.9.0.1) Gecko/2008070206 Firefox/51.0
    1 okhttp/4.10.0
    1 python-requests/2.25.1
    1 SerendeputyBot/0.8.6 (http://serendeputy.com/about/serendeputy-bot)
    1 Twitterbot/1.0
    2 com.apple.WebKit.Networking/18615.3.12.11.2 CFNetwork/1410.0.3 Darwin/22.6.0
    2 com.apple.WebKit.Networking/8614.2.9.0.11 CFNetwork/1399 Darwin/22.1.0
    2 com.apple.WebKit.Networking/8614.3.7.0.6 CFNetwork/1402.0.8 Darwin/22.2.0
    2 com.apple.WebKit.Networking/8615.1.26.100.1 CFNetwork/1406.0.4 Darwin/22.4.0
    2 com.apple.WebKit.Networking/8616.1.14.10.12 CFNetwork/1458.2.2 Darwin/23.0.0
    2 magpie-crawler/1.1 (robots-txt-checker; +http://www.brandwatch.net)
    2 Mozilla/5.0 (compatible)
    2 Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
    2 Mozilla/5.0 (compatible) SemanticScholarBot (+https://www.semanticscholar.org/crawler)
    2 Mozilla/5.0 (compatible; SeznamBot/4.0; +http://napoveda.seznam.cz/seznambot-intro/)
    2 Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
    2 Mozilla/5.0 (iPhone; CPU iPhone OS 10_0 like Mac OS X) AppleWebKit/602.1.38 (KHTML, like Gecko) Version/10.0 Mobile/14A5297c Safari/602.1
    2 Mozilla/5.0 (iPhone; CPU iPhone OS 5_0 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9A334 Safari/7534.48.3
    2 Mozilla/5.0 (Linux; Android 5.0) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; Bytespider; spider-feedback@bytedance.com)
    2 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36
    2 Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/114.0
    2 Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/116.0
    2 node-fetch
    2 omgili/0.5 +http://omgili.com
    2 Tiny Tiny RSS/21.05-326850845 (http://tt-rss.org/)
    3 DuckDuckGo/5 (com.duckduckgo.mobile.android; Android API 32)
    3 Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/103.0.5060.134 Safari/537.36
    3 Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
    3 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36
    3 Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/116.0
    4 com.apple.WebKit.Networking/8615.2.9.10.4 CFNetwork/1408.0.4 Darwin/22.5.0
    4 com.apple.WebKit.Networking/8616.1.24.10.2 CFNetwork/1469 Darwin/23.0.0
    4 curl/7.81.0
    4 facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
    4 MobileSafari/8615.2.9.10.3 CFNetwork/1408.0.4 Darwin/22.5.0
    4 Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)
    4 Mozilla/5.0 (Linux; arm_64; Android 12; Pixel 3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 YaBrowser/23.7.2.98.00 SA/3 Mobile Safari/537.36
    4 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36
    4 Safari/18615.2.9.11.4 CFNetwork/1408.0.4 Darwin/22.5.0
    4 Safari/19616.1.24.11.3 CFNetwork/1469 Darwin/23.0.0
    4 Twingly Recon
    5 DuckDuckGo/5 (com.duckduckgo.mobile.android; Android API 29)
    5 DuckDuckGo/5 (com.duckduckgo.mobile.android; Android API 30)
    5 LinkPreview/1.6 (https://www.linkpreview.net)
    5 Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36
    6 DuckDuckGo/5 (com.duckduckgo.mobile.android; Android API 31)
    6 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/601.2.4 (KHTML, like Gecko) Version/9.0.1 Safari/601.2.4 facebookexternalhit/1.1 Facebot Twitterbot/1.0
    6 Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:103.0) Gecko/20100101 Firefox/103.0
    6 Safari/18615.1.26.110.1 CFNetwork/1406.0.4 Darwin/22.4.0
    8 com.apple.WebKit.Networking/8614.2.9.0.10 CFNetwork/1399 Darwin/22.1.0
    8 Mozilla/5.0 (compatible; Twingly Recon; twingly.com)
    8 Safari/19616.1.26.11.3 CFNetwork/1474 Darwin/23.0.0
   10 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36 Edg/115.0.1901.203
   12 com.apple.WebKit.Networking/8616.1.26.10.2 CFNetwork/1474 Darwin/23.0.0
   12 Mozilla/5.0 (iPhone; CPU iPhone OS 16_6 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.6 Mobile/15E148 DuckDuckGo/7 Safari/605.1.15
   12 Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko
   13 Mozilla/5.0 (compatible; Qwantify-prod/1.0; +https://help.qwant.com/bot/)
   14 Mozilla/5.0 (Windows; U; Windows NT 6.1; ru; rv:1.9.2b5) Gecko/20091204 Firefox/3.6b5
   16 MobileSafari/8614.2.9.0.10 CFNetwork/1399 Darwin/22.1.0
   16 MobileSafari/8616.1.26.10.2 CFNetwork/1474 Darwin/23.0.0
   18 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36
   21 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36
   22 DuckDuckGo/5 (com.duckduckgo.mobile.android; Android API 33)
   29 MobileSafari/8615.2.9.10.4 CFNetwork/1408.0.4 Darwin/22.5.0
   60 Safari/18615.2.9.11.10 CFNetwork/1408.0.4 Darwin/22.5.0
   62 Safari/18615.3.12.11.2 CFNetwork/1410.0.3 Darwin/22.6.0
   63 Go-http-client/2.0
   96 com.apple.WebKit.Networking/8615.3.12.10.2 CFNetwork/1410.0.3 Darwin/22.6.0
  112 MobileSafari/8615.3.12.10.2 CFNetwork/1410.0.3 Darwin/22.6.0
  158 com.apple.WebKit.Networking/8615.2.9.10.6 CFNetwork/1408.0.4 Darwin/22.5.0
10972 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36 AppEngine-Google; (+http://code.google.com/appengine; appid: s~feedly-nikon3)
```

rustrust commented 9 months ago

Does h2spec not test for this...?

joelwurtz commented 1 month ago

FYI:

I have made 2 pull requests in order to make this work in actix-http

With both of these changes it works fine (so no change is needed in the actix-http crate)