aio-libs / aiohttp

Asynchronous HTTP client/server framework for asyncio and Python
https://docs.aiohttp.org

[security] server should not trust client-provided content-length #7208

Closed muayyad-alsadi closed 1 year ago

muayyad-alsadi commented 1 year ago

Describe the bug

A server should not trust a client-provided Content-Length.

In typical pool-based servers, a DoS attack makes all threads/processes in the pool busy, preventing the server from accepting more connections. In a non-blocking asyncio server like yours, the service can still accept connections, but the attack will drive the server out of memory: each request holds its ~100 MB of content, and that memory is never freed because the server keeps waiting for bytes that will never come.

Even if it uses a smaller buffer size, the DoS attack will consume all open file descriptors, etc.

To Reproduce

server.py

# server.py

from aiohttp import web
routes = web.RouteTableDef()

@routes.put('/upload')
async def put_uploads(request: web.BaseRequest):
    tmp_filename = "/tmp/delme.tmp"
    if request.content_type == 'multipart/form-data':
        # stream each uploaded part to disk
        multipart = await request.multipart()
        async for part in multipart:
            if not part.filename:
                continue
            with open(tmp_filename, 'wb+') as f:
                async for chunk in part:
                    f.write(chunk)
    else:
        # stream the raw body to disk
        with open(tmp_filename, 'wb+') as f:
            async for chunk in request.content.iter_any():
                f.write(chunk)
    # aiohttp handlers must return a response instance
    return web.Response(text="ok")

app = web.Application()
app.add_routes(routes)

def main():
    web.run_app(app, port=3000)

if __name__ == '__main__':
    main()

create a large file

Any large file will work; 100 MB is enough:

$ dd if=/dev/zero of=static/zero.bin bs=1MB count=100
$ ls -lh static/zero.bin
-rw-r--r--. 1 alsadi alsadi 96M Feb  1 21:05 static/zero.bin

client

Claim a 120 MB Content-Length, causing the server to keep ~100 MB in RAM and block forever:

$ curl -X PUT -H 'content-length: 120000000' -F file=@./zero.bin 'localhost:3000/upload'  &

Similar cases that reproduce the bug but do not cause a DoS:

$ echo "123" > file.txt
$ curl -X PUT -H 'content-length: 10000' -F file=@./file.txt 'localhost:3000/upload' 
$ curl -X PUT -H 'content-length: 8' --data-binary @./file.txt 'localhost:3000/upload'  &

Expected behavior

Logs/tracebacks

A shorter Content-Length gives:

Error handling request
Traceback (most recent call last):
  File "/usr/lib64/python3.10/site-packages/aiohttp/web_protocol.py", line 332, in data_received
    messages, upgraded, tail = self._request_parser.feed_data(data)
  File "aiohttp/_http_parser.pyx", line 551, in aiohttp._http_parser.HttpParser.feed_data
aiohttp.http_exceptions.BadStatusLine: 400, message="Bad status line 'Invalid method encountered'"

Python Version

$ python --version
Python 3.10.9

aiohttp Version

$ python -m pip show aiohttp
Name: aiohttp
Version: 3.8.3
Summary: Async http client/server framework (asyncio)
Home-page: https://github.com/aio-libs/aiohttp
Author: 
Author-email: 
License: Apache 2
Location: /usr/lib64/python3.10/site-packages
Requires: aiosignal, async-timeout, attrs, charset-normalizer, frozenlist, multidict, yarl
Required-by: aiobotocore, aiohttp-asgi, aiohttp-cors, aiohttp-socks, geoip2

multidict Version

$ python -m pip show multidict
Name: multidict
Version: 5.1.0
Summary: multidict implementation
Home-page: https://github.com/aio-libs/multidict
Author: Andrew Svetlov
Author-email: andrew.svetlov@gmail.com
License: Apache 2
Location: /usr/lib64/python3.10/site-packages
Requires: 
Required-by: aiohttp, yarl

yarl Version

$ python -m pip show yarl
Name: yarl
Version: 1.7.2
Summary: Yet another URL library
Home-page: https://github.com/aio-libs/yarl/
Author: Andrew Svetlov
Author-email: andrew.svetlov@gmail.com
License: Apache 2
Location: /usr/lib64/python3.10/site-packages
Requires: idna, multidict
Required-by: aiohttp

OS

Fedora 36

$ uname -r
6.1.7-100.fc36.x86_64

Related component

Server

Additional context

No response

Dreamsorcerer commented 1 year ago

I think you should be using iter_chunked(2**20) or similar. That's the cause of your memory issue.
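
For reference, a minimal sketch of that approach (hypothetical handler; the 1 MiB chunk size is an arbitrary choice):

# Sketch: stream the body in bounded chunks so at most ~1 MiB
# is buffered per read, instead of whatever iter_any() yields.
from aiohttp import web

async def put_upload(request: web.BaseRequest) -> web.Response:
    with open("/tmp/delme.tmp", "wb") as f:
        async for chunk in request.content.iter_chunked(2**20):
            f.write(chunk)
    return web.Response(text="ok")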

As for the server detecting the wrong content-size, I'm not too familiar with the implementation details, but how would you tell the difference between a client sending the wrong size and a client that is part way through sending a large request? Once the client disconnects, we'll abort. I don't think there's much more we can do, or would be expected to do.

muayyad-alsadi commented 1 year ago

I think you should be using iter_chunked(2**20) or similar. That's the cause of your memory issue.

I've already addressed this:

Even if it uses a smaller buffer size, the DoS attack will consume all open file descriptors, etc.

I have even demonstrated an 8-byte attack:

$ echo "123" > file.txt
$ curl -X PUT -H 'content-length: 8' --data-binary @./file.txt 'localhost:3000/upload' 

In this case the socket and tmp_file stay open forever, because the server waits for 8 bytes that will never arrive: the file is only 4 bytes.

An evil client sending such requests can deny the service, because the server can't open a new socket for other legitimate clients (a socket is a file descriptor).

muayyad-alsadi commented 1 year ago

The problem here is not that the request allocates 100 MB at once; the problem is that the memory is never freed, because it is only released when the request ends, and the request only ends when the promised 120 MB of content arrives.
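
One handler-level mitigation, sketched under the assumption that dropping stalled uploads is acceptable (the 30-second per-chunk timeout and the handler name are illustrative, not an aiohttp built-in):

import asyncio
from aiohttp import web

async def put_upload(request: web.BaseRequest) -> web.Response:
    with open("/tmp/delme.tmp", "wb") as f:
        while True:
            try:
                # readany() returns b"" at EOF; wait_for() bounds how long a
                # silent client can keep this request (and its memory) alive.
                chunk = await asyncio.wait_for(request.content.readany(), timeout=30)
            except asyncio.TimeoutError:
                raise web.HTTPRequestTimeout(text="request body stalled")
            if not chunk:
                break
            f.write(chunk)
    return web.Response(text="ok")

With this, a client that claims a 120 MB Content-Length and then goes silent holds its buffers for at most ~30 seconds instead of forever.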

Dreamsorcerer commented 1 year ago

in this case the socket and tmp_file will be open forever

Until the connection is dropped.

an evil client with such requests will deny the service because the server can't open a new socket for other legit client (socket is a file)

That is an issue with all servers, as far as I'm aware, and not really related to content-size (what if a client sends the correct content-size, but then stops half way forever?). Standard DoS/DDoS prevention methods should be used, including increasing the number of open connections the OS supports, limiting the number of connections per IP, using a service like Cloudflare etc.

I don't see anything we are expected to do here.

muayyad-alsadi commented 1 year ago

in this case the socket and tmp_file will be open forever

Until the connection is dropped.

which is enough time to reach the open-file limit and deny the service

That is an issue with all servers, as far as I'm aware, and not really related to content-size (what if a client sends the correct content-size, but then stops half way forever?).

No, it's properly handled in many libraries and frameworks: the server can detect that the connection is closed, that the multipart boundary has been reached, etc.

The client should never be trusted; malicious/misbehaving clients should be properly handled, for both correctness and security.

For example: uploading a large file while entering an elevator, or while the car passes below a bridge. If the size mismatches, an error should be raised.

In PHP, it's UPLOAD_ERR_PARTIAL:

https://www.php.net/manual/en/features.file-upload.errors.php

Python's Bottle handles this well too.

In aiohttp there seem to be checks for EOF, but they do not seem to work in all cases.
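
A handler-level version of that check is easy to sketch (this is an illustration of the PHP-style partial-upload check, not something aiohttp does for you; names are hypothetical):

from aiohttp import web

async def put_upload(request: web.BaseRequest) -> web.Response:
    received = 0
    with open("/tmp/delme.tmp", "wb") as f:
        async for chunk in request.content.iter_chunked(2**20):
            received += len(chunk)
            f.write(chunk)
    declared = request.content_length  # None if no Content-Length header
    if declared is not None and received != declared:
        # roughly the condition PHP reports as UPLOAD_ERR_PARTIAL
        raise web.HTTPBadRequest(text=f"expected {declared} bytes, got {received}")
    return web.Response(text="ok")

Of course, this only helps once the stream actually ends; it does not by itself unblock a handler that is still waiting for promised bytes.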

Standard DoS/DDoS prevention

floods are one kind of DoS, not the only kind.

A DoS vulnerability is a shortcut that consumes server resources without even the need to flood the server.

For example, Express CVE-2022-24999 can be triggered by:

curl 'localhost:3000/somepage?page[__proto__]&page[__proto__]&page[length]=100000000'

no need for flood

Dreamsorcerer commented 1 year ago

no, it's properly handled in many libraries and frameworks server can detect that connection is closed, multipart boundary is reached,...etc.

As mentioned above, if the connection is closed it should be aborted. Can you retest with 3.8.4? There was a known issue when using SSL.

If it's still not aborting on 3.8.4, then figuring out why would be useful.

muayyad-alsadi commented 1 year ago

No, I'm not using SSL.

I'm facing something strange. I was playing with nmap's ncat and BSD's netcat with multipart requests, to see how the server behaves:

cat <<EOF | ncat localhost 3000
PUT /upload HTTP/1.1
Host: localhost:3000
User-Agent: curl/7.82.0
Accept: */*
content-length: 2
Content-Type: multipart/form-data; boundary=------------------------839d48307be13794

--------------------------839d48307be13794
Content-Disposition: form-data; name="file"; filename="delme.txt"
Content-Type: text/plain

1234

--------------------------839d48307be13794--
EOF

nmap's ncat got no response at all, even with a simpler request:

import time

@routes.get('/ping')
async def ping(request):
    t = int(time.time())
    return web.Response(text=f"pong@{t}")

while Python's Bottle does respond:

#! /usr/bin/python3
import time
import logging
from bottle import (
    Bottle,
    request,
    run,
    BaseRequest,
    debug,
)

logger = logging.getLogger(__name__)

application = app = Bottle()
BaseRequest.MEMFILE_MAX = 15 * 1024 * 1024  # allow 15 MB

@app.get("/ping")
def ping():
    return "pong@" + str(time.time())

def main(port=4000):
    run(app, host="localhost", port=port)

if __name__ == '__main__':
    main()

Test with both clients:

echo -en "GET /ping HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n" | ncat localhost 4000
echo -en "GET /ping HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n" | netcat localhost 4000

One client gets a response and the other doesn't, but only with the aiohttp server; Bottle's wsgiref server responds to both.

muayyad-alsadi commented 1 year ago

nmap's ncat getting no response is a mystery to be solved; aiohttp is the only server I'm aware of that did not respond to the crafted ncat request.

Meanwhile, here is what BSD's netcat gives me:

$ cat <<EOF | netcat localhost 3000
PUT /upload HTTP/1.1
Host: localhost:3000
User-Agent: curl/7.82.0
Accept: */*
content-length: 2   
Content-Type: multipart/form-data; boundary=------------------------839d48307be13794

--------------------------839d48307be13794
Content-Disposition: form-data; name="file"; filename="delme.txt"
Content-Type: text/plain

1234

--------------------------839d48307be13794--
EOF

The output was:

HTTP/1.0 400 Bad Request
Content-Type: text/plain; charset=utf-8
Content-Length: 44
Date: Sun, 12 Feb 2023 19:39:36 GMT
Server: Python/3.10 aiohttp/3.8.3

Bad status line 'Invalid method encountered'

and the traceback on the server:

Error handling request
Traceback (most recent call last):
  File "/usr/lib64/python3.10/site-packages/aiohttp/web_protocol.py", line 332, in data_received
    messages, upgraded, tail = self._request_parser.feed_data(data)
  File "aiohttp/_http_parser.pyx", line 551, in aiohttp._http_parser.HttpParser.feed_data
aiohttp.http_exceptions.BadStatusLine: 400, message="Bad status line 'Invalid method encountered'"

Good news: it raised an exception, although not the one I expected.

Changing content-length to 1000 never returns, although I think netcat sent, flushed, and closed the entire payload.

Dreamsorcerer commented 1 year ago

I believe that exception pretty much means the llhttp parser failed to parse the headers. This may be because we have strict mode on (https://github.com/aio-libs/aiohttp/issues/5269#issuecomment-1215686989), and we'd get something different if that were changed to lenient parsing.
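
One variable worth ruling out in the ncat experiments: a shell heredoc emits bare-LF line endings, while HTTP/1.1 messages use CRLF, and a strict parser may treat the two differently. A small Python client (a sketch; localhost:3000 as in the repro) can send a byte-exact request with proper CRLF endings:

import socket

# Build a request with explicit CRLF line endings (heredocs send bare LF).
body = b"1234"
request = (
    b"PUT /upload HTTP/1.1\r\n"
    b"Host: localhost:3000\r\n"
    b"Content-Length: " + str(len(body)).encode() + b"\r\n"
    b"Content-Type: application/octet-stream\r\n"
    b"Connection: close\r\n"
    b"\r\n" + body
)

with socket.create_connection(("localhost", 3000)) as sock:
    sock.sendall(request)
    print(sock.recv(4096).decode(errors="replace"))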

muayyad-alsadi commented 1 year ago

I'm not complaining about the exception; I want more of them. I want a similar exception in the other case. I believe we should prevent the server from blocking forever when I send a larger Content-Length:

$ cat <<EOF | netcat localhost 3000
PUT /upload HTTP/1.1
Host: localhost:3000
User-Agent: curl/7.82.0
Accept: */*
content-length: 1000
Content-Type: multipart/form-data; boundary=------------------------839d48307be13794

--------------------------839d48307be13794
Content-Disposition: form-data; name="file"; filename="delme.txt"
Content-Type: text/plain

1234

--------------------------839d48307be13794--
EOF

This should not block forever. I believe a malformed request should be detected when the boundary is reached, when the input is closed, when EOF is reached, etc.

For some reason it does not detect EOF.

muayyad-alsadi commented 1 year ago

BTW, I suspect that nmap's ncat client is broken for a similar reason: not detecting the close of input.

echo -en "GET /ping HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n" | ncat localhost 4000

Dreamsorcerer commented 1 year ago

I'm not complaining about the exception, I want more of it. I want a similar exception in the other case. I believe we should prevent it from blocking forever when I send a larger content-length

Yep, but that means it'll need retesting when the parser is changed; it should hopefully produce a more understandable exception then.

for some reason it does not detect EOF

Yep, that doesn't seem ideal. But, the connection is still open, right? It does abort when the connection is closed?

In terms of security, as long as it aborts when the connection is closed, it's not going to get much safer. i.e. If the connection is actually from an attacker, they just won't send EOF or anything that could be used to indicate they won't send any more data, they'll just stop mid-request and hold the connection open.

So, if I'm understanding that correctly, this is more of an optimisation for badly implemented clients, rather than a security concern with malicious clients. Still worth looking at if anybody has the time though.

muayyad-alsadi commented 1 year ago

So, if I'm understanding that correctly, this is more of an optimisation for badly implemented clients, rather than a security concern with malicious clients. Still worth looking at if anybody has the time though.

No, I still see it as a security risk. I believe I can take down any aiohttp server, even behind any WAF of your choice, and deny the service to every other client, by consuming all open files or by causing an out-of-memory condition.

What needs to change? Handle the two cases: reaching EOF before Content-Length, and reaching Content-Length before EOF.

Maybe this is reported by select()/poll() (links below), and maybe it's being ignored somewhere.

And something needs to be done so that nmap's ncat works with aiohttp as it does with all other web servers.

https://man7.org/linux/man-pages/man2/select.2.html https://man7.org/linux/man-pages/man2/poll.2.html

Dreamsorcerer commented 1 year ago

No, I still see it as a security risk. I believe I can take down any aiohttp server, even behind any WAF of your choice, and deny the service to every other client, by consuming all open files or by causing an out-of-memory condition.

What needs to change? Handle the two cases: reaching EOF before Content-Length, and reaching Content-Length before EOF.

As I keep saying, an attacker can just stop sending data without sending EOF. So, if you can DoS a service now, you'll still be able to DoS it after making these changes...

Unless I am misunderstanding something, resolving this issue doesn't change the feasibility of a deliberate DoS attack.

https://man7.org/linux/man-pages/man2/select.2.html https://man7.org/linux/man-pages/man2/poll.2.html

Interesting. I'm not familiar with the low-level details of sockets in asyncio, so I'm not sure if that's something that is being handled or not. Although, the description of exceptional events doesn't sound like anything to do with EOF (it sounds like it's only dealing with TCP errors, while EOF would be handled at a higher level).

scott-8 commented 1 year ago

We had a security scan flag aiohttp for this issue, so it would be good to have this resolved.

what need to be changed? handle the two cases, reaching EOF before content-length or reaching content-length before EOF

In doing this, you're trusting the content instead of the Content-Length, which gets us back where we started. Here's a document detailing what uvicorn does with Content-Length. I think it's a good reference. Basically, we can trust the Content-Length, but use it to validate the request body.

Looking at these scenarios:

Is this in line with how aiohttp behaves right now, or what would need to be added?

Dreamsorcerer commented 1 year ago

Sounds like it should be correct. I'm not entirely sure what we do with the Content-Length, so if someone would like to create some tests (or check if any already exist), that'd be useful.
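
A raw-socket sketch of such a test (the handler, port, and flow here are illustrative; a real test would assert on the observed behavior in each of the two mismatch cases):

import asyncio
from aiohttp import web

async def upload(request: web.Request) -> web.Response:
    body = await request.read()
    return web.Response(text=f"got {len(body)} bytes")

async def main() -> None:
    app = web.Application()
    app.router.add_put("/upload", upload)
    runner = web.AppRunner(app)
    await runner.setup()
    await web.TCPSite(runner, "localhost", 3000).start()

    # Case: declared Content-Length longer than the body actually sent.
    reader, writer = await asyncio.open_connection("localhost", 3000)
    writer.write(
        b"PUT /upload HTTP/1.1\r\nHost: localhost\r\n"
        b"Content-Length: 8\r\n\r\n1234"
    )
    await writer.drain()
    writer.write_eof()  # half-close: no more bytes will ever arrive
    try:
        print(await asyncio.wait_for(reader.read(4096), timeout=5))
    except asyncio.TimeoutError:
        print("no response within 5s (handler still waiting)")
    writer.close()
    await runner.cleanup()

asyncio.run(main())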

scott-8 commented 1 year ago

If you're curious, here's a Cloudflare article describing an attack like this.

https://www.cloudflare.com/learning/ddos/ddos-attack-tools/r-u-dead-yet-rudy/

Dreamsorcerer commented 1 year ago

Yes, but again, there's not much aiohttp can do here. The recommendation is to set reasonable timeouts, which should probably be done in your proxy server (along with WAF and other such protections).

If aiohttp doesn't throw an exception in the above 2 mentioned cases, we can make that change, but it really has no impact on a DoS attack. Beyond that, I still see nothing to do here.

neilpvirtualitics commented 1 year ago

FWIW: if you are using aiohttp because it is a dependency of s3fs, and you're getting dinged with a PRISMA-2023-0024 finding, you're probably safe. In that use case, aiohttp is only making GET requests to an S3 file system (AWS S3, MinIO, etc.). This issue only applies when an outside client is making a PUT against aiohttp.

HarryCaveMan commented 4 months ago

If you're curious, here's a Cloudflare article describing an attack like this.

https://www.cloudflare.com/learning/ddos/ddos-attack-tools/r-u-dead-yet-rudy/

Literally any webserver can be DoS'd with RUDY if it is misconfigured (i.e. TTL too high, or not rejecting requests with an unnecessarily large Content-Length header). Both of these are configurable in aiohttp.
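
For illustration, a sketch of both knobs (the 10 MB cap, 30 s keep-alive, and middleware name are illustrative choices; keepalive_timeout is the run_app keyword in recent aiohttp releases; note that client_max_size only guards buffered reads such as request.post(), so the middleware checks the declared length for streaming handlers):

from aiohttp import web

TEN_MB = 10 * 2**20

@web.middleware
async def reject_oversized(request: web.Request, handler):
    # Reject requests that *declare* an unreasonably large body up front.
    if request.content_length is not None and request.content_length > TEN_MB:
        raise web.HTTPRequestEntityTooLarge(
            max_size=TEN_MB, actual_size=request.content_length
        )
    return await handler(request)

app = web.Application(
    client_max_size=TEN_MB,  # cap for buffered reads (request.post(), .read(), .json())
    middlewares=[reject_oversized],
)
web.run_app(app, port=3000, keepalive_timeout=30)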

Heck, even if you just have a lot of legitimate traffic arriving at the same time from a bunch of users on 56k modems sending large multipart payloads, this could happen on any HTTP webserver. The fact is, Content-Length headers are not security and should not be thought of as such. Prisma needs to get it together; they are costing their customers collectively massive amounts of time and money mitigating non-issues by continuing to be the outlier on this and singling out aiohttp on an issue that affects all webservers, based on an incomplete test run by one person. Not to mention, that person STILL has not answered the very important question "did you close the connection or leave it open?", while PRISMA-2023-0024 explicitly states "as long as the connection is not dropped by the user". The fact that they included in their statement that the connection must be left open further validates that this is not unique to aiohttp.

sg1chb commented 4 months ago

@HarryCaveMan Is it possible for you to provide a link detailing the reason (the dispute) that PRISMA-2023-0024 for aiohttp was yanked from the NVD? Is it documented somewhere?

HarryCaveMan commented 4 months ago

@HarryCaveMan Is it possible for you to provide a link detailing the reason (the dispute) that PRISMA-2023-0024 for aiohttp was yanked from the NVD? Is it documented somewhere?

Apologies... It was never reported by OWASP (Dependency Track) or any other container scanner besides Twistlock (e.g. Snyk or AWS ECR), and it was never even in the NVD, because no CNA will own it either. This is probably because ANY webserver that accepts large multipart payloads and has a high/long TTL could be DoS'd by a RUDY attack, so there is no real use in singling out aiohttp based on a single test with incomplete info performed by one person... I was thinking of a different CVE that was disputed. Not sure why Prisma continues to be the singular outlier on this issue, though...

louislung commented 6 days ago

PRISMA-2023-0024 brought me here, but it seems this issue is not valid anymore?

Dreamsorcerer commented 6 days ago

It was never valid, see the rant above: https://github.com/aio-libs/aiohttp/issues/7208#issuecomment-2184068504