cabo / cbor-ruby

CBOR (RFC 7049) extension for Ruby

Some inputs hang Rails console #11

Closed singpolyma-shopify closed 5 years ago

singpolyma-shopify commented 5 years ago

This is possibly a denial-of-service bug:

CBOR.unpack("\x7b\x0a\x09\x73\x65\x63\x72\x65\x74")

Paste that into any Rails console with the latest version of this gem loaded and you will get a NoMemoryError. You would hope that would not be so bad, but once it is triggered Ruby seems to be left in a weird state. For example, the Rails console immediately freezes, consumes 100% CPU, and has to be force-killed.
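
For context, here is a hand decode of that input (my own breakdown, not part of the original report): the leading byte 0x7b is a CBOR text-string header whose length follows in the next 8 bytes, so this 9-byte input declares a string of roughly 7 × 10^17 bytes (hundreds of petabytes).

# Illustration only (Ruby >= 2.4 for unpack1): decode the header by hand.
# 0x7b = major type 3 (text string), additional info 27 => 8-byte big-endian length follows.
input = "\x7b\x0a\x09\x73\x65\x63\x72\x65\x74".b
declared_length = input.byteslice(1, 8).unpack1("Q>")
puts declared_length   # ~7.2e17 bytes requested by a 9-byte input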

cabo commented 5 years ago

This is indeed trying to allocate a very big string. That should simply fail and things should go on from there. Can you say more about the environment (Ruby version, platform, etc.)?

singpolyma-shopify commented 5 years ago
$ ruby -v
ruby 2.5.1p57 (2018-03-29 revision 63029) [x86_64-linux-gnu]

gem 'rails', '~> 5.2.3'

$ cat /etc/issue
Debian GNU/Linux 9 \n \l
singpolyma-shopify commented 5 years ago

One solution could be to limit the size of an initial buffer allocation before trying to actually read the CBOR content:

diff --git a/ext/cbor/unpacker.c b/ext/cbor/unpacker.c
index c448265..9f77ce9 100644
--- a/ext/cbor/unpacker.c
+++ b/ext/cbor/unpacker.c
@@ -29,6 +29,8 @@
 #include "rmem.h"
 #include <math.h>               /* for ldexp */

+#define LARGEST_BUFFER_DEFAULT 100000000UL
+
 #if !defined(DISABLE_RMEM) && !defined(DISABLE_UNPACKER_STACK_RMEM) && \
         MSGPACK_UNPACKER_STACK_CAPACITY * MSGPACK_UNPACKER_STACK_SIZE <= MSGPACK_RMEM_PAGE_SIZE
 #define UNPACKER_STACK_RMEM
@@ -245,7 +247,7 @@ static int read_raw_body_cont(msgpack_unpacker_t* uk, int textflag)
     size_t length = uk->reading_raw_remaining;

     if(uk->reading_raw == Qnil) {
-        uk->reading_raw = rb_str_buf_new(length);
+        uk->reading_raw = rb_str_buf_new(length < LARGEST_BUFFER_DEFAULT ? length : LARGEST_BUFFER_DEFAULT);
     }

     do {
ptoomey3 commented 5 years ago

Just wanted to bump this and 👍 the approach outlined above. I'd imagine the vast majority of usage is for folks not deserializing GB-sized data structures. It seems tolerable to assume that and either pay the dynamic buffer-resizing penalty or provide some opt-in system that lets a caller specify a large-object expectation.
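
As a rough illustration of the first of those options (nothing from the gem, just plain Ruby): a capped initial reservation still lets the buffer grow as real data arrives, at the cost of occasional re-allocations along the way.

# Illustration only: small reservation, growth on demand.
buf = String.new(capacity: 4096)    # modest reservation, regardless of any declared size
10.times { buf << ("x" * 1024) }    # Ruby re-allocates as data actually arrives
puts buf.bytesize                   # => 10240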

cabo commented 5 years ago

Sorry for dropping the ball on this.

The underlying problem is that the Ruby interpreter does not simply reject the allocation, but also locks up in some way. I cannot fix that (but maybe we can do a detailed bug report.)

The question is whether we now have to defend every single memory allocation in the code. The one that is being patched here is not the only one that would require such a defense. A working memory allocation in core Ruby would be a much better defense.

cabo commented 5 years ago

There already is this bug report:

https://bugs.ruby-lang.org/issues/15779

I haven't rebuilt Ruby to test this fix.

cabo commented 5 years ago

(FYI, this triggers the same bug and is a quick way to test:

>> "a" * 10000000000000000
(34560,0x7fff96a37380) malloc: *** mach_vm_map(size=10000000000004096) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
(34560,0x7fff96a37380) malloc: *** mach_vm_map(size=10000000000004096) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Traceback (most recent call last):
NoMemoryError (failed to allocate memory)
>>

No input is possible after this.)

ptoomey3 commented 5 years ago

Beyond the actual NoMemoryError, it seems less than ideal that, by default, random input can trigger allocation of huge amounts of memory. It is difficult for a caller to sanity-check a CBOR input without parsing it with a CBOR library. This is tricky to balance, since I guess these inputs are technically "valid input". But, as of now, the library is tricky to use for parsing user-provided CBOR, which comes up in some common use cases (e.g. the emerging WebAuthn standard requires parsing user-provided CBOR). I'm not at all familiar with the msgpack code, but do you know how they approach avoiding similar issues?
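
As a rough illustration of why a caller-side sanity check is hard without a real parser (my sketch, not the gem's API): you can read the declared size off the first header and compare it to the input size, but that only covers the top-level item; anything nested inside an array or map can still declare a huge length.

# Sketch only: compare the top-level declared size against the input size.
def plausible_cbor?(data)
  ib = data.getbyte(0) or return true       # empty input: let the parser report that
  mt = ib >> 5                              # major type
  ai = ib & 0x1f                            # additional info
  return true unless (2..5).cover?(mt)      # only strings, arrays, maps carry a size here
  declared =
    case ai
    when 0..23 then ai
    when 24 then data.getbyte(1)
    when 25 then data.byteslice(1, 2).unpack1("S>")
    when 26 then data.byteslice(1, 4).unpack1("L>")
    when 27 then data.byteslice(1, 8).unpack1("Q>")
    else return true                        # indefinite length: no up-front size to check
    end
  declared.to_i <= data.bytesize            # each declared byte/element needs at least one input byte
end

plausible_cbor?("\x7b\x0a\x09\x73\x65\x63\x72\x65\x74".b)   # => false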

cabo commented 5 years ago

If you look closely, you will find that what you are looking at is the msgpack code... I haven't followed the recent commits there, though.

To properly solve the problem, we'd need a way to box a CBOR.unpack into a certain maximum amount of memory allocated. Given that the API currently requires you to supply the entire input CBOR data item, its size is probably a good proxy for that amount. To further simplify this, we could use that as a rough limit for each of the allocation calls inside the code. That sounds doable, but probably needs more than one afternoon to code and test.
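
A minimal sketch of that heuristic, expressed in Ruby for readability (the actual change would live in the C unpacker, and the names here are mine): clamp every pre-allocation to the byte size of the input, since a well-formed item can never carry more payload than its own encoding supplies.

# Illustration only, not the gem's code.
def capped_capacity(declared_length, input_bytesize)
  declared_length < input_bytesize ? declared_length : input_bytesize
end

declared = "\x0a\x09\x73\x65\x63\x72\x65\x74".b.unpack1("Q>")  # length field from the 9-byte input above
capped_capacity(declared, 9)   # => 9, instead of a multi-petabyte reservation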

ptoomey3 commented 5 years ago

If you look closely, you will find that what you are looking at is the msgpack code...

I knew this lib was roughly based on msgpack, but wasn't sure how much reuse/overlap there was. In theory, would msgpack be susceptible to a similar issue? I didn't see any similar reports for msgpack and hence assumed (potentially incorrectly :smile:) that there was something inherent in their design/implementation that avoids this kind of thing.

ptoomey3 commented 5 years ago

The closest issue I found for msgpack was one in the Python implementation, which has since been fixed. They provided a sample input that would originally result in 32 GB of memory being allocated in Python:

"\xdd\xff\xff\xff\xff"

Trying to decode it in Ruby results in a more or less instant failure:

irb(main):008:0> MessagePack.unpack("\xdd\xff\xff\xff\xff")
EOFError (end of buffer reached)

I'm not clear if a better contrived input would force Ruby to allocate more memory successfully.

cabo commented 5 years ago

I think that is the equivalent of CBOR.unpack("\x9a\xff\xff\xff\xff") (a 2**32-1 size array), which yields the same, as does CBOR.unpack("\x7a\xff\xff\xff\xff") (a 2**32-1 size text string).

msgpack has a 4 GiB limit on strings, which is not something we wanted to replicate for a format defined in 2013.

Now, CBOR.unpack("\x7b\xff\xff\xff\xff\xff\xff\xff\xff") also yields EOFError immediately, as apparently does any string with a 64-bit length that has the most-significant bit set (signed vs. unsigned error somewhere?). The allocation problem is already triggered by CBOR.unpack(("\x7b" + "\x01" * 8).b), whose declared length is of course huge. CBOR.unpack(("\x7b" + "\x00" * 2 + "\x01" * 6).b) doesn't trigger it, go figure (I'm sure I don't have a terabyte of swap space free on my laptop).
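
To make those sizes concrete (my own quick check, not from the comment above), the lengths that those two 64-bit headers declare are:

# Illustration only: read off the 8-byte big-endian length fields.
puts ("\x01" * 8).unpack1("Q>")               # => 72340172838076673   (~64 PiB)
puts ("\x00" * 2 + "\x01" * 6).unpack1("Q>")  # => 1103823438081       (~1 TiB)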

cabo commented 5 years ago

Workaround in 0.5.9.4 for now. Please check.

ptoomey3 commented 5 years ago

Workaround in 0.5.9.4 for now. Please check.

Glancing... does that still allow pre-allocation up to a terabyte?

cabo commented 5 years ago

Glancing... does that still allow pre-allocation up to a terabyte?

Yes. Is that a problem on your platform?

ptoomey3 commented 5 years ago

Yes. Is that a problem on your platform?

I've not had a chance to try it out yet. But in a world where we are processing user-controlled CBOR (like WebAuthn), it sounds like an opportunity for exhausting memory on servers. We can possibly tolerate that as long as it doesn't lead to unstable Ruby processes. I'll have a better sense of things after testing this change. Thanks!

ptoomey3 commented 5 years ago

This does seem to address the issues we were seeing. I can't pretend I totally understand how, given that it still lets 1 TB pre-allocations through :smile:. But in practice I'm not seeing the same unstable behavior when fuzzing randomly.

cabo commented 5 years ago

Good. So I'll close this issue for now; if you see any more instabilities, please reopen.

(I'm not very worried about the fact that we don't fully understand the bug that we are working around; we can only observe the bug empirically right now and therefore the fix can be empirical, too. In the long run, the bug will be fixed, as well.)

We can certainly discuss how generous the code should be with pre-allocations. Also, we should be looking for other places where there might be allocations. On the surface, there are only four places where the CBOR data item supplies a size; we fixed (text and byte) strings and arrays, and there simply is no way to pre-allocate hashes anyway. But that is not a full code audit...

cabo commented 5 years ago

Ah, it's getting interesting:

https://travis-ci.org/cabo/cbor-ruby

So Ruby 2.0 to 2.2 do not like pre-allocating even a couple of Gibibytes... That became possible only from 2.3 onwards.

(But that is just a failing test that assumed there would be an EOFError, not something that disables the application.)

cabo commented 5 years ago

I toned this down to 250 MiB in 0.5.9.5, so older Rubies don't get misleading error messages. It should be enough for the overwhelming majority of cases, anyway...