This is tricky just because 2 gigs (max signed int) is not necessarily an invalid value; and unfortunately it is not really possible to reliably know how much content there will be with streaming input. I guess for 2.11 we could add a configurable maximum byte chunk size. But I am just wondering what real-life typical maximum sizes would be: certainly content in the hundreds of megs is not that unusual.
Agreed. I think a configurable parameter is likely the only reasonable general approach.
My test data encountered this when parsing an object field name, which then called into the code I posted. Perhaps, in the specific case of field-name parsing, we could cap the length at something smallish?
BTW, sorry I got mixed up with the "stack allocation" part, this data is clearly on the heap. I edited my submission to remove the confusion.
Ah. Yes, a maximum length for names seems perfectly reasonable. And I am not against a configurable limit for values, but that value may need to start quite high (perhaps even max int).
Eventually a similar approach should probably be added for the JSON (and Smile, maybe Protobuf, Avro) decoders, with various limits. I have done something like this with XML before (the Woodstox Stax parser has a reasonable set of limits for all kinds of things: max attributes, nesting levels, attribute values, CDATA segments, etc.), but so far there hasn't been as much demand for JSON for some reason. Maybe it's just that XML had more time to mature and be exposed to DoS attacks or something. :)
Another possibility would be not pre-allocating the buffer for large sizes and instead using a ByteArrayOutputStream-like growing scheme. It's slightly less efficient if the full data does come in, but an attacker would have to actually send the data she claims in order to cause a DoS, which makes an attack more difficult.
@yawkat The problem is mostly the API: a dynamically growing scheme won't help if the whole content must end up in memory anyway.
But there is already a method,
public int readBinaryValue(OutputStream out) throws IOException
which can be used for incremental, non-repeatable reads, and it is implemented without a full read for the CBOR backend. The challenge, however, is that this is not used by databinding (I couldn't think of an obvious way to bind such data to standard JDK types), so a custom deserializer or custom handling is needed.
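For illustration, a custom deserializer along these lines could route large binary values through that method; this is only a sketch under the assumption that streaming to a temporary file is acceptable, not an existing databind feature:

import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.databind.DeserializationContext;
import com.fasterxml.jackson.databind.JsonDeserializer;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Sketch: stream the binary value to a temporary file instead of materializing
// a full byte[] in memory; would be registered via a Module or @JsonDeserialize.
public class StreamingBinaryDeserializer extends JsonDeserializer<File> {
    @Override
    public File deserialize(JsonParser p, DeserializationContext ctxt) throws IOException {
        File tmp = File.createTempFile("cbor-blob", ".bin");
        try (OutputStream out = new FileOutputStream(tmp)) {
            p.readBinaryValue(out); // incremental read; no single huge allocation
        }
        return tmp;
    }
}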
What I mean is that with this problem, an attacker can use a very small payload to trigger a huge allocation, potentially causing a denial of service with little effort. If the buffer were to grow incrementally while reading, the attacker would need to follow up the length header they sent with actual data to have any impact on memory consumption.
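For illustration, a minimal sketch of that growing-buffer idea (plain JDK types, not the actual parser code): memory use tracks the bytes actually received rather than the claimed length.

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch: grow the buffer incrementally; a forged length header alone cannot
// force a large allocation because bytes must actually arrive to be buffered.
static byte[] readUpTo(InputStream in, int claimedLen) throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    byte[] chunk = new byte[8 * 1024];
    int remaining = claimedLen;
    while (remaining > 0) {
        int n = in.read(chunk, 0, Math.min(chunk.length, remaining));
        if (n < 0) {
            throw new IOException("Unexpected end-of-input with " + remaining + " bytes missing");
        }
        buf.write(chunk, 0, n);
        remaining -= n;
    }
    return buf.toByteArray();
}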
@yawkat Ah! Yes, a very good point. It is good to at least make the attacker pay the cost.
There is also a smaller additional benefit: for legitimate incremental use, not having to allocate a single huge byte array can be beneficial for JVM memory use (I think there is a threshold above which big buffers bypass the regular allocation system and can become problematic on their own).
I think I have a good idea of what to do here: keep "small" (say, 100kB or below) handling as-is, but for larger content just use ByteArrayBuilder, similar to how "chunked" reads are handled. This would ensure that memory usage is proportional to the payload actually included.
Actual content-size limits would be the next step, and possibly databind-level limits too. But first things first.
Ah, I didn't know about ByteArrayBuilder; that looks like a sensible approach. I can build a patch.
Actually, with BAB this can be simplified a lot. BAB already limits the first chunk size to 128KiB, which seems reasonable. This patch makes use of that:
Index: cbor/src/main/java/com/fasterxml/jackson/dataformat/cbor/CBORParser.java
===================================================================
diff --git a/cbor/src/main/java/com/fasterxml/jackson/dataformat/cbor/CBORParser.java b/cbor/src/main/java/com/fasterxml/jackson/dataformat/cbor/CBORParser.java
--- a/cbor/src/main/java/com/fasterxml/jackson/dataformat/cbor/CBORParser.java (revision d8c902ef1d1aec11b60821cf35e3c48516c3222e)
+++ b/cbor/src/main/java/com/fasterxml/jackson/dataformat/cbor/CBORParser.java (date 1606990356583)
@@ -2432,6 +2432,20 @@
// either way, got it now
return _inputBuffer[_inputPtr++];
}
+
+ private void readSomeBytes(ByteArrayBuilder bb, int len) throws IOException {
+ while (len > 0) {
+ int avail = _inputEnd - _inputPtr;
+ if (_inputPtr >= _inputEnd) {
+ loadMoreGuaranteed();
+ avail = _inputEnd - _inputPtr;
+ }
+ int count = Math.min(avail, len);
+ bb.write(_inputBuffer, _inputPtr, count);
+ _inputPtr += count;
+ len -= count;
+ }
+ }
@SuppressWarnings("resource")
protected byte[] _finishBytes(int len) throws IOException
@@ -2441,22 +2455,10 @@
if (len == 0) {
return NO_BYTES;
}
- byte[] b = new byte[len];
- if (_inputPtr >= _inputEnd) {
- loadMoreGuaranteed();
- }
- int ptr = 0;
- while (true) {
- int toAdd = Math.min(len, _inputEnd - _inputPtr);
- System.arraycopy(_inputBuffer, _inputPtr, b, ptr, toAdd);
- _inputPtr += toAdd;
- ptr += toAdd;
- len -= toAdd;
- if (len <= 0) {
- return b;
- }
- loadMoreGuaranteed();
- }
+ // len may be truncated by ByteArrayBuilder
+ ByteArrayBuilder bb = new ByteArrayBuilder(len);
+ readSomeBytes(bb, len);
+ return bb.toByteArray();
}
// or, if not, chunked...
@@ -2479,17 +2481,7 @@
if (len < 0) {
throw _constructError("Illegal chunked-length indicator within chunked-length value (type "+CBORConstants.MAJOR_TYPE_BYTES+")");
}
- while (len > 0) {
- int avail = _inputEnd - _inputPtr;
- if (_inputPtr >= _inputEnd) {
- loadMoreGuaranteed();
- avail = _inputEnd - _inputPtr;
- }
- int count = Math.min(avail, len);
- bb.write(_inputBuffer, _inputPtr, count);
- _inputPtr += count;
- len -= count;
- }
+ readSomeBytes(bb, len);
}
return bb.toByteArray();
}
This seems like a reasonable solution, but not implementing a "short buffer" code path has two disadvantages:
The first doesn't sound like a big issue, and the second sounds like it could be fixed in BAB instead.
I can use this; hope to get it merged soon.
Implemented chunked reading for "big enough" cases (200k or above), to remove the possibility of excessive memory usage for tiny documents -- memory is only allocated relative to the actual content decoded. This prevents, say, a 6-byte document from trying to allocate 2 gigs for decoding.
The question of processing limits for legitimate content (that is, an actually big document) is separate from this; it is already possible to limit reads (using JsonParser.readBinaryValue() instead of getBinaryValue()), but not for databinding (mapping content to byte[]).
It should be possible to define limits at that level too, but that requires a different kind of design.
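As an illustrative workaround at the streaming level (a hypothetical helper, not a Jackson API), a caller could cap the decoded size by streaming through a counting OutputStream:

import com.fasterxml.jackson.core.JsonParser;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Sketch: read a binary value but fail fast once more than maxBytes have been decoded.
static byte[] readBinaryCapped(JsonParser p, int maxBytes) throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    OutputStream capped = new OutputStream() {
        private int written;
        @Override
        public void write(int b) throws IOException {
            if (++written > maxBytes) {
                throw new IOException("Binary value exceeds configured limit of " + maxBytes + " bytes");
            }
            buf.write(b);
        }
    };
    p.readBinaryValue(capped);
    return buf.toByteArray();
}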
Will mark this issue as closed; follow-ups may be filed as necessary.
Will be included in:
The above mentioned versions have been released. Is this issue closed?
@cyrilc-pro yes, thanks for pointing this out.
CVE-2020-28491 was assigned for this
Thank you @atoptsoglou
CBORParser.java's _finishBytes() accepts an unchecked field/string length value discovered during parsing, which is then used to allocate a buffer. A malicious payload can be fabricated to exploit this and (at least) cause a java.lang.OutOfMemoryError exception.
I am not sure how serious this is in Java. In an unmanaged runtime this would be a critical security vulnerability.
For example, the following CBOR data (discovered by a fuzzer) leads to len = 2147483647 and triggers this exception on my laptop.
d9d9f7a35a7ffffffff7d9f7f759f7f7f7
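For illustration, a reproduction against an affected version might look roughly like this (the class name and error handling are mine, not from the original report):

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.cbor.CBORFactory;

public class CborOomRepro {
    public static void main(String[] args) throws Exception {
        String hex = "d9d9f7a35a7ffffffff7d9f7f759f7f7f7";
        byte[] doc = new byte[hex.length() / 2];
        for (int i = 0; i < doc.length; i++) {
            doc[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        ObjectMapper mapper = new ObjectMapper(new CBORFactory());
        try {
            mapper.readTree(doc); // affected versions attempt a ~2 GiB allocation here
        } catch (Exception | OutOfMemoryError e) {
            System.out.println("Parsing failed: " + e);
        }
    }
}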
This can probably be addressed by simple sanity checking of the len value (non-negative, some max limit).
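A hedged sketch of that kind of check (the limit constant and method name are illustrative, not the actual fix that was later applied):

import java.io.IOException;

// Sketch: validate a declared length before allocating anything for it.
// MAX_BINARY_LENGTH is a hypothetical, ideally configurable, limit.
static final int MAX_BINARY_LENGTH = 100 * 1024 * 1024; // 100 MiB, for illustration

static void checkDeclaredLength(int len) throws IOException {
    if (len < 0 || len > MAX_BINARY_LENGTH) {
        throw new IOException("Declared binary length " + len
                + " is negative or exceeds limit of " + MAX_BINARY_LENGTH);
    }
}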