juceda closed this issue 1 year ago.
Thanks, I'll look into this. It should either fail with an error or complete successfully, not consume resources like that. https://jsonlink.godbolt.org/z/vfPrbhz9r
I have found the issue in the Unicode iterator I use: when it hits these invalid code points it increments 0 bytes into the raw stream, and that is what is causing the resource exhaustion.
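To illustrate the failure mode (a sketch of the mechanism only, not the library's actual iterator): a serialization loop that writes something on every pass but advances the read position by a reported sequence length of 0 bytes for an invalid lead byte like 0xFE never moves forward, so the output grows until the process is killed.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Sketch only: a decoder that reports a 0-byte advance for an invalid lead
// byte such as 0xFE (0xFE/0xFF can never start a valid UTF-8 sequence).
static std::size_t utf8_sequence_length(unsigned char lead) {
    if (lead < 0x80)           return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 0; // invalid lead byte
}

std::string serialize_string(std::string_view in) {
    std::string out;
    std::size_t pos = 0;
    while (pos < in.size()) {
        unsigned char const lead = static_cast<unsigned char>(in[pos]);
        std::size_t const len = utf8_sequence_length(lead);
        if (len == 0) {
            // BUG: output is written on every iteration, but pos is advanced
            // by the reported length of 0 bytes, so the same byte is processed
            // forever and the output grows until OOM.
            out += "\\uFFFD";
            pos += len; // += 0
            continue;
        }
        out.append(in.data() + pos, len);
        pos += len;
    }
    return out;
}
```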
I would like to offer the option of encoding it as a \u.... escape, but also to call the error handler, as a config option, since normally this is unwanted.
I figured things were going that way. But I am not sure that encoding bytes like 0xFE as \u00FE as-is cannot raise another problem later.
Eg: "\xFE0123" (FE byte, ascii '0', ascii '1', ascii '2', ascii '3') Encoded as \u00FE0123, someone could take back for parsing as 32 bit long 00FE0123... since {} delimiters are not available on all languages. Crazy thing.
I'm refreshing my knowledge of Unicode character sequences and representations...
FYI, I just tested json_encode in PHP and it failed to encode "\x0D\x0ATEST\xFETEST\x0D\x0A", returning the following error: Malformed UTF-8 characters, possibly incorrectly encoded.
So if some bytes are not expected, or might produce undefined behavior, it seems fine to raise an error.
For my particular case, I have it patched as I mentioned for now, but I may have to tell clients that if they keep sending non-printable bytes, we will have to use a byte/number array (or a Base64 string) instead of a string to avoid problems.
Thank you Darrell.
I am going to support it; there are already options like this for compatibility. The default will definitely be an error, though.
I have a fix for this in the works: https://github.com/beached/daw_json_link/pull/391 The default is going to be an error, but there is also an option to escape such bytes when encountered, which gives the expected output.
https://github.com/beached/daw_json_link/releases/tag/v3.19.0 has the fixes. Again, thanks for reporting.
Recently I have had a serious problem using the daw_json_link library. I handle strings that can contain arbitrary bytes, not just 7-bit or printable characters.
I have observed that, depending on the sequence of bytes inside the string, the to_json function blocks: the program begins to consume a lot of memory until the Linux kernel finally kills it due to lack of resources (OOM).
I've made a quick example to reproduce it:
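A minimal reproducer along these lines (a sketch, assuming the standard json_data_contract mapping with a single string member "in" and the test string from above):

```cpp
#include <daw/json/daw_json_link.h>
#include <string>

struct Request {
    std::string in;
};

namespace daw::json {
    template<>
    struct json_data_contract<Request> {
        static constexpr char const in_name[] = "in";
        using type = json_member_list<json_string<in_name>>;

        static auto to_json_data(Request const &value) {
            return std::forward_as_tuple(value.in);
        }
    };
} // namespace daw::json

int main() {
    // 0x0D 0x0A "TEST" 0xFE "TEST" 0x0D 0x0A
    Request const request{"\x0D\x0A"
                          "TEST\xFE"
                          "TEST\x0D\x0A"};
    std::string const request_json_payload = daw::json::to_json(request);
    (void)request_json_payload;
    // Expected: {"in":"\r\nTEST\u00FETEST\r\n"}
    // Observed (before the fix): to_json never returns and memory grows until OOM.
}
```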
I expected request_json_payload to be {"in":"\r\nTEST\u00FETEST\r\n"}, but instead of that result the to_json function "hangs", consuming memory quickly, and the program is usually killed by the kernel as I said before.
In this case the problem is specifically the 0xFE byte. In fact, if we change test_string to simply "\xFE", the same thing happens to the serializer.
I've tried changing the member to json_string_raw, but that is not an option since the other side needs the bytes to be escaped. While investigating I saw that there are some (hidden?) options such as EightBitMode and Encode, but I don't know whether playing with these would let me bypass this issue.
For now, I've made a quick-and-dirty workaround: a simple !isprint() -> \uXXXX encoding, passing the result to daw as a raw string. But I'm interested in knowing what the problem is and whether it could be a bug; I hope this report is useful for you.
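Roughly, the workaround looks like this (a sketch of the approach, not my exact code): escape every non-printable byte as \uXXXX and hand the already-escaped result to the json_string_raw member mentioned above.

```cpp
#include <cctype>
#include <cstdio>
#include <string>
#include <string_view>

// Escape every non-printable byte as \uXXXX; printable bytes pass through.
std::string escape_non_printable(std::string_view in) {
    std::string out;
    out.reserve(in.size());
    for (unsigned char c : in) {
        if (std::isprint(c)) {
            out += static_cast<char>(c);
        } else {
            char buf[7]; // "\u00FE" plus terminating null
            std::snprintf(buf, sizeof(buf), "\\u%04X", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}
// "\x0D\x0A" "TEST\xFE" "TEST\x0D\x0A" becomes "\u000D\u000ATEST\u00FETEST\u000D\u000A"
```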
Thanks in advance!