Open itrofimow opened 1 year ago
GenericDocument
's destructor is not recursive it delete
's the allocator and that's it. The default allocator MemoryPoolAllocator
's destructor is not recursive too. But GenericDocument
inherits GenericValue
whose destructor is recursive.
So I think your concern is GenericValue
's destructor but it's recursive only if Allocator::kNeedFree == true
which is not true with the default MemoryPoolAllocator
.
Any further info or tests?
Hi @aikawayataro!
Yes, sorry, you're right of course, this is all about GenericValue
destructor, i'll rename the issue accordingly,
Indeed, this is not an issue with the MemoryPoolAllocator
, however i have a decently large codebase with CrtAllocator
used all over the place (to allow mixing values created by different instances of the CrtAllocator
, to allow Document
-> Value
conversion https://github.com/Tencent/rapidjson/issues/387, etc. etc.), and i don't think it would be feasible for me to change that.
This is easily reproducible with the following code:
#include <string_view>
#include <rapidjson/document.h>
namespace impl {
::rapidjson::CrtAllocator g_allocator;
using UTF8 = ::rapidjson::UTF8<char>;
using Document = ::rapidjson::GenericDocument<UTF8, ::rapidjson::CrtAllocator,
::rapidjson::CrtAllocator>;
} // namespace impl
void Parse(std::string_view data) {
impl::Document json{&impl::g_allocator};
rapidjson::ParseResult ok =
json.Parse<rapidjson::kParseDefaultFlags |
rapidjson::kParseIterativeFlag |
rapidjson::kParseFullPrecisionFlag>(data.data(), data.size());
if (!ok) {
exit(1);
}
}
Now if i feed into Parse
data in the following format
{
"o": {
"o": {
"o": {
<nesting goes on>
}
}
},
"nonsense"
}
, which could be generated by the following code, for example:
#include <iostream>
#include <string>
std::string GenerateNestedJson(std::size_t depth) {
std::string result{};
result.reserve(depth * 8);
for (std::size_t i = 0; i < depth; ++i) {
result.append("{\"o\":");
}
result.append("{}");
for (std::size_t i = 0; i < depth; ++i) {
result.push_back('}');
if (i + 2 == depth) {
result.append(",\"nonsense\"");
}
}
return result;
}
int main() {
std::size_t depth; std::cin >> depth;
std::cout << GenerateNestedJson(depth) << std::endl;
return 0;
}
, the Parse
function is guaranteed to segfault with the depth big enough:
it takes depth of around 135000
(with the data being around 800Kb) with the default stack size of 8Mb,
and with the stack size of 256Kb (set by ulimit -s 256
, for example) it takes depth of ~4000
with the data being just 25Kb.
Okay, I understand your point.
The problem is that in order to implement a new flag for a value, the internal API have to be changed, including the GenericValue
constructor, which would add complexity for GenericDocument
.
Unfortunately RapidJSON doesn't have a template API to use user-provided Value
class which makes it difficult to add this kind of functionality in usercode.
If you have an idea how to implement this feature clean please share your thoughts.
I see the solution to your problem is to use GenericReader
to read your JSON, build the Document
manually using Handler
, and in case of an error perform your cleanup procedure.
void IterativeClean(Value &value);
int main() {
std::string_view json = "{}";
struct MyHandler : public BaseReaderHandler<UTF8<>, MyHandler> {
MyHandler(Document &doc) : doc_(doc) {
stack.push(&doc_);
}
bool Default() {
return true;
}
// Implement callbacks from BaseReaderHandler here.
std::stack<Value *> stack;
Document &doc_;
};
MemoryStream json_stream = MemoryStream(json.data(), json.size());
Reader reader;
Document doc;
MyHandler handler(doc);
auto res = reader.Parse<kParseIterativeFlag>(json_stream, handler);
if (res.IsError()) { // Parsing failed
IterativeClean(doc);
}
// ...
IterativeClean(doc);
return 0;
}
A simpler and faster code can also be achieved by using GenericDocument::Populate
, but in case of parsing error you have to "close" all incomplete nodes, such as key-values, objects, and arrays.
Yes, the problem with GenericDocument<>::Parse()
could be solved via SAX parsers, but this was merely an example, and the much broader problem is that a GenericValue
could be destroyed recursively somewhere, anywhere.
At userver we try really hard to wrap any native rapidjson::GenericValue
into some RAII type, which destroys the value iteratively, like this, but this gets very complicated very quickly:
rapidjson::GenericValue
is passed around extreme cautious is required to not accidentally let it be destroyed by its native destructorrapidjson::
is called we have to make sure it doesn't destroy the value (or its children) in any execution path. This invariant is basically impossible to maintainWe consider this a security threat (carefully crafted data sent by an attacker could potentially lead to a DoS) for us, and are willing to provide a patch (we vendor rapidjson and we'll likely patch it anyway). The easiest and the least intrusive way i can think of is something in the lines of
~GenericValue() {
#ifdef RAPIDJSON_ITERATIVE_VALUE_DESTRUCTOR
DestroyIteratively(...); // this has to be implemented
#else
... current implementation
#endif
}
We understand that iterative destruction would be measurably slower, but we are ok with that.
What do you think?
I think you should submit a pull request then.
To avoid misunderstandings, I am not the maintainer of this repository.
You need to define your destroyer in the internal
namespace, add the ability to enable it via option()
in CMake, and introduce a test if necessary.
Hi!
There is a
kParseIterativeFlag
, which prevents stack exhaustion when parsing deeply nested json data, however destruction of the parsed value is still recursive and might blow the stack up in case of extremely deeply nested objects or small stacks.We at https://github.com/userver-framework/userver use stackfull coroutines with the stacks way less than 8Mb, and we have a workaround for destroying
GenericDocument
iteratively, which is a bit ugly, but at least it works.Here comes the problem:
Given following code
and the json data in form
the "o" object would be destroyed in the
Parse
function itself, before we are able to wrap it into non-recursive destruction routine, and with depth big enough it would blow the stack up, which effectively defies the purpose ofkParseIterativeFlag
at all.One workaround for that could be a depth limit for
Parse
, however it would still leave some room for constructing jsons nested deeper than the aforementioned limit, so what i propose is a compile-time flag to destroyGenericDocument
iteratively, to avoid potentially dangerous recursion once and for all.