DaveGamble / cJSON

Ultralightweight JSON parser in ANSI C
MIT License
10.8k stars 3.21k forks source link

Stack overflow for circular reference #880

Open tregua87 opened 3 months ago

tregua87 commented 3 months ago

I noticed that cJSON does not correctly handle objects with circular references (commit 3249730). For instance, I can have 3 objects that points each other, e.g., A->B->C->A, the function cJSON_Duplicate enters in a infinite recursions. Here is a simple example:

#include <cjson/cJSON.h>

#include <stdlib.h>
#include <stdint.h>

int main(int argc, char** argv) {

        cJSON *o = cJSON_CreateArray();
        cJSON *a = cJSON_CreateArray();
        cJSON *b = cJSON_CreateArray();

        cJSON_AddItemToArray(o, a);
        cJSON_AddItemToArray(a, b);
        cJSON_AddItemToArray(b, o);

        cJSON *x = cJSON_Duplicate(o, 1);

        cJSON_Delete(o);
        cJSON_Delete(a);
        cJSON_Delete(b);
        cJSON_Delete(x);

        return 0;
}

The problem seems that cJSON_Duplicate has no way to know if the child has been already processed, line 2773 in my version:

/* Walk the ->next chain for the child. */
    child = item->child;
    while (child != NULL)
    {
        newchild = cJSON_Duplicate(child, true); /* Duplicate (with recurse) each item in the ->next chain *./
        if (!newchild)
        {
            goto fail;
        }

I would propose a fix but I am not sure how to operate. I see two possible solutions:

Can you hint me if you were already aware of this problem, and if you plan to fix it?

PeterAlfredLee commented 2 months ago

Thanks for your report. As I said in #882, cjson is designed to believe the input arguments are correct. It leave the validation of input arguments to the caller, which makes cjson has a relative good performance.

Can you hint me if you were already aware of this problem, and if you plan to fix it?

Yes, we are aware of circular references. We can handle this by iterating all items and check if there are circular references, but this is no doubt a time consuming operation. I do not like this.

Feel free to tell me if you have better ideas.

tregua87 commented 2 months ago

I suggest implementing a set with a tree-based structure, something like this (http://web.cs.wpi.edu/~cs2102/common/kathi-notes/set-impl-w-trees.html). With this implementation, you need a log(N) time to determine if a node has been already processed.

You prepare a set of visited nodes. Then, while walking through the duplication, you abort if a node is already traversed. You can easily model the JSON objects via their virtual address, which fit into a binary tree.

You do not need the tree to stay balanced IMO. In that case, we can use a black-red tree.

I see the problem of adding such a big structure tho. It is not even C++ where you can leverage std::set structures.