DaveGamble / cJSON

Ultralightweight JSON parser in ANSI C
MIT License
10.8k stars 3.21k forks source link

Feedback wanted: Make parsing and printing configurable without adding new functions for every option #177

Open FSMaxB opened 7 years ago

FSMaxB commented 7 years ago

UPDATE: Current plans are described here

In https://github.com/DaveGamble/cJSON/issues/175#issuecomment-304811029 I was discussing the need for a way to configure cJSON that doesn't require changing or extending the API when a new option is added and doesn't break the ABI either.

This is just an idea and I would like to get some feedback on this.

Since cJSON is a JSON library, using JSON for configuration seems sensible, not as string but rather as tree of cJSON structs.

Traversing the tree of items every time a configuration option is accessed would be a big overhead, so I think there should be a function like cJSON_CreateConfiguration or similar that takes a tree of cJSON structs and returns a dynamically allocated struct that can then be passed to cJSONs functions.

Like the following:

cJSON *configuration_json = cJSON_CreateObject();
cJSON_AddItemToObject(configuration_json, "format", cJSON_CreateBool(0));

/* hooks are of type cJSON_Hooks */
cJSON_Configuration *configuration = cJSON_CreateConfiguration(configuration_json, hooks);

/* this prints the configuration_json without formatting */
char *printed_configuration = cJSON_PrintWithConfiguration(configuration_json, configuration); 

cJSON_Configuration would just be a typedef of void*. The internal representation can be changed from version to version without breaking compatibility.

What do you think?

ajaybhargav commented 7 years ago

I have gone though suggestion and I thought its better to take conversation from #175 to here.

I just thought of a better idea. Why not attach options to JSON node itself? a simple function e.g. cJSON_SetOpts(cJSON *, void *opts); and opts can be based on node type a union of configuration settings. we can add a new member in cJSON structure as void *opt which can be assigned to respective option structure according to type(e.g. string/number etc) or to simplify we can keep a common structure say struct cJSON_Opts

Advantage of attaching opt to node is that it can be referenced anytime in the whole source and it will be completely thread safe. Moreover this can be implemented right away in existing 1.x source and carried later to 2.x

what you say?

FSMaxB commented 7 years ago

I'm not sure if I understand correctly. You mean adding a configuration object to every single node? In that case your cJSON_SetOpts function would have to add the options to every single child in the tree recursively, otherwise it wouldn't work with the way that cJSON prints the tree recursively.

This also won't work for cJSON_Parse and other functions.

I think the actual configuration should be passed in to every function call manually and should be intentionally left as void* and it's internals considered unstable across versions.

The use case is to create a few global configuration objects at the beginning and later passing them in as needed (or creating them on the fly, but this would make performance worse.

This is still completely thread safe and allows extensions in the future because the actual configuration is done via JSON before converting it to this void* object with cJSON_CreateConfiguration.

The configuration objects can be stored in a global variable and reused over and over after they have been created once. Passing NULL would use the default settings.

FSMaxB commented 7 years ago

This is something that could be done for v1 already btw.

ajaybhargav commented 7 years ago

Let me make it more clear... with example

/* modified structure */
typedef struct cJSON
{
    /* next/prev allow you to walk array/object chains. Alternatively, use GetArraySize/GetArrayItem/GetObjectItem */
    struct cJSON *next;
    struct cJSON *prev;
    /* An array or object item will have a child pointer pointing to a chain of the items in the array/object. */
    struct cJSON *child;

    /* The type of the item, as above. */
    int type;

    /* The item's string, if type==cJSON_String  and type == cJSON_Raw */
    char *valuestring;
    /* writing to valueint is DEPRECATED, use cJSON_SetNumberValue instead */
    int valueint;
    /* The item's number, if type==cJSON_Number */
    double valuedouble;

    /* The item's name string, if this item is the child of, or is in the list of subitems of an object. */
    char *string;

    /* Options for this node (optional) */
    void *opt;
} cJSON;

/* This is just dummy for explanation */

struct cJSON_NumOpts {
    int decimal_places;
    char *format; /* optional */
};

/* set opts */
void cJSON_SetOpts(cJSON *node, void *opts)
{
    /* Check node type, maybe a switch case? */
    if (node->type & cJSON_Number) {
        struct cJSON_NumOpts *numopt = opts;

        node->opt = global_hooks.allocate(sizeof(struct cJSON_NumOpts));
        memcpy(node->opt, opts, sizeof(struct cJSON_NumOpts));
    }
}

/* Lets create some JSON */
root = cJSON_CreateObject();
cJSON_AddItemToObject(root, "name", cJSON_CreateString("My Device"));
child = cJSON_CreateObject()
cJSON_AddItemToObject(root, "location", child);
cJSON *node = cJSON_CreateNumber(11.123456);

/* I want number to be printed with 6 decimal places */
struct cJSON_NumOpts opts = {
    .decimal_places = 6,
    .format = NULL,
}
/* set option */
cJSON_SetOpts(node, &opts);
cJSON_AddItemToObject(child, "lat", node);

node = cJSON_CreateNumber(22.123456);
/* set same option */
cJSON_SetOpts(node, &opts);
cJSON_AddItemToObject(child, "long", node);

child = cJSON_CreateNumber(22.56);
/* set only 2 decimal places for temperature */
opts.decimal_places = 2;
/* set option */
cJSON_SetOpts(child, &opts);
cJSON_AddItemToObject(root, "temperature", child);

/* add another number with no opts - regular calls... */
child = cJSON_CreateNumber(22.12312);
cJSON_AddItemToObject(root, "number", child);

Now when doing print...

/* inside function print_number */

/* check if opts are set */
if (item->opts) {
    char format[10];
    /* create a format */
    sprintf(format, "%%1.%dg", item->opts->decimal_places);
    /* Then convert it to string */
    sprintf((char*)number_buffer, format, d);
} else {
    /* do regular convertion */
}

While parsing we dont have to do anything special, opt can be kept as NULL. if someone whats to set specific option. The above is just an example but can be improved a lot and optimized later on. This will not break anything and add great functionality to the library.

More configurations can be added by extending option structures for individual cJSON_Type.

Forgot to add:

/* while deleting item */
/* check if opts are there */
if (item->opts)
    global_hooks.deallocate(item->opts);
FSMaxB commented 7 years ago

Adding the option to the node itself bloats the cJSON struct by 4 or 8 bytes depending on the processor architecture, maybe even more because of padding. And that even though not many people would even use it.

Furthermore this won't solve things like formatted vs. unformatted printing, unless you attach a cJSON_ArrayOpts and cJSON_ObjectOpts to every single array and object in the tree. What I want though is a general purpose way of doing configuration.

Per-Item settings as in your example would be a bit inconvenient to do when passing in the configuration on a function call level but it's still possible with cJSON_Raw by printing the nodes that are special into a new node of cJSON_Raw while specifying the configuration object. This cJSON_Raw item can then replace the original number node for printing of the entire JSON.

Regarding #175: You can even do this already, by printing the number into valuestring with your own printf and format specifier and then changing the type to cJSON_Raw.

ajaybhargav commented 7 years ago

The potential problem I see with generalized configuration is, what if I want configuration to be done for specific node only. Can you share what could be the possible configurations that can be generalized? So it will give a clear idea what we are thinking to do and best way to implement it.

Regarding #175: You can even do this already, by printing the number into valuestring with your own printf and format specifier and then changing the type to cJSON_Raw.

I am already doing that for now and it is fine till I am not dealing with arrays, So i felt its more convenient If there is an option to configure display format.

FSMaxB commented 7 years ago

The potential problem I see with generalized configuration is, what if I want configuration to be done for specific node only. Can you share what could be the possible configurations that can be generalized? So it will give a clear idea what we are thinking to do and best way to implement it.

I can't think of any way other than converting a subset of a tree to cJSON_Raw.

A possible configuration object could look like this:

{
    "buffer": 4096,
    "format": true,
    "case_sensitive": false,
    "allow_data_after_json": false
}

allow_data_after_json would do the opposite of require_null_terminated.

Regarding #175: You can even do this already, by printing the number into valuestring with your own printf and format specifier and then changing the type to cJSON_Raw.

I am already doing that for now and it is fine till I am not dealing with arrays, So i felt its more convenient If there is an option to configure display format.

It is not a problem with arrays either.

cJSON_bool CustomNumberPrinter(cJSON *number)
{
    if (!cJSON_IsNumber(number))
    {
        return 0;
    }

    /* print your number to valuestring */
    number->valuestring = /* result here */;

    number->type = ((number->type) & (~cJSON_Number)) | cJSON_Raw;

    return 1;
}

Do the above for every number in the array and you're done.

FSMaxB commented 6 years ago

I thought of an improvement to this approach. Currently the idea is to have a configuration in the form of JSON and then convert that into a binary cJSON_Configuration using cJSON_CreateConfiguration along with hooks for the allocators etc. This would mean that from that point on, the layout of what hooks can be added would be more or less fixed or at least the way to add them may become inconsistent.

This can be solved with the builder pattern. First create a configuration from JSON, then use additional functions to add function pointers etc.

Updated example:

cJSON *configuration_json = cJSON_CreateObject();
cJSON_AddItemToObject(configuration_json, "format", cJSON_CreateBool(0));

cJSON_Configuration *configuration = cJSON_CreateConfiguration(configuration_json);
cJSON_ConfigurationAddAllocator(configuration, &malloc, &free, &realloc);
cJSON_ConfigurationAddSomeFutureThing(configuration, ...);

/* this prints the configuration_json without formatting */
char *printed_configuration = cJSON_PrintWithConfiguration(configuration_json, configuration);
FSMaxB commented 6 years ago

Or even going further: Throwing away all the JSON and only use the builder pattern for configuration. Also renaming the configuration to a context, because it also allows back-channels (like where the parser stopped, or error messages).

ajaybhargav commented 6 years ago

On my local fork I actually did the decimal places implementation. I felt it was simplest thing to do and did the job perfectly. Sometimes, It's better to keep things simple rather than overcomplicating them and make them difficult to use. Usability of library is more important than focusing just on the abstracting things. This is my opinion... :)

FSMaxB commented 6 years ago

I'm quite happy so far with the usability aspect of the new changes. The only thing that makes it complicated is that there are many APIs to do almost the same thing, because of backwards compatibility.

The way it currently stands, you would use the new APIs like this:

cJSON_Context context = NULL;
cJSON *json = NULL;
const char some_json[] = [...];

/* NULL, NULL if no special allocator is used for allocating the context */
context = cJSON_CreateContext(NULL, NULL);

/* use this to use a special allocator for parsing/printing etc */
context = cJSON_SetAllocators(context, allocators);
/* Set userdata for the allocators, e.g. for pool allocators */
context  cJSON_SetUserdata(context, userdata);

/* these are for other kinds of changes */
context = cJSON_SetFormat(context, CJSON_FORMAT_NONE);
context = cJSON_SetPrebufferSize(context, 4096);
context = cJSON_AllowDataAfterJson(context, false);
assert((context != NULL) || "Failed to create context");

json = cJSON_ParseWithContext(some_json, context); /* context can be NULL to use the default context */
assert((json != NULL) || "Failed to parse JSON");

/* if you want to know where the parser stopped */
size_t end_position = cJSON_GetParseEnd(context);

free(json);
free(context);

For v2 all the variants would then be removed, so there is only one API for one task. e.g. one cJSON_Parse, one cJSON_Print etc. All the global state will be gone but instead be contained in a context.

If you always use the same settings throughout your application, you can create the context once and reuse it. If you need a back-channel, e.g. for knowing where the parser stopped, you can still use a global context. If you want to do that concurrently on multiple threads though, you could duplicate the context per thread with cJSON_DuplicateContext.

FSMaxB commented 6 years ago

And for changing the format of numbers, a new API can be added like cJSON_SetPrintPrecision(context, 7) or something.

FSMaxB commented 6 years ago

The design goals are the following:

ajaybhargav commented 6 years ago

I may be wrong but Create context and then setting allocator does not make any sense as context itself is created by an allocator.

context = cJSON_CreateContext(NULL, NULL);
context = cJSON_SetAllocators(context, allocators);
free(context);

Instead of having multiple APIs for configuring a context it would be simpler to just have single API to create and init a context with a structure. you can additionally add simple API to configure context incase one need to update a context. Something like..

// cJSON_context *cJSON_CreateContext(struct cJSON_Configuration *);
context = cJSON_CreateContext(&configuration);
// cJSON_context *cJSON_UpdateContext(cJSON_context *, struct cJSON_Configuration *);
cJSON_UpdateContext(context, &newConfiguration);

I have seen the source and for me things have become more complicated than simple and code starts to bulge now if compared with original source.

FSMaxB commented 6 years ago

I may be wrong but Create context and then setting allocator does not make any sense as context itself is created by an allocator.

I initially took the approach of automatically adding the allocator and userdata that was used to create a context, to the context itself. This neglects that you might want to allocate the configuration separately from the allocator that is later used in normal cJSON operations.

But yes, this is probably not the default of what people want to do or expect. It might be a better idea to just attach the allocator and userdata by default, and let the user change it later, if they want to. This requires a global with the default allocators, otherwise people would have to write their own wrappers around malloc, just to get the default allocator back:

extern const cJSON_Allocators default_allocators;

Instead of having multiple APIs for configuring a context it would be simpler to just have single API to create and init a context with a structure. you can additionally add simple API to configure context incase one need to update a context.

Yes, this would be the easiest, but it has one big problem: Once I create such a configuration struct, I cannot change it anymore without breaking the ABI (and API for that matter although this depends on how you would access that struct). It needs to be an opaque pointer from the perspective of the user, so I can add new configuration options in later versions.

I have seen the source and for me things have become more complicated than simple and code starts to bulge now if compared with original source.

I agree that there is quite some new complexity and doubling of existing API. But this is a necessary evil.

Also @ajaybhargav (or any other person): If you have a better idea on how to do this a better way, I'm glad to implement it. I've put some thought into what I came up with by now and went through some iterations, but I'm definitely not perfect and there might be a better solution.

FSMaxB commented 6 years ago

I also think that this might be a good opportunity for having a release candidate instead of just releasing a new version, so I can get a bit more feedback before I fully embrace this approach.

ajaybhargav commented 6 years ago

Release candidate might be better as this would allow people to post their views and thoughts on adaptability of the new face of library. I would appreciate if others can also review the v2 changes and post their thoughts.

I personally feel if there are too many functions to call to make/parse a simple JSON I would like to use v1 instead of v2 and Since I am always work on small platforms, I feel keeping code size on the lower side is always a 👍 Maybe adding a runtime configuration for few things e.g. if system has inbuilt functions like strdup etc. one can further reduce the code size. I am not sure if others feels the same way.

FSMaxB commented 6 years ago

Maybe adding a runtime configuration for few things e.g. if system has inbuilt functions like strdup etc. one can further reduce the code size.

You mean compile time, not runtime I guess. About strdup: The reason there is custom_strdup is because it can take custom allocators, because strdup only works with malloc. But I think in terms of reducing codsize it would probably make more sense to disable cJSON_Compare, cJSON_Minify etc.

For now v2 is still in the future. But these changes are paving the to getting there. I especially hope to reduce the size of the cJSON struct, because it is quite wasteful at the moment.

But I guess that the "Ultralighweight" from the README is kind of misleading, I should change that. I would say that the target audience is everything starting from "beefy" embedded systems, embedded systems that run Linux and other projects that want a fairly small JSON library without additional dependencies.

ajaybhargav commented 6 years ago

Yes I meant compile time configuration. Just like we have for LwIP (lightweight TCP/IP stack).

The word ultralightweight is what makes this library so popular coz Linux has plenty of options but small platforms like ARM core with few KBs of RAM and ROM there are very limited options available. cJSON is always the first choice because its ultralightweight. You can see most of the community driven embedded systems like ESP8266 and small IoT platforms are using this library and I feel they will stick to v1 because it serves the purpose, works wonderfully and its Ultralightweight 😄

FSMaxB commented 6 years ago

I thought that ultralightweight would require something more lightweight.

About the use on mikrocontrollers: If the main concern is code size, I can probably make some code compile time optional.

FSMaxB commented 6 years ago

I was made aware of a completely different approach of versioning configuration options:

Something like this:

cJSON_Bool cJSON_FunctionReceivingConfiguration(cJSON_Configuration* configuration, size_t size) {
    cJSON_Configuration configuration = default_configuration;
    if (size > sizeof(cJSON_Configuration)) {
        return false; //Something went really wrong here!
    }
    if (configuration != NULL) {
        //now copy the external configuration into the default configuration
        memcpy(&default_configuration, configuration, size);
        // Old configuration versions will be shorter, so new values will keep their default values
    }

    // ...
}

This approach would: