Open FSMaxB opened 7 years ago
With fairly minimal rework, one could eliminate previous too. It's barely ever read in the library, and when it is, a single temp variable could eliminate it.
Yes, now that I think about it the library only ever uses it as single linked list.
You might also want to play around with the order of the variables for better struct packing. Depending on the word/pointer size of the system, this could reduce the size of the object.
There are potentially 3 different sizes in the new struct. The size of a pointer, sizeof(int)
and sizeof(double)
. The order I used in the suggested version of the new struct already optimizes that for x86_64 with still being the best version for 32 bit x86. But I haven't considered all the other possible size combinations on other processors.
This would change when removing the previous
pointer though.
With previous
removed, the following variant has no padding on all architectures I can think of:
typedef struct cJSON
{
union {
char *string;
double number;
}
char *name;
struct cJSON *next;
struct cJSON *child;
int type;
} cJSON;
I would suggest to give a name to the union such as data or u indeed anonymous structures and unions are a C11 feature.
I consider this C11 compatible (if even) pseudo code XD. That shows you that I never put this code through a compiler. But correct syntax doesn't matter for the purpose of this discussion.
Sorry, I think my comment was incomplete and didn't give you enough background. I wanted to point out that I would like to keep cjson compatible with C99 and to avoid any C11 features such as anonymous structures. Indeed some old gcc like gcc 4.3 (which is used to compile cjson for blackfin target on buildroot) will not be able to compile cjson anymore.
Actually, cJSON is keeping compatible with C89 and avoids any C99 and C11 features (that haven't been available in C89). Don't worry, this GitHub issue is not cJSONs source code!
How about changing the type to be an enum instead so that the the code will be clearer?
That sounds like a really good idea, but I thought that it is undefined behavior to use an enum with another than the predefined values. Let me check if that is actually true.
Because type
is supposed to be used as bit flags, so you can check if something is array or object for example (e.g. if (json->type & (cJSON_Array | cJSON_Object))
.
You are very correct. I have completly missed that the type was used as bit flags. I thought that it truly was one of the cJSON_types defines. In that case. A different name perhaps inroder to clairify. "used_types" or something like that.
Renaming it to typeinfo
instead of type
might actually be a good idea, because it not only stores the type, but also cJSON_IsReference
and cJSON_IsConst
.
Another possible solution: Make it a bitfield, so cJSON_IsReference
and cJSON_IsConst
are not part of the type.
Also cJSON_False
, cJSON_True
... can be declared ad const int
instead of using a preprocessor macro.
So my current suggestion based on using a bitfield:
typedef struct cJSON
{
union {
char *string;
double number;
} data; /* the union is now named `data` */
char *name;
struct cJSON *next;
struct cJSON *child;
unsigned int type : 12; /* 12 bits allow for 5 more types to be added later */
int is_reference : 1;
int is_const : 1;
/* reserve the remaining two bits, not sure if this should be done or not */
int unused1 : 1;
int unused2 : 1;
} cJSON;
Notes:
unsigned int
for the bitfield (avoiding implementation defined behavior)string
and number
are only accessible via data.XXX
because anonymous unions and structs are not allowed in C89.Using a bitfield fixes ->type == cJSON_XXXX
if a node is a reference or contains a constant string. So it is not necessary to use & 0xFF
anymore when comparing.
Defining the types as const
instead of #define
doesn't work because it is not supported in switch case.
Well, let's change all of the bit fields from int
to unsigned int
. For this reason!
In order to be fully JSON conformant, cJSON needs to support '\0'
in strings. This requires storing the length for name
and string
. Also it should be possible to use an enum for the type.
I also think that cJSON_False
and cJSON_True
should not be separate types. There should be a boolean type instead and an added value in the data
union.
cJSON_Raw
can be removed and replaced by a flag in the bitfield. This allows specifying the type of raw JSON. But this also means that an unknown
type needs to be added. Although this increases the potential of misuse (because you cannot rely on just the type anymore to determine what union member to access).
New draft of the new cJSON
struct (also let's rename it to cJSON_item
):
enum cJSON_Type
{
CJSON_INVALID = 0,
CJSON_BOOLEAN = 1,
CJSON_NULL = 2,
CJSON_NUMBER = 4,
CJSON_STRING = 8,
CJSON_ARRAY = 16,
CJSON_OBJECT = 32,
CJSON_UNKNOWN = 64
};
typedef struct cJSON_string
{
unsigned char *data;
size_t length;
}
typedef struct cJSON_item
{
union
{
cJSON_string string;
double number;
cJSON_bool boolean;
} data; /* the union is now named `data` */
cJSON_string name;
struct cJSON_item *next;
struct cJSON_item *child;
enum cJSON_type type : 12; /* 12 bits allow adding 3 types later, not sure about the 4th because enum is signed int and it behaves weird in bit fields, maybe use unsigned int here instead of enum */
unsigned int read_only_name : 1;
unsigned int read_only_string : 1;
unsigned int read_only_struct : 1; /* the cJSON struct itself should not be deallocated, useful for cJSON_PrintPreallocated */
unsigned int raw : 1;
} cJSON_item;
Although cJSON_string
has a length property, data
should always be length + 1
bytes where the last one is '\0'
. Just as a precaution if somebody feeds this to a function that expects zero terminated strings.
Because unions are dangerous, macros and functions should exist for accessing everything, hiding the implementation details.
I thought of a better idea than using a union. Since cJSON
structs are not stored in arrays, a dynamicly sized struct can be used. There can be a base struct called cJSON
with type information and a next
pointer for the linked list. Other types would then have a cJSON
as their first member and add additional data after that.
This would have mostly two benefits:
Example:
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
typedef int cJSON_bool;
typedef enum cJSON_Type {
CJSON_INVALID = 0,
CJSON_BOOLEAN = 1,
CJSON_NULL = 2,
CJSON_NUMBER = 3,
CJSON_STRING = 4,
CJSON_ARRAY = 5,
CJSON_OBJECT = 6
} cJSON_Type;
typedef enum cJSON_NumberType {
CJSON_NUMBER_DOUBLE = 0,
CJSON_NUMBER_INT = 1,
CJSON_NUMBER_LONG = 2
} cJSON_NumberType;
typedef struct cJSON_string {
const char *string;
size_t length;
} cJSON_string;
typedef struct cJSON {
struct cJSON *next;
cJSON_string key;
cJSON_Type type : 4; /* leaves room for 9 more types */
unsigned int is_reference : 1;
unsigned int key_is_reference : 1;
unsigned int data_is_reference : 1;
unsigned int is_raw : 1;
cJSON_NumberType number_type : 2;
unsigned int reserved : 6;
} cJSON;
typedef cJSON cJSON_Null;
typedef struct cJSON_Boolean {
cJSON base;
cJSON_bool value : 8;
} cJSON_Boolean;
typedef struct cJSON_Array {
cJSON base;
cJSON *child;
} cJSON_Array;
typedef cJSON_Array cJSON_Object;
typedef struct cJSON_String {
cJSON base;
cJSON_string value;
} cJSON_String;
typedef cJSON_String cJSON_Raw;
typedef struct cJSON_Number {
cJSON base;
union {
double double_value;
int integer;
long int long_integer;
};
} cJSON_Number;
int main(void) {
printf("sizeof(cJSON) = %zu\n", sizeof(cJSON));
printf("sizeof(cJSON_Boolean) = %zu\n", sizeof(cJSON_Boolean));
printf("sizeof(cJSON_Array) = %zu\n", sizeof(cJSON_Array));
printf("sizeof(cJSON_Object) = %zu\n", sizeof(cJSON_Object));
printf("sizeof(cJSON_String) = %zu\n", sizeof(cJSON_String));
printf("sizeof(cJSON_Raw) = %zu\n", sizeof(cJSON_Raw));
printf("sizeof(cJSON_Number) = %zu\n", sizeof(cJSON_Number));
printf("sizeof(cJSON_Null) = %zu\n", sizeof(cJSON_Null));
return EXIT_SUCCESS;
}
On x86_64 the sizes are:
sizeof(cJSON) = 32
sizeof(cJSON_Boolean) = 40
sizeof(cJSON_Array) = 40
sizeof(cJSON_Object) = 40
sizeof(cJSON_String) = 48
sizeof(cJSON_Raw) = 48
sizeof(cJSON_Number) = 40
sizeof(cJSON_Null) = 32
The names are not final.
I think this change would be especially beneficial for environments where heap space is limited.
Also note that I added the possibility of having int
or long int
numbers, since there have been many requests to support scenarios where double
is not available.
What happened to this? Its been more than 2 years and I can see it is planned for 2.00 but will it ever happen? Reducing the size of cJSON would be very nice for embedded systems, like the ones I'm developing. I would like to be able to handle quite large datastructures with as low memory footprint as possible. Just making a union of the data, removing the previous pointer and making a bitfiled of the type would go a long way. I know it would break the ABI, but is that really a big problem?
indeed, It seems that there are often cases where embedded systems fail due to insufficient memory, reducing the lib size really makes sense for embeded systems. But as Dave said https://github.com/DaveGamble/cJSON/issues/16#issuecomment-228606471, it increases the complexity of the lib, while increasing the user's debugging costs.
Nevertheless, I will try my best to continue Milestone 2.0.0, it will take some time. If you have a good solution, welcome to submit a PR.
Ok, I see. I have not started working with cJSON yet, but I plan to do it within a few months. As you say these changes may not be welcomed by everyone as they involve some trade-offs. I don't know if the changes I would like to see are suitable for everyone. I will probably start of using cJSON as it is but may be required to make my own fork optimized for embedded systems in general and my case in particular.
char *string; “string” is the keyword/type of C++, recommend use str replace, or others
@oneonce No, it's not. C++ has no keyword "string" and the standard string type is in "std" namespace.
Holy cow, this issue has been waiting a while. Is cJSON even being maintained at this point? Or should I go elsewhere? (There are also a ton of really old open issues. Not very confidence inspiring, no offense....)
Holy cow, this issue has been waiting a while. Is cJSON even being maintained at this point? Or should I go elsewhere? (There are also a ton of really old open issues. Not very confidence inspiring, no offense....)
@RudyFisher7 What are you talking about, there was a release in December. This is an open source project run by volunteers, don't judge the health of it based on if there are old discussions of a feature that never got resolved. It's a bad metric.
No need to get defensive. If you actually look at the issues, more than half of them ARE bug reports actually, quite a few dating a few years back and still marked as open. (Names starting with _ are not found, inconsistent error reporting, etc.) If they WERE all simply enhancements I would of course agree with you. Also, the latest version is a 1.7.x version, just FYI. I guess I typically find, even community driven, repositories resolving bugs with new releases, but they remain open here, so raised a flag for me.
I did give up maintaining it because of burnout a long time ago, but I've seen @Alanscut do some maintenance work since then.
That's understandable.
Actually, I like having valueint
. Some embedded platforms don't do a good job of accurately representing the their largest int
with a double
. Just recently I had to patch our copy of cJSON to render the ints directly because rendering to double then assigning to int internally was giving inaccurate int values. Possibly the fault of the compiler, but the do ect int parse fixed it.
Currently theres three ways the cJSON struct can be improved:
valueint
valuestring
andvaluedouble
If you don't think this is a good idea, please tell me why.
Let me explain:
Getting rid of
valueint
valueint
andvaluedouble
introduces unnecessary complexity in cJSON.valueint
andvaluedouble
increases the chance of misuse by users of the libraryvalueint
is not bringing any extra value on platforms that have integers smaller than 32 bitdouble
can store integers up to 53 bits (> 32)double
type completely.double
, cJSON will never print an exact value of an integer bigger than the biggest exact integer that can be represented bydouble
valueint
is using between 2-8 bytes that are completely wastednext
: 8 byteprev
8 bytechild
8 bytetype
4 bytevaluestring
8 bytevalueint
4 bytevaluedouble
8 bytestring
8 bytevalueint
has only 4 bytes, it takes up 8 bytes in the structUsing a union of valuestring and valuedouble
As an instance of the cJSON struct can only be either a string type or a number type, not both at the same time. Therefore they can be stored at the same place in memory.
New version of the cJSON struct
prev
->previous
valueint
removedvaluedouble
->number
string
->name
valuestring
->string
string
andnumber
in a union