ahausladen / JsonDataObjects

JSON parser for Delphi 2009 and newer
MIT License

Support UTF8String/WideString in JsonObject/JsonArray #31

Status: Open (opened by zuobaoquan 8 years ago)

zuobaoquan commented 8 years ago

I have to serialize and transmit data structures like this:

type
  TFoo = class
    Html: WideString;        // e.g. HTML text delivered by Internet Explorer
    FileContent: UTF8String; // raw UTF-8 file content
  end;

For example, the producer may be Internet Explorer or the IDE, which dictates the string type/encoding.

It seems that I have to convert them to/from string on both sides. Since the content is quite large, it would be great if the object model could support these two types natively. That should also be more efficient, since I use UTF-8 encoding for transmission.
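To make the cost of this workaround concrete, here is a minimal sketch in plain Delphi 2009+, with no JSON library involved; the variable names are illustrative only. A UTF8String value is converted to a UTF-16 string for a string-based object model and converted back on the receiving side, so the whole content is allocated and copied twice.

```delphi
program Utf8RoundTrip;

{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

// Sketch of the manual workaround: a UTF8String field is converted to a
// (UTF-16) string for a string-based JSON object model and converted back
// on the receiving side. Plain Delphi, no JSON library involved.
var
  Source: string;          // helper: the sample text as a UTF-16 string
  FileContent: UTF8String; // what the producer hands over
  Stored: string;          // what a string-based object model would hold
  Restored: UTF8String;    // what the consumer needs again
begin
  Source := 'Gr'#$00FC#$00DF'e';       // "Grüße": 5 UTF-16 chars
  FileContent := UTF8String(Source);   // 7 bytes of UTF-8

  Stored := string(FileContent);       // conversion 1: UTF-8 -> new UTF-16 buffer
  Restored := UTF8String(Stored);      // conversion 2: UTF-16 -> new UTF-8 buffer

  Assert(Length(FileContent) = 7);
  Assert(Length(Stored) = 5);
  Assert(Restored = FileContent);
  Writeln('Round trip OK: ', Length(FileContent), ' bytes converted twice');
end.
```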

ahausladen commented 8 years ago

UTF8String would be a good addition.

WideString, on the other hand, is a copy-on-assign data type. The data would still be copied into the internal data structure, so it makes no difference whether it is copied into another WideString or into a UnicodeString (which then uses copy-on-write internally).
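A minimal Delphi sketch of the two assignment semantics, assuming Windows, where WideString is a COM BSTR. Comparing buffer addresses with Pointer() shows that a UnicodeString assignment only shares the buffer until the first write, while a WideString assignment produces a separate copy.

```delphi
program StringAssignSemantics;

{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

var
  U1, U2: string;      // UnicodeString: reference counted, copy-on-write
  W1, W2: WideString;  // BSTR on Windows: copied on every assignment
begin
  U1 := StringOfChar('x', 1000); // heap allocated at runtime, not a literal
  U2 := U1;                      // no copy: both variables share one buffer
  Assert(Pointer(U1) = Pointer(U2));

  U2[1] := 'y';                  // the first write makes U2 unique (copy now)
  Assert(Pointer(U1) <> Pointer(U2));

  {$IFDEF MSWINDOWS}
  W1 := U1;                      // UnicodeString -> WideString: copy
  W2 := W1;                      // WideString -> WideString: another full copy
  Assert(Pointer(W1) <> Pointer(W2));
  {$ENDIF}

  Writeln('UnicodeString shares on assign; WideString copies');
end.
```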

When I initially developed JsonDataObjects, I considered storing all strings internally as UTF8String to reduce memory usage. But that would have meant a conversion every time a property is accessed.

Maybe I should add logic so that when you parse a UTF-8 stream, all strings are stored as UTF-8, and when you access a property through a UnicodeString getter, the value is converted automatically and the UTF-16 string replaces the internally stored UTF-8 string. That way you get the best of both worlds: only strings that are actually accessed are converted to UTF-16, which makes the UTF-8 parser a little faster and saves memory (unless the text consists largely of code points that need three UTF-8 bytes, such as CJK characters, which take only two bytes in UTF-16).
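The lazy-conversion idea can be sketched as a small value holder. TLazyJsonString and its members are hypothetical names chosen for illustration, not part of JsonDataObjects; the sketch only shows the mechanism. The parsed UTF-8 value is kept as is, the first read through the UnicodeString getter converts it once and releases the UTF-8 buffer, and a UTF-8 consumer reading before that gets the original bytes without any conversion.

```delphi
program LazyUtf8Value;

{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

// Hypothetical sketch: TLazyJsonString is NOT a JsonDataObjects type.
type
  TLazyJsonString = class
  private
    FUtf8: UTF8String;   // value as parsed, kept until first UTF-16 access
    FUtf16: string;      // UTF-16 value, filled on first access
    FConverted: Boolean; // True once FUtf16 has replaced FUtf8
    function GetValue: string;
    function GetUtf8Value: UTF8String;
  public
    constructor Create(const AUtf8: UTF8String);
    property Value: string read GetValue;
    property Utf8Value: UTF8String read GetUtf8Value;
    property Converted: Boolean read FConverted;
  end;

constructor TLazyJsonString.Create(const AUtf8: UTF8String);
begin
  inherited Create;
  FUtf8 := AUtf8;
end;

function TLazyJsonString.GetValue: string;
begin
  if not FConverted then
  begin
    FUtf16 := UTF8ToString(FUtf8); // convert once...
    FUtf8 := '';                   // ...and release the UTF-8 buffer
    FConverted := True;
  end;
  Result := FUtf16;
end;

function TLazyJsonString.GetUtf8Value: UTF8String;
begin
  if FConverted then
    Result := UTF8String(FUtf16)   // already UTF-16: convert back on demand
  else
    Result := FUtf8;               // still UTF-8: no conversion needed
end;

var
  Original: string;
  V: TLazyJsonString;
begin
  Original := 'Gr'#$00FC#$00DF'e'; // "Grüße"
  V := TLazyJsonString.Create(UTF8String(Original));
  try
    Assert(V.Utf8Value = UTF8String(Original)); // UTF-8 read: no conversion
    Assert(not V.Converted);
    Assert(V.Value = Original);                 // first UTF-16 read converts
    Assert(V.Converted);
    Assert(V.Value = Original);                 // later reads reuse FUtf16
  finally
    V.Free;
  end;
  Writeln('Lazy conversion OK');
end.
```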

zuobaoquan commented 8 years ago

OK, let's forget about WideString. Although there are potential solutions, I can live with that.

UTF8String is more useful. Your idea is brilliant :-)

zuobaoquan commented 8 years ago

BTW, regarding your idea: when parsing a UTF-8 stream, will the underlying data be a UTF8String or a managed memory buffer? In the former case, will the underlying UTF8String always be freed when objects with string members are serialized/deserialized to/from a UTF-8 stream?

ahausladen commented 8 years ago

It would be stored as a UTF8String (if the platform supports it; otherwise it falls back to UnicodeString), so you get the benefit of a reference-counted, copy-on-write string.
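A sketch of how such a platform fallback could be expressed with conditional compilation. The type name TJsonStoredString is hypothetical, and the sketch assumes the NEXTGEN define identifies compilers without UTF8String (the mobile compilers before Delphi 10.1 Berlin, which restored it); a real implementation would likely need a finer version check. Either way, the stored type stays reference counted, so an assignment shares the buffer.

```delphi
program Utf8StorageType;

{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

// Hypothetical: TJsonStoredString is an illustrative name, not library API.
type
{$IFDEF NEXTGEN}
  TJsonStoredString = string;      // assumed: no UTF8String, use UnicodeString
{$ELSE}
  TJsonStoredString = UTF8String;  // reference counted, copy-on-write UTF-8
{$ENDIF}

var
  S: string;
  A, B: TJsonStoredString;
begin
  S := StringOfChar('x', 100);
  A := TJsonStoredString(S);       // built at runtime, so heap allocated
  B := A;                          // assignment shares the buffer
  Assert(Pointer(A) = Pointer(B));
  Writeln('Bytes per stored element: ', SizeOf(A[1]));
end.
```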

zuobaoquan commented 8 years ago

Just an idea: an option to specify the default string encoding might be helpful. For example, when reading/writing JSON with UTF-8 encoding (widely used), the object's properties may just be plain string, as below; in that case the underlying UTF8String instances would always be converted and then freed.

type
  TFoo = class
    Name: string; // UnicodeString: a UTF-8 value is converted on first access
  end;
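The suggested option could look roughly like the following sketch. All names (TJsonStringStorage, TParsedString, StoreParsedString) are hypothetical and not part of JsonDataObjects. With jssUnicode the parser converts once and keeps no UTF-8 copy, which suits classes like TFoo whose properties are plain string; with jssUtf8 it keeps the parsed bytes for consumers that want UTF-8.

```delphi
program StringStorageOption;

{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

// Hypothetical sketch: none of these names exist in JsonDataObjects.
type
  TJsonStringStorage = (jssUtf8, jssUnicode);

  TParsedString = record
    Utf8: UTF8String;            // filled when Storage = jssUtf8
    Utf16: string;               // filled when Storage = jssUnicode
    Storage: TJsonStringStorage;
  end;

function StoreParsedString(const Raw: UTF8String;
  Storage: TJsonStringStorage): TParsedString;
begin
  Result.Storage := Storage;
  // Both fields are set explicitly: Delphi may reuse the destination
  // variable as Result, so it must not be assumed to start out empty.
  case Storage of
    jssUtf8:
      begin
        Result.Utf8 := Raw;               // keep the parser's UTF-8 bytes
        Result.Utf16 := '';
      end;
    jssUnicode:
      begin
        Result.Utf16 := UTF8ToString(Raw); // convert eagerly, once
        Result.Utf8 := '';                 // no short-lived UTF-8 copy kept
      end;
  end;
end;

var
  Raw: UTF8String;
  P: TParsedString;
begin
  Raw := UTF8String('Name');
  P := StoreParsedString(Raw, jssUnicode);  // e.g. for TFoo.Name: string
  Assert((P.Utf16 = 'Name') and (P.Utf8 = ''));
  P := StoreParsedString(Raw, jssUtf8);     // e.g. for a UTF8String field
  Assert((P.Utf8 = Raw) and (P.Utf16 = ''));
  Writeln('Storage option OK');
end.
```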