AssemblyScript / assemblyscript

A TypeScript-like language for WebAssembly.
https://www.assemblyscript.org
Apache License 2.0

Support for serialization and deserialization of JSON #292

Open willemolding opened 5 years ago

willemolding commented 5 years ago

Given the strong type requirements of AS it would be ideal to be able to parse a JSON string directly into a given class. A decorator could be used to generate the serialize/deserialize functionality on the class at compile time.

@serializable
@deserializable
class T {
  x: i32
  y: string
}

let obj: T = JSON.parse<T>(jsonStr)

This would return an error code if jsonStr is invalid or not deserializable to the class.
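To make the proposal concrete, here is a rough sketch, in plain TypeScript, of the code such decorators might generate for class T. Everything in it is an assumption for illustration: the names serializeT and deserializeT, the numeric error codes, and the use of the host JSON.parse. A real AssemblyScript implementation would need its own parser. The i32 type is approximated by an integer range check on number.

```typescript
// Hypothetical sketch: roughly what a compile-time @serializable/@deserializable
// transform might emit for `class T { x: i32; y: string }`, in plain TypeScript.
// The host JSON.parse is used only to keep the sketch short.
class T {
  x: number = 0; // stands in for AssemblyScript's i32
  y: string = "";
}

// Hypothetical error codes for the "return an error code" behaviour.
const OK = 0;
const INVALID_JSON = 1;
const WRONG_SHAPE = 2;

type DecodeResult<V> =
  | { error: typeof OK; value: V }
  | { error: typeof INVALID_JSON | typeof WRONG_SHAPE };

function serializeT(obj: T): string {
  // Fields in declaration order; JSON.stringify handles string escaping.
  return `{"x":${obj.x},"y":${JSON.stringify(obj.y)}}`;
}

function deserializeT(jsonStr: string): DecodeResult<T> {
  let raw: unknown;
  try {
    raw = JSON.parse(jsonStr);
  } catch {
    return { error: INVALID_JSON }; // not valid JSON at all
  }
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    return { error: WRONG_SHAPE }; // valid JSON, but not an object
  }
  const rec = raw as Record<string, unknown>;
  const x = rec["x"];
  const y = rec["y"];
  // `(x | 0) === x` holds only for integers in the signed 32-bit range.
  if (typeof x !== "number" || (x | 0) !== x || typeof y !== "string") {
    return { error: WRONG_SHAPE };
  }
  const out = new T();
  out.x = x;
  out.y = y;
  return { error: OK, value: out };
}
```

Because the generated code knows the exact field list and types, it needs no runtime reflection. That is the main appeal of doing this at compile time.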

dcodeIO commented 5 years ago

Looks like a nice API to me, yeah. Maybe the decorators should be on the serializable fields instead, in case only some fields are meant to be serialized. Classes could even be serializable by default (all fields, if none are specifically annotated), since parse<T> is statically typed (it would be a builtin that builds the encoders/decoders once it is compiled).

Might also be possible to start prototyping something like this right away as a custom transform.

willemolding commented 5 years ago

Oh yeah, you are right about the fields. There is also the possibility that the field names differ from the keys in the JSON string you want to decode, so maybe something like:

class Address {
  @deserializable('house_number')
  houseNumber: i32
...
}

As you say it would be possible to infer which classes need encoders/decoders to be generated at compile time by which ones are used as a generic in a call to parse<>.
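The renaming idea above could be sketched as a field-to-key table that a transform derives from the decorator argument and uses in both directions. This is a hypothetical illustration in plain TypeScript. ADDRESS_KEYS, stringifyAddress and parseAddress are invented names, and the street field is an extra, unannotated field added only to show the default (key equals field name).

```typescript
// Hypothetical sketch: a field-name -> JSON-key table, as a transform might
// derive it from `@deserializable('house_number')`, used in both directions.
class Address {
  houseNumber: number = 0; // i32 in AssemblyScript
  street: string = "";     // hypothetical unannotated field: key = field name
}

const ADDRESS_KEYS = {
  houseNumber: "house_number", // taken from the decorator argument
  street: "street",            // default mapping
} as const;

function stringifyAddress(a: Address): string {
  return (
    `{${JSON.stringify(ADDRESS_KEYS.houseNumber)}:${a.houseNumber},` +
    `${JSON.stringify(ADDRESS_KEYS.street)}:${JSON.stringify(a.street)}}`
  );
}

// Returns null if a mapped key is missing or has the wrong type.
// JSON.parse throws on malformed input.
function parseAddress(json: string): Address | null {
  const raw: unknown = JSON.parse(json);
  if (typeof raw !== "object" || raw === null) return null;
  const rec = raw as Record<string, unknown>;
  const n = rec[ADDRESS_KEYS.houseNumber];
  const s = rec[ADDRESS_KEYS.street];
  if (typeof n !== "number" || typeof s !== "string") return null;
  const a = new Address();
  a.houseNumber = n;
  a.street = s;
  return a;
}
```

Keeping the mapping in one table means the serializer and deserializer cannot drift apart on key names.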

In terms of implementation, I feel it would be best to leverage as much of an existing JSON parsing library as possible. Even one written in another language that compiles to wasm would do, such as Rust's Serde or C++'s RapidJSON. Any thoughts on what an interface to this might look like?

dcodeIO commented 5 years ago

Even one written in another language

I'd prefer one written in TypeScript so it can just be part of the standard library. Porting a fast and compact one seems like an ideal option.

Edit: The libraries you mentioned ship their own implementations of dtoa, regexps, etc., which could otherwise be shared with other parts of the standard library.

willemolding commented 5 years ago

I started working on an event-based JSON parser written in AS, based on RapidJSON (https://github.com/willemolding/asm-json-parser). This is what convinced me that using a third-party library might be a better option. Writing a parser that fully complies with the JSON standard is non-trivial and, since it all compiles to wasm anyway, possibly unnecessary.

dcodeIO commented 5 years ago

possibly unnecessary

The disadvantage there is that we'd get duplicate code for similar things (parsing and serializing float literals is one example), just because standalone parsers like these naturally can't reuse what the standard library already provides. This doesn't matter much for a desktop application, but reducing code size makes sense on the web.

I agree though that it should become easier to integrate non-AS code, if one wants to.

MaxGraey commented 5 years ago

@dcodeIO It would be great if we could also resolve generics from the expected return type. Rust can do this, for example:

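fn parse<T: Default>(_arg: &str) -> T {
    // std::any::type_name is the stable equivalent of the nightly-only intrinsic
    println!("{}", std::any::type_name::<T>());
    T::default()
}
fn main() {
    let _foo: i32 = parse("abc"); // T is inferred as i32 from the annotation
    // prints "i32"
}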

This would allow us to use JSON.parse without specifying the type name in angle brackets:

let obj: Foo = JSON.parse(fooStr);
// or
let obj = JSON.parse(fooStr) as Foo;

instead of:

let obj = JSON.parse<Foo>(fooStr);

which is also possible but looks less natural for TypeScript.

dcodeIO commented 5 years ago

If JSON.parse were a builtin, it could do type inference like other builtins. If it isn't a builtin, there'd have to be another way, as you propose.

MaxGraey commented 5 years ago

Hmm, I think resolving generics that way would be useful in any case. But if that's hard for now, making JSON a builtin would also be great.

dcodeIO commented 5 years ago

The ideal case could be to do all of this with proper reflection APIs, like getting a list of field descriptors (name, type, if optional, default value etc.) and working with that. Involves GC-managed descriptor objects, of course, which might not be super efficient right now.
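The reflection-based alternative can be illustrated with descriptor objects that drive one generic decoder, instead of generated per-class code. This is a hypothetical TypeScript sketch: FieldDescriptor, matchesType and decodeWith are invented names, not an AssemblyScript API. The descriptor fields follow the list in the comment above: name, type, optional flag and default value.

```typescript
// Hypothetical sketch of descriptor-driven decoding: each field is described
// by name, type, optional flag and default value, and one generic routine
// decodes any class from those descriptors. Not an AssemblyScript API.
type FieldType = "i32" | "f64" | "string" | "bool";

interface FieldDescriptor {
  name: string;
  type: FieldType;
  optional: boolean;
  defaultValue?: number | string | boolean;
}

function matchesType(type: FieldType, v: unknown): boolean {
  switch (type) {
    case "i32":
      return typeof v === "number" && (v | 0) === v; // integer in i32 range
    case "f64":
      return typeof v === "number";
    case "string":
      return typeof v === "string";
    case "bool":
      return typeof v === "boolean";
    default:
      return false;
  }
}

// Returns field values keyed by name, or null if a required field is missing
// or a present field has the wrong type. JSON.parse throws on malformed input.
function decodeWith(
  fields: readonly FieldDescriptor[],
  json: string,
): Record<string, unknown> | null {
  const raw: unknown = JSON.parse(json);
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) return null;
  const rec = raw as Record<string, unknown>;
  const out: Record<string, unknown> = {};
  for (const f of fields) {
    if (!Object.prototype.hasOwnProperty.call(rec, f.name)) {
      if (!f.optional) return null; // required field missing
      out[f.name] = f.defaultValue; // fall back to the declared default
      continue;
    }
    const v = rec[f.name];
    if (!matchesType(f.type, v)) return null; // present but wrong type
    out[f.name] = v;
  }
  return out;
}
```

The trade-off mentioned above is visible here: the descriptors are ordinary heap objects that exist at runtime. In AssemblyScript they would be GC-managed.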

MaxGraey commented 5 years ago

Sounds promising, and it seems this would allow creating custom decorators without post-processing via --transform. But it also requires a lot of effort for now.

willemolding commented 5 years ago

From our perspective (using AS to write Holochain apps), we would prefer not to require GC and to have everything happen at compile time.

willemolding commented 5 years ago

In the interest of keeping code size small and reusing existing float parsing functions etc. what do you think about either using or porting this lib (https://github.com/zserge/jsmn) ?

It seems to do a nice job decoupling the actual parsing of values from the delimiting of values from the JSON string.
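To illustrate the decoupling described above, here is a simplified tokenizer in the spirit of jsmn. It is written from scratch in TypeScript and is not a port of jsmn's C code. The token layout (type, start, end, child count) is only loosely modelled on jsmn. Validation is deliberately minimal: escapes are skipped but not checked, and literals such as true are not verified.

```typescript
// Simplified tokenizer in the spirit of jsmn (written from scratch, not a port):
// it only records where each value starts and ends and how many children a
// container has. Decoding values is left to existing helpers like parseFloat.
type TokType = "object" | "array" | "string" | "primitive";

interface Token {
  type: TokType;
  start: number; // first char (strings: just after the opening quote)
  end: number;   // one past the last char (strings: the closing quote)
  size: number;  // objects: number of keys; arrays: number of elements
}

function tokenize(js: string): Token[] | null {
  const tokens: Token[] = [];
  const open: number[] = []; // indices of containers not yet closed
  let afterColon = false;    // true while the next token is an object value

  // Count a new token as a child of the innermost container, unless it is
  // an object value (only keys are counted for objects).
  const addChild = (): void => {
    if (open.length > 0 && !afterColon) tokens[open[open.length - 1]].size++;
    afterColon = false;
  };

  for (let i = 0; i < js.length; i++) {
    const c = js[i];
    if (c === "{" || c === "[") {
      addChild();
      open.push(tokens.length);
      tokens.push({ type: c === "{" ? "object" : "array", start: i, end: -1, size: 0 });
    } else if (c === "}" || c === "]") {
      const idx = open.pop();
      if (idx === undefined) return null; // close without matching open
      if (tokens[idx].type !== (c === "}" ? "object" : "array")) return null; // mismatched bracket
      tokens[idx].end = i + 1;
    } else if (c === '"') {
      let j = i + 1;
      while (j < js.length && js[j] !== '"') j += js[j] === "\\" ? 2 : 1; // skip escapes
      if (j >= js.length) return null; // unterminated string
      addChild();
      tokens.push({ type: "string", start: i + 1, end: j, size: 0 });
      i = j; // resume after the closing quote
    } else if (c === ":") {
      afterColon = true;
    } else if (",\t\n\r ".indexOf(c) < 0) {
      // Primitive (number, true, false, null): runs to the next delimiter.
      let j = i;
      while (j < js.length && ",]}:\t\n\r ".indexOf(js[j]) < 0) j++;
      addChild();
      tokens.push({ type: "primitive", start: i, end: j, size: 0 });
      i = j - 1;
    }
  }
  return open.length === 0 ? tokens : null; // null if a container is unclosed
}
```

Decoding a number token is then just parseFloat(src.slice(tok.start, tok.end)). That is where the standard library's existing number parsing could be reused instead of being duplicated inside the parser.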

dcodeIO commented 5 years ago

That one looks easy to port, yeah. I like its simplicity.

MaxGraey commented 5 years ago

Unfortunately, jsmn is very, very slow.

MaxGraey commented 5 years ago

[Chart: performance_json_parse, parse-speed comparison of JSON libraries]

MaxGraey commented 5 years ago

RapidJSON can be pretty compact if you skip some unnecessary modes.

willemolding commented 5 years ago

Interesting. I wonder if that matters as much for the problem we are trying to solve. I don't imagine anyone will be parsing a multi-GB JSON file into an AS class.

MaxGraey commented 5 years ago

JSON is a pretty important part of the JS/TS world. Of course, parsing a 1 GB file is rare, but it is possible when working with IndexedDB in the browser or with files in Node.js. Parsing/stringifying a large number of small JSON objects/strings is a more common task.

Second very important thing is conformance/correctness:

[Chart: conformance_overall_result, overall conformance results of JSON libraries]

As you can see, jsmn has the second-worst result.

willemolding commented 5 years ago

Ah yes. From what I can tell it only supports ASCII, while the JSON standard requires full UTF-8 support, which would seriously impact conformance.

MaxGraey commented 5 years ago

Not only UTF-8. Parsing JSON is not trivial even for numbers, and it can also lead to false-positive validation issues. You could get more info about conformance

MaxGraey commented 5 years ago

And by the way, one more thing about performance: those are the numbers for the C++ versions. After porting to wasm they could be 2-4 times slower.

EDIT: RapidJSON can parse at about 0.1 GB/s in the best case (on a 3.5 GHz processor). So in wasm it would be approximately 0.02-0.03 GB/s, and parsing a 10 MB JSON file, for example, would take ~500 ms. That's RapidJSON. jsmn takes approximately 25 seconds for the same file =)
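The estimate above is simply time = size / throughput. A quick check in TypeScript, using only the throughput figures quoted in the comment (they are estimates, not new measurements):

```typescript
// Back-of-the-envelope check of the estimate above: time = size / throughput.
// Throughputs are the figures quoted in the comment, not new measurements.
function estimateParseMs(bytes: number, bytesPerSecond: number): number {
  return (bytes / bytesPerSecond) * 1000;
}

const tenMB = 10_000_000; // 10 MB of JSON
const nativeBest = 0.1e9; // RapidJSON native, best case (~0.1 GB/s)
const wasmLow = 0.02e9;   // wasm estimate, lower bound
const wasmHigh = 0.03e9;  // wasm estimate, upper bound

console.log(estimateParseMs(tenMB, nativeBest));           // 100
console.log(Math.round(estimateParseMs(tenMB, wasmHigh))); // 333
console.log(estimateParseMs(tenMB, wasmLow));              // 500

// Throughput implied by "about 25 seconds" for jsmn on the same input.
const jsmnBytesPerSecond = tenMB / 25; // 400000 bytes/s, i.e. ~0.4 MB/s
console.log(jsmnBytesPerSecond);
```

So the ~500 ms figure corresponds to the lower wasm estimate. The higher estimate gives roughly a third of a second.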

MaxGraey commented 5 years ago

Also, if you care about binary code size: RapidJSON is only twice as big as jsmn, and in theory it could be even smaller if some RapidJSON modes were skipped. Keep in mind that jsmn is only a parser, while RapidJSON is a parser and serializer at the same time:
[Chart: code_size, binary code size of JSON libraries]

willemolding commented 5 years ago

Cool, so it looks like RapidJSON is the way to go for maximizing compatibility and speed. The remaining decision is whether to think about a port OR to build a wrapper around RapidJSON in cpp, compile to wasm and link somehow.

At this stage I'm not really sure how to go about linking a pre-compiled wasm file so it can be called by another. It seems like there is some kind of functionality for this, but perhaps it is not fully supported yet (https://github.com/WebAssembly/tool-conventions/blob/master/Linking.md)

MaxGraey commented 5 years ago

As @dcodeIO mentioned before, it is much better to write this in AssemblyScript, because many things, like parsing and stringifying numbers, are already implemented on the AS side and could be reused.

dcodeIO commented 5 years ago

Another thing to note here is that at some point it might be possible to just reuse JSON.parse and JSON.stringify from the host in some way. If that ever happens, whatever we implement now might become redundant. This of course depends on a lot of factors with one being whether or not just using host-provided Strings turns out to have a significant benefit over storing string data in memory as we do currently.

MaxGraey commented 5 years ago

Unfortunately, this is not possible for some embedded (standalone) VMs, which are used, for example, in blockchain projects. Another issue is the overhead of interop communication. If we define serialization inside AS, we could provide fast access through pre-generated field accessors. Also, JSON in JS is fully dynamic and always parses and serializes the whole object, while on our side we could skip unserializable fields. So in my opinion there is a lot of room for good performance.

MaxGraey commented 5 years ago

Started implementing JSON.parse and improving parseFloat.

willemolding commented 5 years ago

This is exciting news @MaxGraey. As you may have seen from our holochain HDK we have a pretty simple but functional implementation of stringify, although it does require decorating classes to auto-generate a toString method. Is this something you would be interested in merging in?

MaxGraey commented 5 years ago

Yeah, I saw it :) Pretty exciting work, by the way! Maybe I'll borrow some ideas from your implementation. Actually, in some parts I found a simpler and smarter implementation than RapidJSON's, and I plan to combine all of that.

Connoropolous commented 5 years ago

That's great to hear Max. Please keep us posted about your progress and if there's any way we can help

MaxGraey commented 5 years ago

Sure 👍

vgrichina commented 5 years ago

I have created encoder / decoder for JSON: https://github.com/nearprotocol/assemblyscript-json/

It doesn't support floats yet (though not hard to add) + can still be improved performance-wise and probably needs some edge case testing. Generally it implements grammar from http://json.org, so any unhandled cases would be only by mistake.

Let me know if there is anything missing.

P.S. You also will be able to generate bindings automatically, however current implementation only does BSON (see https://github.com/nearprotocol/assemblyscript/tree/master/tests/near-bindgen for example).

vgrichina commented 5 years ago

Binding generation ready as well: https://github.com/nearprotocol/assemblyscript/pull/12

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

nickredmark commented 5 years ago

@vgrichina what is missing for this to be completed?

vgrichina commented 5 years ago

@nmaro I've just created an implementation that works for our needs at Near Protocol. I think @dcodeIO can better answer the question of what is needed to build this into AssemblyScript.

SimHacker commented 5 years ago

There's a great JSON library for C# called "Json.NET".

It has a serializer, deserializer, automatic type conversion, and a "DOM" representation of JSON.

https://www.newtonsoft.com/json

Polymorphic JavaScript object and array data types are missing from AssemblyScript (on purpose, for good reason), but it would be useful to put just JSON structures back in, without dragging in the entire JavaScript runtime.

I saw some JSON parsers for AssemblyScript, which are callback based.

https://github.com/nearprotocol/assemblyscript-json
https://github.com/willemolding/asm-json-parser

The missing link (or missing Linq) is Json.NET's JObject and JArray classes, and polymorphic JValue.

https://www.newtonsoft.com/json/help/html/T_Newtonsoft_Json_Linq_JObject.htm
https://www.newtonsoft.com/json/help/html/T_Newtonsoft_Json_Linq_JArray.htm
https://www.newtonsoft.com/json/help/html/T_Newtonsoft_Json_Linq_JValue.htm

Json.NET is extremely popular and well supported in the C# world! Lots of great work has gone into optimizing and extending it. And it's free open source software!

It's definitely worth taking a look at what Json.NET is doing, because C# and TypeScript are so congruent (both designed by Anders Hejlsberg), so it's no coincidence that there's a lot of overlap.

AssemblyScript could start with something simpler, what's possible to implement today, but could eventually catch up to Json.NET in power and flexibility as it matures.

Json.NET uses C# reflection (and user-defined converters) to automatically convert back and forth between JSON and C# objects. It also supports Linq and fancy C# stuff like that.

https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/linq/

The automatic conversion, reflection, and language integrated query stuff is implemented as higher level layers on top of the JSON "DOM" objects like JObject/JArray/JValue.

AssemblyScript is in its early days and needs to evolve (as wasm evolves) before supporting all the complexities of Json.NET and C#, but I think they're traveling the same road.

Maybe trying to clone the Json.NET API and architecture would be a good approach, since there are a lot of people who use it, and that would make it easier and happier for them to use AssemblyScript.

Or at least be as compatible as possible!

MaxGraey commented 5 years ago

Polymorphic JValue (DOM) is a bad idea: it has pretty big overhead (the whole tree has to be parsed, plus the cost of boxing/unboxing) and doesn't fit JavaScript/TypeScript semantics well. It is much better to integrate serialization/deserialization into the language. SAX-style (event-based) parsing is better in this case, but it is also not friendly for TypeScript/JavaScript users. So the only way is to do something like Rust does (schema-based) and use an interface or class to describe the schema. This was already suggested in the first comment.

SimHacker commented 5 years ago

I agree that there's a big overhead. But there are times when it's worth it. So it should be optional.

Json.NET is layered so that Linq DOM objects are optional, and you can use its callback based serializer and deserializer:

https://www.newtonsoft.com/json/help/html/SerializingJSON.htm

A lot of thought and optimizations have gone into that code, so it's a good thing to look at when developing event based stuff in AssemblyScript.

https://www.newtonsoft.com/json/help/html/SerializationGuide.htm

You don't actually need Linq itself (which is a C# language feature, that I don't think TypeScript supports, but who knows about the future?) in order to use the JObject DOM classes, but that makes it easier.

The serialization/deserialization framework makes it easy to map JSON to C# data types and Unity classes using reflection, and also write more optimized or specialized user defined conversion functions. Linq (JSON DOM) objects just plug in as another data type it can convert to and from.

https://www.newtonsoft.com/json/help/html/T_Newtonsoft_Json_JsonReader.htm

Json.NET is a huge complex project, and C# is a much older feature rich language than TypeScript or AssemblyScript, so I'm not suggesting cloning it all any time soon. But they have a lot of good ideas that could be applied to AssemblyScript, since they are similar in many ways (thanks to Anders Hejlsberg's excellent taste in language design ;) ).

The JObject/JArray/JValue apis are simple enough to implement in TypeScript, I think. And they could be implemented on top of an existing event based parser.

But I don't know how well AssemblyScript supports runtime type information, reflection and annotations at this time. But I hope it gets those features that make sense in the future.
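The suggestion that JObject/JArray/JValue-style classes could sit on top of an event-based parser can be sketched as a polymorphic value type plus a builder that consumes parser events. This is a hypothetical TypeScript illustration. The event names in JsonEvents and the names TreeBuilder and getField are invented, and are neither Json.NET's API nor that of any existing AS parser. The test feeds events by hand where a real parser would call them.

```typescript
// Hypothetical sketch: a polymorphic JSON value (in the spirit of Json.NET's
// JValue/JArray/JObject) and a builder that assembles it from SAX-style events.
type JValue =
  | { kind: "null" }
  | { kind: "bool"; value: boolean }
  | { kind: "number"; value: number }
  | { kind: "string"; value: string }
  | { kind: "array"; items: JValue[] }
  | { kind: "object"; fields: Array<[string, JValue]> };

// Callbacks an event-based parser could invoke (names are illustrative).
interface JsonEvents {
  startObject(): void;
  endObject(): void;
  startArray(): void;
  endArray(): void;
  key(name: string): void;
  scalar(v: null | boolean | number | string): void;
}

class TreeBuilder implements JsonEvents {
  private stack: JValue[] = []; // open containers, innermost last
  private pendingKey = "";      // key waiting for its value inside an object
  root: JValue | null = null;

  // Attach a finished or newly opened value to its parent (or make it the root).
  private attach(v: JValue): void {
    if (this.stack.length === 0) {
      this.root = v;
      return;
    }
    const parent = this.stack[this.stack.length - 1];
    if (parent.kind === "array") parent.items.push(v);
    else if (parent.kind === "object") parent.fields.push([this.pendingKey, v]);
  }

  startObject(): void {
    const o: JValue = { kind: "object", fields: [] };
    this.attach(o);
    this.stack.push(o);
  }
  endObject(): void { this.stack.pop(); }
  startArray(): void {
    const a: JValue = { kind: "array", items: [] };
    this.attach(a);
    this.stack.push(a);
  }
  endArray(): void { this.stack.pop(); }
  key(name: string): void { this.pendingKey = name; }
  scalar(v: null | boolean | number | string): void {
    if (v === null) this.attach({ kind: "null" });
    else if (typeof v === "boolean") this.attach({ kind: "bool", value: v });
    else if (typeof v === "number") this.attach({ kind: "number", value: v });
    else this.attach({ kind: "string", value: v });
  }
}

// JObject-style lookup: the first field with the given name, if any.
function getField(obj: JValue, name: string): JValue | undefined {
  if (obj.kind !== "object") return undefined;
  for (const [k, v] of obj.fields) if (k === name) return v;
  return undefined;
}
```

Note that the overhead MaxGraey raised is visible in this design. Every value becomes a separate heap object, and the whole tree is built before the caller can inspect anything.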

SimHacker commented 5 years ago

Another data point: Unity3D's WebGL back-end produces WebAssembly modules, and it's very common for Unity apps to use Json.NET, which is available for free on the asset store.

I use it all the time, including the JSON DOM classes, serializer, deserializer, and built-in and custom two-way data type conversion, for parsing large JSON configuration files, as well as for real time JSON messaging, and it works like a charm on wasm, for producing smooth animation and responsive user interfaces!

https://github.com/SimHacker/UnityJS/blob/master/Libraries/UnityJS/Libraries/UnityJS/Scripts/Bridge/Bridge.cs
https://github.com/SimHacker/UnityJS/blob/master/Libraries/UnityJS/Libraries/UnityJS/Scripts/BridgeObject/BridgeObject.cs
https://github.com/SimHacker/UnityJS/blob/master/Libraries/UnityJS/Libraries/UnityJS/Scripts/Bridge/BridgeJsonConverter.cs
https://github.com/SimHacker/UnityJS/blob/master/Libraries/UnityJS/Libraries/UnityJS/Scripts/Bridge/BridgeExtensions.cs

There's nothing about wasm that limits it to non-polymorphic code without boxing/unboxing, reflection, and complex data structures. Json.NET is highly optimized and quite performant in wasm, and supports most of the features present in the full-blown Windows version.

https://www.newtonsoft.com/json/help/html/Performance.htm
https://www.newtonsoft.com/json/help/html/JsonNetVsDotNetSerializers.htm

Unity3D's C# compiler compiles C# to CLR / Mono IL (Intermediate Language bytecode), then il2cpp compiles the IL bytecode to C++, and then emscripten compiles the C++ to WebAssembly.

Unfortunately, Unity3D WebGL/wasm builds can take hellishly long with three layers of compilers. But fortunately AssemblyScript doesn't take such a complex, roundabout approach of using three (!!!) compilers with two different intermediate languages (IL and C++) before finally producing WebAssembly.

https://docs.unity3d.com/Manual/IL2CPP.html

Here is the standard free Json.NET library for Unity3D:

https://assetstore.unity.com/packages/tools/input-management/json-net-for-unity-11347

JSON .NET For Unity 152 user reviews (5/5 stars)

[Typical helpful review: "I got this plugin to help with saving a lot of things, in particular some voxel data. I had some problems with multidimensional arrays, and messaged the author, and he did an incredible job of helping me out, and was very friendly while he did so. The plugin is also extremely fast and adds very little to the build size."]

JSON .NET brings the power of Json and Bson serialization to Unity with support for 4.7.2 and up and is compatible with both .NET and IL2CPP backends.

Officially Supported Platforms:
We officially support all Unity platforms including WebGL, except for WebPlayer, Windows 8.0 and Windows Phone 8.0. Windows 8.1 is supported for 8.1 Universal on Unity 5 and above. If you have a special need for Windows 8.0, or Windows Phone 8.0/8.1 contact me and I can work on a special build.

Consoles are also supported! (Xbox360, Xbox One, PS3, PS4 and WiiU).

Note: GameObjects and MonoBehaviors cannot be serialized directly as well as some built in classes (such as Texture2D that doesn't have a public parameterless constructor) but they are simple to serialize using either proxy classes or creating a custom ContractResolver or JsonConverter. JSON .NET is super extensible. [...]

Highlights:
- Precompiled for faster builds
- Full source code is included
- Works with IL2CPP as Well as .NET Backend
- Supports both JSON and BSON (binary) Serialization
- Retains original JSON .NET namespaces
- Supports Stripping down to ByteCode Level on both iOS and Android
- Supports Micro Mscorlib on Android (but not iOS due to platform limitation).

JSON .NET for Unity retains the original namespaces and structure of the Newtonsoft Json.Net library with Unity supported features. This means that it will function as a drop-in replacement for the existing Json.Net dll for users who wish to target iOS and Web Player but need to use first class serialization. The current asset is based on JSON .NET 8.0.3 with additional official fixes and Unity specific functionality added.

vgrichina commented 5 years ago

@SimHacker you might be interested in https://github.com/nearprotocol/assemblyscript-json

It's SAX-style parser, but it also has polymorphic support (like JSON.parse) in the works: https://github.com/nearprotocol/assemblyscript-json/pull/8

Generally should be relatively easy to plug conversion into any representation you need using that PR as reference.

JairusSW commented 3 years ago

@dcodeIO @MaxGraey Going to implement this while pushing off of my previous work on kati. (https://www.npmjs.com/package/kati) I already ported the serialization method to AssemblyScript, and I'm planning to get the whole thing done today. If it works out fine, I'll send you a pull request! 😀

MaxGraey commented 3 years ago

For proper JSON support, AssemblyScript should first of all support inferring generics from the return type, which is still missing. Like:

interface Obj {
  a: i32
  b: string
}

const o1 = JSON.parse<Obj>(input); // this possible for now
const o2 = JSON.parse(input) as Obj;  // not possible yet
const o3: Obj = JSON.parse(input);  // not possible yet

Secondly, efficient JSON is not an easy task. See the previous discussion.

JairusSW commented 3 years ago

Yeah, efficient JSON is pretty hard. I know from trying, lol. So,

class JSONschema {
    firstName: string
    lastName: string
    age: number
}

class JSONdata {
    firstName='Jairus'
    lastName='Tanaka'
    age=14
}

const stringified = JSON.stringify(new JSONdata()) // stringify an instance, not the class itself

const parsed = JSON.parse<JSONschema>(stringified)
JairusSW commented 3 years ago

Almost got a prototype. Can't support object because there is no Object.keys(). @MaxGraey how would I get a list of class properties?

MaxGraey commented 3 years ago

Yeah. Reflection like Object.keys() isn't supported yet. So you could try to implement Object.keys() first, or use some special static compilation mechanics for codegen. In any case, you would need to dive deep into the compiler's source code.
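The "static compilation mechanics for codegen" route can be illustrated as follows. Instead of listing properties at runtime, a transform emits the field list and a serializer method into each class, and a generic stringify only needs an interface. This is a hypothetical TypeScript sketch reusing the JSONdata class from the example above. The names JsonSerializable, __fieldNames, __jsonSerialize and stringify are invented, and the sketch does not describe how any existing library or transform works.

```typescript
// Hypothetical sketch: without Object.keys(), a compile-time transform could
// emit the field list and a serializer method into each class. A generic
// stringify then only needs an interface, not runtime reflection.
interface JsonSerializable {
  __jsonSerialize(): string; // hypothetical name for the generated method
}

class JSONdata implements JsonSerializable {
  firstName: string = "Jairus";
  lastName: string = "Tanaka";
  age: number = 14;

  // --- everything below is what the transform might generate ---
  static readonly __fieldNames: readonly string[] = ["firstName", "lastName", "age"];

  __jsonSerialize(): string {
    return (
      `{"firstName":${JSON.stringify(this.firstName)},` +
      `"lastName":${JSON.stringify(this.lastName)},` +
      `"age":${this.age}}`
    );
  }
}

// Generic entry point: dispatches to the generated method.
function stringify<T extends JsonSerializable>(value: T): string {
  return value.__jsonSerialize();
}
```

With this design the property list is fixed at compile time, so no runtime descriptor objects or GC-managed metadata are needed.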

JairusSW commented 3 years ago

Okay. Lol, this is super hard. Going to take a loong time...

MaxGraey commented 3 years ago

I told you

JairusSW commented 3 years ago

@MaxGraey , finished! (Still need dynamic arrays coming in v0.2.0) https://github.com/aspkg/as-json

Follows the full JSON spec (except dynamic arrays). Benchmarks look like this:

trace: JSON-AS Deserialize (100,000 ops): 1438ms
trace: NEAR-JSON Deserialize (100,000 ops): 3068ms
trace: JSON-AS Serialize (100,000 ops): 68ms
trace: NEAR-JSON Serialize (100,000 ops): 803ms
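For readers comparing the numbers, here is the per-operation cost and the relative speedup derived from the trace lines above. The inputs are copied from that output, a single run on one machine, so the ratios are indicative only.

```typescript
// Derived figures from the trace output above (inputs copied from it):
// microseconds per operation and how many times faster JSON-AS was.
const OPS = 100_000;

function microsPerOp(totalMs: number, ops: number): number {
  return (totalMs * 1000) / ops;
}

function speedup(slowerMs: number, fasterMs: number): number {
  return slowerMs / fasterMs;
}

const deserializeMs = { jsonAs: 1438, nearJson: 3068 };
const serializeMs = { jsonAs: 68, nearJson: 803 };

console.log(microsPerOp(deserializeMs.jsonAs, OPS).toFixed(2)); // 14.38 (µs per deserialize)
console.log(microsPerOp(serializeMs.jsonAs, OPS).toFixed(2));   // 0.68 (µs per serialize)
console.log(speedup(deserializeMs.nearJson, deserializeMs.jsonAs).toFixed(2)); // 2.13
console.log(speedup(serializeMs.nearJson, serializeMs.jsonAs).toFixed(2));     // 11.81
```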
ppedziwiatr commented 2 years ago

Hey @JairusSW - where can I currently find the as-json library?