alan-knight / minimal_serialization

Transformer that generates simple serialization code for Dart struct-type objects
BSD 3-Clause "New" or "Revised" License

alternative design based on MapView #1

Open tatumizer opened 10 years ago

tatumizer commented 10 years ago


First, I'd like to comment on a couple of issues mentioned in Alan's email.

For example, none of the examples I saw here included enough information to identify the type of the object,

Type information is known from context (e.g. URI + request type + HTTP headers, etc.), so there is no need to include it. At most, a "schema version" has to be included automatically, but it's not standard.

And it might be a good idea to sort the rules somehow in case JS hashtables have a different order than in the VM

Doesn't LinkedHashMap solve this problem? (AFAIK, all evergreen browsers seem to preserve key order in objects, and they are very unlikely to break it in the future.)

Now, some thoughts from myself.

The way it's implemented in serialization package (through Rules) is not very efficient: we still have to deal with intermediate maps. Library provides large API, with many classes, special cases etc, which, IMO, is too heavyweight for the task at hand.

For the simplified serialization in question, I'd like to propose a very different and simple design, which nonetheless makes it possible to handle the variety of special cases mentioned in the mailing list discussion, with performance comparable to that of popular Java serializers.

Design is based on MapViews (explained below) and makes use of two classes: TypeMetadata and FieldMetadata.

The goal of MapView is to create the illusion for the serializer that it works with regular maps/lists, while in fact it works with objects (hidden in the MapView implementation).

First, let's define auxiliary classes.

TypeMetadata

TypeMetadata is generated for each type used in serialization, and includes the following:

The following two functions are used for special types that need to be encoded/decoded (e.g. DateTime should be encoded/decoded to/from String or other basic types; Uint8List should be decoded from List). For the purposes of discussion, assume our type is DateTime, for which we have no standard support in JSON.

Basic types here are those directly supported by JSON. They are not limited to String/int; they can be Lists or Maps too. E.g., for Uint8List, the deserializer will be led to believe our real type is List, but when placed into the object, this List gets converted to Uint8List by the decoder.

All generated type metadata gets assembled in a single place - see "type catalog" section below.
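As a rough illustration of the encoder/decoder pair described above, here is a minimal sketch in current Dart. The class shape, the field names `encode` and `decode`, and the two top-level instances are assumptions made for this example; the write-up does not specify them.

```dart
import 'dart:convert';
import 'dart:typed_data';

/// Hypothetical shape of per-type metadata. Only the encoder/decoder pair
/// from the text is modelled; the other parameters are left out.
class TypeMetadata<T> {
  final String name;
  final Object? Function(T value) encode; // T -> JSON-basic value
  final T Function(Object? raw) decode; // JSON-basic value -> T
  const TypeMetadata(this.name, this.encode, this.decode);
}

// DateTime has no JSON representation, so it travels as an ISO 8601 string.
final dateTimeMetadata = TypeMetadata<DateTime>(
  'DateTime',
  (value) => value.toUtc().toIso8601String(),
  (raw) => DateTime.parse(raw as String),
);

// Uint8List travels as a plain JSON list of ints.
final uint8ListMetadata = TypeMetadata<Uint8List>(
  'Uint8List',
  (value) => value.toList(),
  (raw) => Uint8List.fromList((raw as List).cast<int>()),
);

/// Encodes one DateTime and one Uint8List through their metadata.
String encodeExample() {
  final when = DateTime.utc(2014, 5, 1, 12);
  final bytes = Uint8List.fromList([1, 2, 3]);
  return jsonEncode({
    'when': dateTimeMetadata.encode(when),
    'bytes': uint8ListMetadata.encode(bytes),
  });
}
```

The serializer itself only ever sees Strings and Lists here; the conversion to and from the real type happens at the boundary of the object.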

FieldMetadata

FieldMetadata is generated for each field (attribute) of struct. E.g. for Person with (firstName, lastName), we will generate two FieldMetadata objects. Included in FieldMetadata are:

Together, these 4 parameters per attribute + 5 parameters from TypeMetadata provide enough information for a variety of serialization formats.

MapView and Serializers

MapView is a base class that provides [] and []= and other standard Map methods for serializable classes, based on generated metadata. E.g. for Person class, transformer will generate (simplified; details may vary):

class PersonJsonSerializer extends MapView {
  // One FieldMetadata per public field: name, type name, setter, getter.
  static final _fieldMetadata = [
    new FieldMetadata("firstName", "String", (obj, v) => obj.firstName = v, (obj) => obj.firstName),
    // etc.
  ];
  PersonJsonSerializer(Person p) : super(p, _fieldMetadata);

  String stringify() { ... calls JSON serializer }
  static Person parse(String s) { ... calls JSON deserializer }
}

The idea of MapView is obvious: if all our data were represented as Maps and Lists and primitives, we would already know how to solve the problem: the standard JSON package works very well. To address the case of structs, let's reduce the problem to the one we already know how to solve, by pretending our object is a map (but without creating a real map and copying the entire content there).
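To make the idea concrete, here is a minimal sketch in current Dart of a map-shaped view over an object. The names `StructMapView`, `FieldMetadata` (reduced to name, getter, and setter), `personFields`, `personToJson`, and `personFromJson` are assumptions for this example, and hand-written metadata stands in for what a transformer would generate. The view is deliberately not called `MapView`, because `dart:collection` already exports a class with that name.

```dart
import 'dart:collection';
import 'dart:convert';

class Person {
  String firstName = '';
  String lastName = '';
}

/// Hypothetical field descriptor: a name plus getter/setter closures.
class FieldMetadata<T> {
  final String name;
  final Object? Function(T obj) getter;
  final void Function(T obj, Object? value) setter;
  const FieldMetadata(this.name, this.getter, this.setter);
}

/// A map-shaped view over [target]. Reads and writes go straight to the
/// object's fields; no entries are copied.
class StructMapView<T> extends MapBase<String, Object?> {
  final T target;
  final Map<String, FieldMetadata<T>> _fields;

  StructMapView(this.target, List<FieldMetadata<T>> fields)
      : _fields = {for (final f in fields) f.name: f};

  @override
  Iterable<String> get keys => _fields.keys;

  @override
  Object? operator [](Object? key) => _fields[key]?.getter(target);

  @override
  void operator []=(String key, Object? value) {
    final field = _fields[key];
    if (field == null) throw ArgumentError('Unknown field: $key');
    field.setter(target, value);
  }

  // A struct has a fixed set of fields, so structural changes are rejected.
  @override
  void clear() => throw UnsupportedError('Cannot clear a struct view');

  @override
  Object? remove(Object? key) =>
      throw UnsupportedError('Cannot remove fields from a struct view');
}

// Hand-written here; a transformer would generate this list.
final personFields = <FieldMetadata<Person>>[
  FieldMetadata('firstName', (p) => p.firstName,
      (p, v) => p.firstName = v as String),
  FieldMetadata('lastName', (p) => p.lastName,
      (p, v) => p.lastName = v as String),
];

/// Encoding: jsonEncode treats the view like any other Map.
String personToJson(Person p) =>
    jsonEncode(StructMapView<Person>(p, personFields));

/// Decoding: values are pushed into the object through the view.
Person personFromJson(String source) {
  final person = Person();
  final view = StructMapView<Person>(person, personFields);
  (jsonDecode(source) as Map<String, dynamic>).forEach((k, v) => view[k] = v);
  return person;
}
```

Note that only the encoding side avoids the intermediate map in this sketch. `jsonDecode` still builds a temporary map, so a parser that writes into the view directly would be needed to get the same benefit when decoding.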

To use generated serializers, user simply calls

var person = PersonJsonSerializer.parse(jsonAsString); // string to object
var str = new PersonJsonSerializer(personObject).stringify(); // object to string

TypeCatalog

TypeCatalog is a class that collects all information about generated instances of TypeMetadata (it's just a map from type name to TypeMetadata instance).

To support custom types (such as DateTime, Uint8List, Point etc), we need to include appropriate TypeMetadata into catalog. Library can come with built-in TypeMetadata for some popular classes. For other things, user has to write TypeMetadata manually and include into catalog (how exactly - TBD). This can cover many corner cases, but probably not all of them. Some of the "missing features" can be added, if they are simple enough and generic. For the rest, no solution is provided.
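A catalog of this kind could look roughly like the sketch below. The untyped `TypeMetadata` variant, the `register`/`lookup` method names, and the choice of `DateTime` and `Duration` as built-ins are all assumptions for illustration; how users would actually add entries is left open in the text.

```dart
/// Hypothetical, untyped variant of TypeMetadata keyed by type name.
class TypeMetadata {
  final String name;
  final Object? Function(Object? value) encode;
  final Object? Function(Object? raw) decode;
  const TypeMetadata(this.name, this.encode, this.decode);
}

/// The catalog is just a map from type name to metadata.
class TypeCatalog {
  final Map<String, TypeMetadata> _byName = {};

  void register(TypeMetadata metadata) => _byName[metadata.name] = metadata;

  TypeMetadata lookup(String name) {
    final metadata = _byName[name];
    if (metadata == null) {
      throw StateError('No TypeMetadata registered for $name');
    }
    return metadata;
  }
}

/// Builds a catalog with two example entries a library might ship.
TypeCatalog buildCatalog() => TypeCatalog()
  ..register(TypeMetadata(
    'DateTime',
    (v) => (v as DateTime).toIso8601String(),
    (r) => DateTime.parse(r as String),
  ))
  ..register(TypeMetadata(
    'Duration',
    (v) => (v as Duration).inMicroseconds,
    (r) => Duration(microseconds: r as int),
  ));
```

A type with no entry fails loudly at lookup time, which matches the "no solution is provided" stance for unsupported cases.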

Restrictions on serializable classes

As we discussed, design targets primarily classes with all public fields (structs), but this restriction might be too harsh. In fact, we can relax it without making our mechanism more complicated, by just saying that when you annotate any class with @Serializable, transformer will look for public fields only, ignoring the rest - so class may have methods and private fields, too, but they are all ignored by serialization mechanism. Public no-argument constructor is a must; inheritance can be supported, too, if transformer is smart enough to figure out public fields of base class. (Not sure how useful this would be)

Another small issue: what to do when the JSON string contains some attributes not declared in the class? It would be good if we could call noSuchMethod, as if a setter was invoked for the attribute, but we can't simulate this call (we can't create an Invocation object; everything is private there). We have to either introduce a special method like fieldNotFound, or invoke (from the serializer) some fake method with a predefined name, so that noSuchMethod will be called automatically.
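The first option, a dedicated hook, might look like the following sketch. The interface name `UnknownFieldAware`, the `extras` map, and the `personSetters` table are assumptions for this example; only the method name `fieldNotFound` comes from the text.

```dart
import 'dart:convert';

/// Hypothetical opt-in hook for attributes the class does not declare.
abstract class UnknownFieldAware {
  void fieldNotFound(String name, Object? value);
}

class Person implements UnknownFieldAware {
  String firstName = '';
  final Map<String, Object?> extras = {};

  // This class chooses to keep unknown attributes instead of dropping them.
  @override
  void fieldNotFound(String name, Object? value) => extras[name] = value;
}

// Stand-in for generated metadata: one setter per declared field.
final Map<String, void Function(Person, Object?)> personSetters = {
  'firstName': (p, v) => p.firstName = v as String,
};

/// Fills [target] from [source], routing unknown keys to the hook.
Person populate(Person target, String source) {
  final decoded = jsonDecode(source) as Map<String, dynamic>;
  decoded.forEach((name, value) {
    final setter = personSetters[name];
    if (setter != null) {
      setter(target, value);
    } else {
      target.fieldNotFound(name, value);
    }
  });
  return target;
}
```

The class decides what "unknown" means for it: store the value, ignore it, or throw.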

Implementation

Implementation is rather straightforward (with the possible exception of transformer itself). I have pigeon_map that does most of the same things, but I made a mistake of introducing "special kind of structs", instead of MapViews. Anyway, it's easy to adapt, I can do it if design is approved (or just forget about it otherwise).

My current implementation makes use of json package (which was abandoned by dart in favor of convert and not maintained any more). Whether it can be made a part of "convert" library is not clear (I don't know enough about it).

As an aside, the json serializer (in both packages) is very slow: at least 2 times slower than expected, based on the average ratio between serialization/parsing, according to this table. This is certainly fixable.

alan-knight commented 10 years ago
tatumizer commented 10 years ago

Alan, thanks for comment. I had some thoughts about implementing this map-view thing, but time passed, and I chilled out. By approval I meant doing something together, having discussion etc. There's not much fun in just writing code, without discussion it's unlikely to be good, and no one will use it anyway. As for simple case - sure, it would target simple cases only, which is 99.9% of all cases (at least in my experience).

alan-knight commented 10 years ago

Well, clearly, the idea that simple cases are the only important ones is not the approach I've taken with the serialization package. It tries to be able to handle anything that can be defined in Dart, and this is just an effort to make very simple cases easier to specify. So this seems like it would want to be a different thing altogether. And serialization is not one of my high priorities at the moment, so I don't see having a lot of time to spend on it in the next while. The MapView is interesting as an alternative to the serialization rules, but achieving basically the same thing of having a way to get the data out of and into the object. However, I see a couple of issues. One is the hard-coding of it as a map. If you want a serialization format that's space-efficient, repeating the names of the fields over and over again is not something you want to do. In minimized code those aren't even really the names of the fields. The serialization package can produce maps for debugging purposes, but for more serious use you probably just want to produce a list. The only thing you care about is that the sending and receiving end agree on what positions in the list mean.

The other thing I'd mention is that it's not clear to me with MapView how you handle an object with final fields or private fields that can only be set in the constructor. This, as well as handling cycles, is why serialization divides de-serializing an object into two operations: creation, and populating values. But maybe those fall outside your simple cases.

tatumizer commented 10 years ago

Alan, there's no hardcoding as a map. MapView is just a thin wrapper around regular object, containing a single field: reference to object. Plus static metadata. Please take another look at writeup.

Any class that extends MapView passes the object to be wrapped, plus metadata, to super constructor of MapView. Due to this, operators [] and []= get defined automatically. E.g. when we say Person p= ...; ps=new PersonJSONSerializer(p); then object ps extends MapView ( by definition of PersonJSONSerializer - see above). This means that we can invoke ps["firstName"]="John"; ps["lastName"]="Dow"; Which will result in setting fields (regular fields, not map entries!) in underlying object p. (Getters are similar).

There's no memory overhead in creating wrapper. It only pretends to be a map. In fact, it translates operation ps["firstName"]="John" into p.firstName="John".

Because it implements Map, object ps can be directly used in JSON.parse, without intermediate step of creating temporary map. It's as efficient as it gets. I already implemented similar trick in PigeonMap, which was REALLY a map (though an efficient one). But here, with new design, everything is much simpler, because it's just a MAP VIEW on a regular object.

tatumizer commented 10 years ago

I think my write-up would be easier to understand if the generated class were called simply PersonMapView. This would make it more explicit that it's just a thin wrapper.

alan-knight commented 10 years ago

What I meant by hard-coding as a map is that it's hard-coded to look like a map. So your output is going to always be of the form [ { firstName : Alan, lastName : Knight }, {firstName : Alex, lastName : Tatumizer}] And if you serialize a million names, that's a lot of repetitions of the string "firstName". There may not be memory overhead, but there's overhead in the output.
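The size difference is easy to see with `dart:convert`. The sketch below encodes the same two records once as maps and once as positional lists; the constant `people` and the function names are made up for the example.

```dart
import 'dart:convert';

// Two records as [firstName, lastName] pairs.
const people = [
  ['Alan', 'Knight'],
  ['Alex', 'Tatumizer'],
];

/// Map form: every record repeats the field names.
String asMaps() => jsonEncode([
      for (final p in people) {'firstName': p[0], 'lastName': p[1]},
    ]);

/// Positional form: both ends must agree that index 0 is firstName
/// and index 1 is lastName.
String asLists() => jsonEncode(people);
```

Here `asMaps()` yields `[{"firstName":"Alan","lastName":"Knight"},{"firstName":"Alex","lastName":"Tatumizer"}]` while `asLists()` yields `[["Alan","Knight"],["Alex","Tatumizer"]]`, and the gap grows linearly with the number of records.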

tatumizer commented 10 years ago

Sure, but that's how JSON is designed. The reason we ended up with JSON (of all things) is that, as you know, before it came XML, and it was so bad that just anything looking different would be doomed to succeed. But this is all beside the point. JSON is ok when you exchange data with a "foreign" counterpart, with which you can't establish a better protocol (and everybody understands JSON these days). If both parties are written in Dart, we can use a binary format. The point I was going to make is that ANY serialization is easy to implement if we have MapView.

Part of the pigeon_map project was a different format ("pigeonson"): a completely binary format, no stored names, no nothing. It runs 6 times faster than JSON and is, in fact, comparable in performance with the best Java serializers. Initially, it was slow in JavaScript, but then you improved something in dart2js, and it started running very fast all of a sudden.

Both serializations (and any other type of serialization I can think of) can be derived from the same generated metadata(*) , which is not surprising: as soon as object looks like a map, it's very easy to manipulate generically, no matter what serialization format is.

(*) Well, almost. For binary serialization I in fact used something like ListView, without calling it so. Maybe that's what you meant in your comment? In ListView, each field can be accessed by index. But indexes were part of the same generated metadata anyway. So I probably need to qualify my statement above: having both MapView and ListView over a record is enough for any serialization.
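An index-based view of the kind mentioned here can be sketched in current Dart on top of `ListBase`. The names `StructListView`, `IndexedField`, `personLayout`, and the two helper functions are assumptions for this example, and JSON arrays stand in for a real binary format.

```dart
import 'dart:collection';
import 'dart:convert';

class Person {
  String firstName = '';
  String lastName = '';
}

/// Hypothetical per-field accessors; the position in the list is the index.
class IndexedField<T> {
  final Object? Function(T obj) getter;
  final void Function(T obj, Object? value) setter;
  const IndexedField(this.getter, this.setter);
}

/// A list-shaped view over [target]: element i is field i of the object.
class StructListView<T> extends ListBase<Object?> {
  final T target;
  final List<IndexedField<T>> _fields;
  StructListView(this.target, this._fields);

  @override
  int get length => _fields.length;

  // The number of fields is fixed by the class definition.
  @override
  set length(int newLength) =>
      throw UnsupportedError('A struct view has a fixed length');

  @override
  Object? operator [](int index) => _fields[index].getter(target);

  @override
  void operator []=(int index, Object? value) =>
      _fields[index].setter(target, value);
}

// Index 0 is firstName, index 1 is lastName.
final personLayout = <IndexedField<Person>>[
  IndexedField((p) => p.firstName, (p, v) => p.firstName = v as String),
  IndexedField((p) => p.lastName, (p, v) => p.lastName = v as String),
];

/// Encodes without field names; only positions carry meaning.
String personToPositionalJson(Person p) =>
    jsonEncode(StructListView<Person>(p, personLayout));

/// Decodes by assigning each position back through the view.
Person personFromPositionalJson(String source) {
  final person = Person();
  final view = StructListView<Person>(person, personLayout);
  final values = jsonDecode(source) as List<dynamic>;
  for (var i = 0; i < values.length; i++) {
    view[i] = values[i];
  }
  return person;
}
```

The same accessors could back both a map-shaped and a list-shaped view, which is the sense in which one set of generated metadata serves several formats.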

tatumizer commented 10 years ago

I re-read your message above, I'm pretty sure now we are on the same page with regard to MapView / ListView - now I'd like to address another point:

how you handle an object with final fields or private fields that can only be set in the constructor

The problem is that the term "serialization" is too broad. As an example of two opposite poles of this notion, consider Java serialization as in "implements Serializable" vs JSON serialization as in org.json.JSONObject.

First of them is generic, it can serialize whatever you want, but it's very slow (and rather complicated). I don't know too many use cases for it - RMI was one, but it's dead. (Probably, there are other uses. e.g. servlet can serialize session attributes, but I would be reluctant to put anything but simple objects into session anyway).

In contrast, JSONObject handles only trivial cases, one of which is Bean - more or less the same as "struct".

I understand the intention of the serialization package: to allow the most general form of serialization, similar to java.io.Serializable, just more flexible. What I'm suggesting is more similar to JSONObject. These two types of "serialization" have different use cases. But the fine line separating these cases is: the second one sacrifices generality for simplicity. In many (most?) cases, it's good enough. But it intentionally tries to avoid complexity, so no final fields, no constructor parameters, just pure data structs specifically designed to be easily transferable from one place to another.

alan-knight commented 10 years ago
alan-knight commented 10 years ago

Also note that DateTime has a constructor and non-settable fields, so falls outside the set you propose to handle.

tatumizer commented 10 years ago

DateTime and other non-standard (in terms of JSON) types are handled by "encoder" and "decoder" - please see TypeMetadata section in writeup. There should be a way for user to define custom encoders/decoders (per type) and add them to TypeMetadata.

Yes, the only purpose of this kind of design is to create invisible layer of code that simplifies conversion between structs and JSON (or similar binary format). User is not supposed to make use of MapView/ListView directly. Example:

@Serializable("json") // annotation to trigger transformer
class Person {
  String firstName = "John";
  String lastName = "Doe";
}

var p = new Person();
var personAsJsonString = PersonJSONSerializer.stringify(p); // makes use of MapView! But user doesn't care.
print(personAsJsonString); // prints {"firstName": "John", "lastName":"Doe"}
var reconstructedPerson = PersonJSONSerializer.parse(personAsJsonString); // returns object of type Person

Annotation "@Serializable", in general, contains a list of formats to be supported. E.g. if we want both JSON and BSON, we would write @Serializable("json", "bson") or something.

I agree that it has to be a separate package. But if it's a third-party package, no one will ever use it, and, like now, everyone will write their own. This can be solved only by including the package in the list of stuff officially handled by the Dart team (better yet, written by somebody from the Dart team). (This happens with everything else, too: most GitHub projects get abandoned after the thrill is gone, and how can it be otherwise? People are not paid for the stuff in any shape or form, not even a T-shirt :)

One more thing to note: performance of binary serialization for simple structs in my experiments was quite good, which is essential - especially while passing data between isolates. On the other hand, slow serialization, no matter how generic, will become a bottleneck (I mean, after inter-isolate data transfer is optimized on low level of dart runtime - right now it's very slow).

alan-knight commented 10 years ago

I'm not sure what you mean by "officially handled" by the Dart team. If you mean incorporated into the SDK I think that's extremely unlikely. The general trend is to have less in the SDK and more pushed out into packages. And something like serialization which has many different use cases does not seem like something that should be built-in. Even if you built in a binary serialization mechanism, people might need a different one. e.g. https://github.com/dart-lang/dart-protoc-plugin

tatumizer commented 10 years ago

Sure, something similar to dart-protoc-plugin, but hosted in https://github.com/dart-lang, and not in some https://github.com/random-dude

alan-knight commented 10 years ago

Well, serialization isn't hosted in dart-lang, it's in google. But basically you want someone from Google to work on it. Which might happen, but not likely in the short term. I have a lot of other priorities right now, and I don't know who else would be likely to work on it. Typically these things get worked on because there's a concrete need.

tatumizer commented 10 years ago

In principle, I can write it and submit for review (need a co-author/reviewer from the team), but I will have to ask some questions on how to name what, because I'm not very good at naming(*). Not sure if this makes sense though.

(*) as an example: I don't even know how to formalize the facade. In the end, maybe user should call it simply as SimpleJson.stringify(person) (and it will detect runtime type and know how to build MapView) or make a top-level function like stringify(person, format:"json"), or generate special class (e.g. PersonJSONSerializer), or... 10 other options.

alan-knight commented 10 years ago

I really don't have time right now to do anything more than the most minimal work on serialization. Just doing this trivial example has already taken more time than I have.

The facade is probably dictated partly by how import structures should work. If you want multiple formats, then the format is presumably separate from the MapViews for particular classes. So it's reasonable to have a format class. But then how does the format class know about the available MapViews. Either it has to be generated and import them automatically (in which case you always need the full set of MapViews available, even if your application only uses a couple) or else the user has to tell it about them. Which seems to push towards the same sort of thing that's in serialization, there's an instance and you configure it for the classes you need it to support. But it's probably not surprising that my thoughts on this tend towards the thing I already wrote.

tatumizer commented 10 years ago

Yeah, explicit registration seems to be the only way to do it cleanly. Other options are worse.
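Explicit registration could be sketched as below: a format object that knows only about the classes the application registers with it. The `JsonFormat` class, its `register` and `stringify` methods, and `buildFormat` are hypothetical names for this example. For brevity each registered view returns a plain map, whereas in the design discussed above it would return a MapView over the object.

```dart
import 'dart:convert';

typedef ToMap = Map<String, Object?> Function(Object value);

/// Hypothetical format object; users register only the classes they need.
class JsonFormat {
  final Map<Type, ToMap> _views = {};

  /// Associates type [T] with a function that exposes it as a map.
  void register<T extends Object>(
      Map<String, Object?> Function(T value) view) {
    _views[T] = (value) => view(value as T);
  }

  /// Looks up the view by runtime type; unregistered types are an error.
  String stringify(Object value) {
    final view = _views[value.runtimeType];
    if (view == null) {
      throw StateError('No view registered for ${value.runtimeType}');
    }
    return jsonEncode(view(value));
  }
}

class Person {
  String firstName = 'John';
  String lastName = 'Doe';
}

/// The application wires up exactly the classes it serializes.
JsonFormat buildFormat() => JsonFormat()
  ..register<Person>(
      (p) => {'firstName': p.firstName, 'lastName': p.lastName});
```

Because nothing is imported automatically, an application that serializes two classes pays for two views, at the cost of one registration call per class.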

Maybe we can revisit the idea when we have more time.

Thanks for your comments, Alex