[@wdrai] Every kind of deserializer will probably have to do something like Class.forName(...) at some point to provide it to reference(id, clazz). Having a "String typeName" argument would just avoid having it in each and every deserializer library (and possibly optimize the "forName" by caching it in the core ceylon serialization). Anyway compile-time type safety is artificial at this point as the deserializer only knows what type of object it reads at runtime.
[@fwolff]
@FroMage: I'm trying to understand your JSON Printer code and I'm a bit confused by the redefinition of the Object class. Does it mean the user has to convert all regular objects into JSON objects before giving them to the `Printer.printObject` method?
Based on this sample usage, I guess I'm right:
```ceylon
String getJSON() {
    value json = Object {
        "name" -> "Introduction to Ceylon",
        "authors" -> Array {
            "Stef Epardaud",
            "Emmanuel Bernard"
        }
    };
    return json.string;
}
```
Question: with the help of the metamodel API, what would be the Ceylon code to iterate over the properties of a regular `Object`? Or, to put it another way, do we need to create a `Deconstructed` instance for each object to be serialized (which can lead to performance/memory issues) when we could simply iterate over its properties one by one?
[@gavinking] @wdrai I don't think that's quite correct. At least, while it might ultimately turn out to be correct, we don't quite have sufficient proof of it yet.
You see, just because you have an unknown `T` doesn't mean that you're not gaining typesafety. For example, the API I proposed enforces that you can only pass an `Attribute<T,X>` to a `Deconstructed<T>`, even if the client doesn't know what `T` is. I must admit I have not tried to put together an entire system to convince myself that this amounts to meaningful end-to-end typesafety and not just a theatrical use of generics, but my guess is that for at least some clients it would be meaningful.
For example, if the serializable attributes are identified by annotations, then the `Attribute<T,X>` refs are obtained directly from a `Class<T>` object, and even though we might not know what `T` is at compile time, we can still tell whether an `Attribute<T,X>` belongs to the `Class<T>`.
Finally, it's to me entirely imaginable that someone might want to use this API to serialize/deserialize objects using handwritten code, or machine-generated code (think `ceylon.ast`), or even, in future, a macro, where types are known statically! Reflective clients are not the only usecase for this stuff.
So, while it is indeed possible that this design might not quite pan out and that either the generics get in the way, or are of only theatrical value, that's definitely not clear to me yet.
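To make the point concrete, here is a minimal sketch against the proposed `Deconstructed<T>`/`Attribute<T,X>` shapes. The function name and the commented call are purely illustrative, not part of any proposed API:

```ceylon
// Illustrative only: even though the caller never learns what T is,
// the compiler still ties the Attribute to the Deconstructed of the
// same (unknown) T, and rejects attributes of any other class.
X readAttribute<T, X>(Deconstructed<T> state, Attribute<T, X> attribute)
        => state.get(attribute);

// readAttribute(personState, organizationAddressAttribute);
//   ^ rejected at compile time: the attribute does not belong to T
```

So the generics do meaningful work even for a fully reflective client that never names `T`.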
[@gavinking]
@fwolff This code prints all the `shared` attributes of the `Person` class:

```ceylon
class Person(shared String name, shared String address) {}
print(type(Person("gavin", "")).getAttributes<Person>());
```
But this is almost completely useless to you right now, because we're actually also interested in private members, which is something I still need to take properly into account. Hence my question above asking:
- @FroMage can the `Attribute` interface here be the same one we already have, even though in this case it will often be representing private attributes, or do we need a new one?
[@FroMage] `getDeclaredAttributes` will also return private attributes, but only on the current type.
[@FroMage] @fwolff: the JSON API only deals with JSON types, not arbitrary types. At least, ATM.
[@fwolff] @gavinking & @FroMage got it, thanks.
I'm wondering if it would be possible to define a serializer as a kind of visitor:
```ceylon
interface Serializer {

    "Tell the serializer to start writing the given object.
     Returns true if the serializer needs to proceed with
     the properties of the object or false if it is going to
     write only a reference.
     E.g. JSON will only write the '{' character while other
     serializers would write some binary id, followed by
     the fully qualified name of the object class and the
     property count."
    shared formal Boolean startObject(Object o, Integer propertyCount);

    "Tell the serializer to write the Integer [[value]] for the property
     identified by [[name]] of the current object.
     E.g. JSON would write '<name>: <value>[,]'."
    shared formal void writeIntegerProperty(String name, Integer value);

    // Other *primitive* types...

    "Tell the serializer to start writing the given object property.
     This call would be followed by a call to startObject with the
     value of the property."
    shared formal void startObjectProperty(String name);

    "Tell the serializer to end the writing of the current object.
     E.g. JSON will write the '}' character while others
     would possibly write some other kind of marker or
     nothing."
    shared formal void endObject();
}
```
Then, in the user code, we would have something like this:
```ceylon
// JSONSerializer satisfies the Serializer interface.
JSONSerializer serializer = JSONSerializer(out);
SerializationContext context = SerializationContext(serializer);
// The context iterates over the object properties and
// calls the serializer methods accordingly.
context.serialize(theObjectToSerialize);
```
What do you think? What I like about this visitor-like idea is that the various serializers would only deal with how to write things, and no longer with how to introspect objects, etc.
[@gavinking] @fwolff This was my first idea, but I think it's fundamentally much less powerful. Basically, the language module would need to take on almost all responsibility for implementing serialization, just leaving the most uninteresting details of writing characters to a string to the serialization lib. And I doubt it would be usable for people implementing an ORM library, for example.
[@gavinking] Plus, it would require that we have annotations in the language module for defining which attributes are transient, and how would you distinguish transience between different externalization formats then, etc, etc. Much less flexible, it seems to me.
[@fwolff] @gavinking you're right, I also think it would be less flexible. Let's forget about this one.
Well, to sum up my current understanding / thinking:
We basically need a reflection API, and the `Deconstructed` interface can serve this purpose:

```ceylon
"The flattened state of an instance of [[Class]]."
interface Deconstructed<Class>
        satisfies {[Attribute<Class>,Anything]*} {

    "Get the value of the given attribute.
     (no references here!)"
    throws (`class AssertionError`,
        "if the value is missing")
    shared formal
    Type/*|Reference<Type>*/ get<Type>(
            Attribute<Class,Type> attribute);
}

interface SerializationContext {
    "Introspect the given [[instance]] and return its
     properties, so the serializer library can iterate over
     them and persist the values."
    shared formal
    Deconstructed<Class> deconstruct<Class>(Object instance);
}
```
The `Deconstructed` returned here should be immutable and shouldn't contain any references. It is the responsibility of the serialization library to decide how to deal with references: JSON wouldn't do anything about references but should at least check that the graph isn't circular and throw an error accordingly. Other implementations would write references based on a specific identity policy: strict identity for common objects, equality for strings, and possibly id equality for entities.
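For instance, a JSON serializer that supports no references at all could still perform the circularity check cheaply by tracking the objects on the current path. This is only a sketch under the assumed `deconstruct` API; a real implementation would compare by strict identity, whereas `HashSet` (shown for brevity) compares by equality:

```ceylon
import ceylon.collection { MutableSet, HashSet }

// Sketch: refuse circular graphs while writing plain JSON.
void writeObject(Object instance, SerializationContext context,
        MutableSet<Object> onPath = HashSet<Object>()) {
    "plain JSON cannot represent a circular object graph"
    assert (!onPath.contains(instance));
    onPath.add(instance);
    for ([attribute, item] in context.deconstruct(instance)) {
        if (is Object nested = item) {
            // recurse into child objects
            writeObject(nested, context, onPath);
        }
        // else: write the primitive value directly
    }
    onPath.remove(instance);
}
```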
That's the tricky part. Here we need the help of a core Ceylon module that is able to create a graph of objects from an intermediate representation (which can contain references, even circular).
The serializer would feed the context with the content of its input stream and finally ask the context to create the entire native graph that reflects the intermediate representation. I'm not sure the `Deconstructed` / `[Stateful]Reference` interfaces would adequately serve this purpose here.
From the deserializer's perspective, it would be great to have this kind of `DeserializationContext`:
```ceylon
interface DeserializationContext {
    shared formal
    Reference createReference(Object id, String className);
    shared formal
    Reference getReference(Object id);
    shared formal
    Anything resolve();
}
```
The `Reference` here doesn't have to be generic because there is nothing the deserializer can do with the actual type it will be resolved to at the end. The `Reference` interface basically represents a mutable collection of members (object properties, collection items, tuples, etc.). The implementation of the `Reference` interface would of course hold its id and type, but these don't have to be exposed externally.
```ceylon
interface Reference {
    "Add a property or item to this object or collection reference."
    shared formal
    void addMember(Member member);
}
```
The `Member` interface would be a root marker interface, which would be extended to represent either an object property or a collection item (or a map entry, etc.). Of course, a member's value could itself be a reference.
At the end, the intermediate representation would then be a collection of `Reference`s, the first one representing the root object. The `resolve` method would iterate over these references, creating a graph of native Ceylon objects.
This is not very different from the original proposal in a way: it is a two-phase process and it ends by iterating over a collection of references in order to create the final representation of what was serialized. I just don't think that the last transformation should be implemented (and replicated) in each serialization library.
To put it another way: the `DeserializerContext` should expose methods that allow a deserializer (JSON or other) to construct a standard and untyped intermediate representation of what it finds in its input stream.
[@emmanuelbernard] I have only looked at Gavin's original proposal (12 days ago). Apologies if these concerns are already addressed.
In the case of formats that handle neither references nor circular references (say XML or JSON), things are probably harder than they should be. Keeping an artificial reference id that has no real meaning is not easy, especially reconstructing the reference at deserialization time.
Can `StatefulReference.reconstruct()` be called multiple times? Will it actually recreate several instances of the same object? To be clear, what does `StatefulReference.instance` return before I call `reconstruct()`? And who is supposed to call it and when?
I think you might need to add `SerializationContext.reference(Object id)` to get back an already registered `StatefulReference` while walking through the second phase (in the case of a JSON-like approach where references are not references but nested structures). I guess one could navigate the `SerializationContext` sequence manually but that looks like a bunch of work at first sight.
During deserialization, it seems that for each "reference" in the stream, you need to call `DeserializationContext.reference(id).deserialize(myDeconstructedStateimpl)` and keep the returned `StatefulReference`s as there is no direct way to get access to the `StatefulReference`.
BTW is it correct that during deserialization, the library would provide its own implementation of `Deconstructed`?
It's not clear to me if you could have a one pass implementation at deserialization time assuming your structure does not support explicit references. That would be a bit prohibitive for a JSON implementation.
I still think that a renaming of serialize / deserialize into hydrate / dehydrate makes a clearer distinction between what is presented here and what people mean by serialization.
Can a class influence which fields are considered for persistence? Would it provide a `Deconstructed` implementation, and how would that play with references?
The feedback is a bit disorganised but I hope it's still useful.
[@gavinking]
In the case of formats that handle neither references nor circular references (say XML or JSON), things are probably harder than they should be. Keeping an artificial reference id that has no real meaning is not easy, especially reconstructing the reference at deserialization time.
Agreed. The API is optimized for cases with identity. That's something that needs more thinking through.
Now, I happen to believe that there are always (natural) keys, even when they are not made explicit in a data format like XML or JSON. Of course, I realize that this makes me a member of the tiny minority who have actually taken the time to understand data modeling at a superficial level, while the entire rest of the industry is busy following the lemming in front of them over the "schemaless" cliff. Pity the folks who will have to come along in 5-10 years time and clean up the mess of lemming carcasses that is the inevitable consequence of this phenomenon.
Ah, doesn't this just take you back to ye olde days of the Hibernate forums, and all the guys with tables "with no primary key"?
Can `StatefulReference.reconstruct()` be called multiple times?
Sure, subsequent invocations are noops.
Will it actually recreate several instances of the same object?
No. It is the responsibility of the context to manage identity.
To be clear, what does `StatefulReference.instance` return before I call `reconstruct()`?
`instance` is documented to call `reconstruct()` by side-effect. The client never sees an incompletely constructed object. That's one of the main goals of the API.
I think you might need to add `SerializationContext.reference(Object id)` to get back an already registered `StatefulReference` while walking through the second phase
Agreed.
BTW is it correct that during deserialization, the library would provide its own implementation of `Deconstructed`?
Yes, correct.
It's not clear to me if you could have a one pass implementation at deserialization time assuming your structure does not support explicit references. That would be a bit prohibitive for a JSON implementation.
OK, we need to think about that.
Can a class influence which fields are considered for persistence? Would it provide a `Deconstructed` implementation, and how would that play with references?
Well, we need to think about what the rules for that are going to be. I have not really got down that far into the details. In principle, yes, that is one of the goals.
[@FroMage] Actually even if we serialise to JSON we'll want to deal with circular references, and there are standardish ways to do that.
[@fwolff] Based on @FroMage's code (parse.ceylon), what would be a concrete JSON implementation of the `parseObject` method with the new API, if it must support the `_class: "path.to.MyBean"` convention and return a `path.to.MyBean` instance?
I think a kind of POC based on some actual code would be very helpful at this point.
[@FroMage] Well, I'm not sure at all if the standard generic JSON parser of `ceylon.json` must support serialisation of Ceylon types. I think the two parsers should be separate, since they serve different purposes and work differently.
[@fwolff] I'm not saying that the standard JSON parser must support direct (de)serialization of Ceylon types. But if it could, what would be the implementation of the `parseObject` method?
Basically, I think it would determine the class to instantiate (from the `_class` field), then create a `Deconstructed` for the given class and populate it with the values, coerced to the strongly typed Ceylon properties.
A short code snippet would help clarify what we have to do in a concrete implementation of a serialization library, whether it would lead to code redundancy, etc.
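In the absence of such a POC, here is a very rough sketch of what a `parseObject` honoring the `_class` convention might look like against the `DeserializationContext` proposed earlier. `PropertyMember`, `freshId`, and the single-root `resolve()` call are all hypothetical, and error handling is omitted:

```ceylon
// Hypothetical sketch only: assumes the DeserializationContext,
// Reference, and Member shapes sketched earlier in this thread.
Anything parseObject(Map<String,Anything> json,
        DeserializationContext context) {
    "a _class field naming the Ceylon class is required"
    assert (is String className = json["_class"]);
    // register a reference for this object under a fresh id
    value reference = context.createReference(freshId(), className);
    for (name -> item in json) {
        if (name != "_class") {
            // nested JSON objects would recurse and contribute
            // their own references instead of raw values
            reference.addMember(PropertyMember(name, item));
        }
    }
    // once the whole stream is consumed, resolve() builds the graph
    return context.resolve();
}
```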
[@gavinking] @fwolff FYI, @tombentley has started work on implementing this API. It would be good if you guys could sync up somehow.
[@fwolff] @gavinking I'm currently in the middle of nowhere (here). I'll be back next week on Wednesday and see how we can sync with @tombentley.
[@gavinking] OK, coo, thanks.
[@tombentley] @fwolff just ping me here when you're back. I'm hampered by terrible network connectivity right now, but maybe it'll be sorted by then.
[@tombentley] As mentioned on IRC, I've been implementing this API and a serialization library based upon it. The API more or less works, though there are a few things I think could be improved, or are at least worth discussing.
- There is now a `Deconstructor` interface for serialization, and `Deconstructed` is just used for deserialization. These interfaces are implemented by the serialization library.
- There are separate `SerializableReference` and `DeserializableReference` interfaces, the latter's `deserialize()` method returning a `RealizableReference` (or `InstantiatiableReference` or something). Right now I have two implementations of `StatefulReference`, one for serialization and one for deserialization; it doesn't make sense to call `serialize()` on a `StatefulReference` obtained from a `DeserializationContext`.

In general it's possible for state in a super class to be visible even when the attribute is refined by a subclass, via `super`, like this:
```ceylon
class Super() {
    shared default String a = "super";
}
class Sub() extends Super() {
    shared actual String a = "sub";
    shared String b => super.a;
}
```
This means that where Gavin has `Attribute` in his API I'm using `ValueDeclaration`. That makes the API slightly less typesafe than it was.
Anyway, this isn't really what I'm wanting to talk about right now...
I've implemented support for serializing generic classes. From the PoV of the API that means adding a method to `Deconstructed` for representing type arguments in the serialized state:

```ceylon
shared formal Type getTypeArgument(TypeParameter typeParameter);
```

(that's `ceylon.language.meta.declaration::TypeParameter` and `ceylon.language.meta.model::Type`, btw). From the PoV of the serialization library, it has to serialize those `Type`s (so that on deserialization I can obtain corresponding `TypeDescriptor`s and restore the reified type arguments). Right now my serialization library is sort of cheating: I've written a little parser and I serialize the `Type.string` representation, which I parse upon deserialization. I could in principle decompose the `Type` into `ClassDeclaration`, `InterfaceDeclaration`, unions and intersections, but it would be nicer if those things were themselves serializable.
aside: There seems to be no way, using the metamodel API, to intersect and union arbitrary `Type`s.
The problems with making the different `Type`s serializable are that they're `native`/platform dependent; in particular they're chock-full of platform dependent fields which need initializing. In other words we need a way to give them a well-defined serializable form (that works cross-platform) which isn't based on obtaining their underlying state directly. We could use annotations on some attribute(s) to declare what this state is (at serialization time). The problem comes in knowing how to transform that state into properly constructed instances at deserialization time. In particular, in the presence of cycles between these things we would need a way to restore the state of partially constructed instances. That needs to be under user control, and yet without exposing the user to uninitialized instances, which is a paradox. In the absence of cycles there's no fundamental problem, but we'd need a way to construct and initialize the instance from the serialized form in one go.
While we would probably expect serialization libraries to cope natively with things like `Integer` and `String` (decomposing them to `Byte`s if necessary), it starts getting patchy when we get to things like `ArraySequence`. One of the use cases for the API is serialization to relational databases, and collections present a bit of a problem there. Consider things like `ArraySequence<Integer|String>` or `ArraySequence<Person|Organization>`: we'd need one table per `ArraySequence` type. Possible, I suppose, but it gets really messy when we come to `Tuple`.
We could say that Tuple is not serializable, but that seems quite a restriction and other serialization libraries wouldn't have a problem with it. So the serializability of a class depends not just on the nature of the class itself, but also on the capabilities of the serialization format (in the form of the serialization library). Really this is just a point about the compatibility of different type systems.
1. `Foo` is serialized to disk.
2. `Foo` is changed, e.g. an `assert` is added, and the code is recompiled.
3. `Foo` is deserialized. Bang! What the programmer thought was an invariant is violated because the deserialized instance avoided the new assertion.

[@gavinking]
Right now my serialization library is sort of cheating: I've written a little parser and I serialize the `Type.string` representation, which I parse upon deserialization.
To me this is not just OK, it's actually preferable, unless there's some reason to believe that performance would be much worse, which I doubt it would be.
Advantages to the string representation include that it works for arbitrary `Type`s (though perhaps we should find a way to compress the package names).
So I don't think we should be trying to serialize the model objects.
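The string approach can be sketched in a few lines. `parseType` stands in for the hand-written parser Tom mentions; it is not a real metamodel function, and the cache is just an obvious optimization:

```ceylon
import ceylon.language.meta { type }
import ceylon.language.meta.model { Type }
import ceylon.collection { HashMap }

// Serialize a type as its string form, e.g. "ceylon.language::String".
String serializeType(Object instance) => type(instance).string;

// Stand-in for the hand-written parser mentioned above (hypothetical).
Type<Anything> parseType(String name) => nothing;

// Cache parses on the way back in, since the same type name
// typically recurs many times in one stream.
value parsedTypes = HashMap<String, Type<Anything>>();

Type<Anything> deserializeType(String name) {
    if (exists cached = parsedTypes[name]) {
        return cached;
    }
    value parsed = parseType(name);
    parsedTypes.put(name, parsed);
    return parsed;
}
```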
[@gavinking]
One of the uses cases for the API is for serialization to relational databases and collections present a bit of a problem there.
We've discussed this before, Tom, and I think what we concluded is that there are two very different usecases here:
In the context of 2, something like a `Tuple` is a special case that is better modeled as an entity, not an association, even though in the language it hangs off of the collection type hierarchy.
[@sgalles] With this work, how far are we from being able to transport objects between the JS and JVM backends? Are there other missing parts? Do you think this could make it into 1.1? @tombentley can we already test this work? (I didn't see any commit related to this in the repos)
[@gavinking] Slipping to a 1.1.5 release in October.
[@gavinking]
@tombentley I'm trying to figure out how to use the `Deconstructor` API, but I'm a little stuck. You can take a look here:
https://gist.github.com/gavinking/ca2fba39c73d9dc376ee
Basically, when I get to a contained object, I have a choice between:
In the first case, I could do it, but I would have to keep track of the ids of things in my own `Map`. My original API used to let me obtain the id of a previously registered object, IIRC, but that doesn't seem to be possible now.
In the second case, I need to recurse the `Deconstructor` on the referenced object, but I can't see any obvious way to do that.
[@tombentley] If `SerializationContext` had a `getId()` method then I never saw it, but I agree it should be possible to query the serialization context to get the id of an instance that's already been registered. So I assume we're talking about this:

```ceylon
"Gets the id that the given instance has been registered with,
 or null if the given instance has not been registered."
shared Object? getId<Instance>(Instance instance);
```
As for your embedding objects, due to the design of my proof-of-concept serialization library that never occurred to me as a requirement (or I reasoned that it was the responsibility of the `Deconstructor`, since the API just sees everything as identified `Reference`s).
[@gavinking]
If `SerializationContext` had a `getId()` method then I never saw it

No, it didn't have a `getId()` method, but I think it had a way to get the `Deconstructed` for an object without assigning an id.
[@tombentley] @gavinking check out the serialization branch of ceylon.language. There's also https://github.com/tombentley/jsonsl/ which you may, or may not, be interested in. Note that @chochos hasn't yet had a chance to update the JS language module, so it's JVM only right now.
[@quintesse] I like how easy the use of the library is! Nice work.
[@EricSL] If jsonsl is any indication of what this is intended to support, this proposal seems to be going in the wrong direction.
There are a few key things serialization libraries need to get right:
(Maybe you're shooting for something more along the lines of Python's pickling, but if so it will be of much more limited use. If not, these concerns need to override completeness concerns.)
What's not so important is serializing graphs. There isn't one conventional way to do this, so you'll have trouble with compatibility with the various ways people encode them now. Instead expect that developers will translate their graphs to/from a serializable acyclic intermediate form that just represents the data, and the serialization library will make it easy to serialize that.
What's not so important is supporting inheritance. Due to security concerns, if the schema says you are deserializing type T, it is not okay to deserialize a subclass of T. It can be okay if the schema explicitly lists the supported subtypes, but if so you need a cross-language compatible way of specifying what the type is. For example, annotations might say that field foo will contain type T if the JSON contains a field "foo_t", or a type U if the JSON contains a field "foo_u". Something equivalent to the oneof feature in protobuf could be an alternative to inheritance: https://developers.google.com/protocol-buffers/docs/proto#oneof Inheritance seems natural in XML but like I said you want to be explicit about what classes you are expecting to deserialize.
It may be helpful to distinguish between root-serializable and field-serializable. I don't think the standard collections should support serialization directly. However, it should obviously be supported to have serializable classes with fields of type `Sequence`.
It may be helpful to start from another language's API just because then there's a decent chance you'll interoperate with that language. C# has an annotation based serialization API, and it was designed around XML, but it works quite well for JSON: https://msdn.microsoft.com/en-us/library/bb410770%28v=vs.110%29.aspx
[@gavinking] @EricSL
If jsonsl is any indication of what this is intended to support, this proposal seems to be going in the wrong direction. .... There are a few key things serialization libraries need to get right:
I don't understand this comment at all. The current API externalizes the listed concerns (among others) to the serialization library itself, and quite deliberately avoids addressing this kind of concern in the language module!
Maybe you're shooting for something more along the lines of Python's pickling, but if so it will be of much more limited use.
Well that is what jasonsl is, but it is not the only thing that the language module serialization API is capable of supporting.
I don't think you've understood the architecture of this.
It may be helpful to start from another language's API
Yew. All other languages handle serialization really badly, in my experience.
C# has an annotation based serialization API
Sure, and the current API can certainly support serialization libraries which are annotation driven. Indeed, that is a central goal. But we certainly don't want to bloat out the language module with annotations for controlling serialization!
[@gavinking] Note that indications at present are that the current API is too general-purpose, and perhaps won't be capable of supporting reasonable performance. So we might need to scale back our vision here and provide something much less general-purpose.
[@sirinath] One thing I would request is to make this flexible enough to allow either compile-time or runtime resolution. You could look at the AST of the object to be serialised and map it into the serialisation format at compile time, while at runtime you would need some form of fast reflection.
Ref: https://github.com/scala/pickling, https://github.com/heathermiller/spores. This might be a good starting point to see how to do this.
[@gavinking] I'm closing this because, in principle, it's done.
We can open new issues for any additional tasks, which probably don't anyway affect ceylon-spec.
[@gavinking] We need to define `Serializer`s and serialization.
I'm thinking something along the lines of this:
A class is serializable iff:
- it is annotated `serializable`,
- each of its attributes is `serializable` or `shared`, and
- each attribute is of a `serializable` type.

All superclasses and all subclasses of a `serializable` class must also be serializable, except for `Object` and `Anything`.
But I'm sure I'm missing some details.
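Under rules along those lines, user code would presumably look something like this (illustrative only; the exact attribute rules were still being worked out at this point):

```ceylon
// A class opted in to serialization via the serializable annotation.
serializable class Person(name, address) {
    // both attributes are shared and of type String, which any
    // serialization library would treat as a serializable type
    shared String name;
    shared String address;
}
```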
[Migrated from ceylon/ceylon-spec#704] [Closed at 2015-09-30 13:26:30]