Closed gavinking closed 9 years ago
Note that Integer, Float, String, Character, Entry and Sequential would all be serializable.
What about generic classes? Won't their serializability depend on the type argument?
Hrm. That's interesting. Indeed, not every Sequential is serializable. :/
Sadly, this issue is slipping to 1.1 :-(
What would a serialized object look like? Could ceylon/ceylon-sdk#125 maybe be the deserializer?
Some thoughts based on testing clustered web apps and commissioning full clustered test environments to test apps. Based on the Ceylon philosophy of doing the obvious and declaring the rest:
A schema annotation that can override any built-in serialization. This would be in line with the current state of serialization frameworks like Thrift, Avro etc. This would allow two versions of the same class to have the same serialization even if a field had been added. This approach would be better than Java's and would save much money and effort for enterprises, as well as enable cross-VM and cross-language serialization.
All objects (except Object and Anything) are serializable. All Ceylon modules and the SDK will be.
Can that work in practice? What’s the meaning of a serialized File.Writer, TestRunner or Callable?
Sorry, I meant ceylon.collections and other data-like modules or packages.
Oh, I see. Still, those can only be serializable iff their elements are serializable.
[collections] can only be serializable iff their elements are serializable.
I wonder if that can be represented in the type system?
interface Collection<Element, Serializability=Anything>
        satisfies {Element*}&Serializability
        given Element satisfies Object&Serializability
        given Serializability of Serializable|Anything {
    // ...
}
(That looks like some deranged monstrosity. Is there a better way?)
I wonder if that can be represented in the type system?
I think this is a good use for annotations, not for inheritance.
I think this is a good use for annotations, not for inheritance.
True, but that means that the serializer can’t use the object’s internals. OTOH, the _de_serializer needs to be external anyways, and without constructors the internals aren’t of much use as well. (This is assuming @akberc’s schema – if the (de)serializer is language-internal and can’t be modified, it can be as internal as it wants, of course.)
This is really the ideal kind of situation for using type classes: see Haskell's binary package, for example.
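The type-class idea can be approximated even without language support. The following is a minimal Java sketch under stated assumptions: the names Encoder and Encoders are hypothetical and are not part of any proposal in this thread. It shows how an encoder for a list can only be obtained by supplying an encoder for its elements, so "serializable iff the elements are serializable" falls out of the signatures rather than out of inheritance.

```java
import java.util.List;

// Hypothetical type-class-style encoder: an instance exists only for
// types that can actually be serialized.
interface Encoder<T> {
    String encode(T value);
}

final class Encoders {
    private Encoders() {}

    // Instance for integers.
    static final Encoder<Integer> INTEGER = value -> Integer.toString(value);

    // Instance for strings, escaping backslashes and double quotes.
    static final Encoder<String> STRING =
            value -> "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";

    // A list encoder can only be built from an encoder for its elements.
    static <E> Encoder<List<E>> list(Encoder<E> element) {
        return values -> {
            StringBuilder out = new StringBuilder("[");
            for (int i = 0; i < values.size(); i++) {
                if (i > 0) out.append(',');
                out.append(element.encode(values.get(i)));
            }
            return out.append(']').toString();
        };
    }
}
```

With this shape, Encoders.list(Encoders.INTEGER) compiles, while a list of some type without an Encoder instance simply has no encoder to pass in.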
Finally, here's a very strawman proposal for the central interfaces:
import ceylon.language.meta.model {
    Attribute
}
"A reference to an instance of [[Class]], with a certain
 [[identifier|id]]."
interface Reference<Class> {
    "The unique identifier of the instance."
    shared formal
    Object id;
    "Associate the given [[state]] with the instance,
     returning a [[StatefulReference]]."
    shared formal
    StatefulReference<Class> deserialize(
            Deconstructed<Class> state);
}
interface StatefulReference<Class>
        satisfies Reference<Class> {
    "Get the flattened state of the instance."
    shared formal
    Deconstructed<Class> serialize();
    "Get the instance. During deserialization, could force
     reconstruction."
    throws (`class AssertionError`,
        "if there is a problem reconstructing the object
         or any object it references")
    shared formal
    Class instance;
    "Force reconstruction of the instance."
    throws (`class AssertionError`,
        "if there is a problem reconstructing the object
         or any object it references")
    shared formal void reconstruct();
}
"The flattened state of an instance of [[Class]]."
interface Deconstructed<Class>
        satisfies {[Attribute<Class>,Anything]*} {
    "Get the value of the given attribute."
    throws (`class AssertionError`,
        "if the value is missing")
    shared formal
    Type|Reference<Type> get<Type>(
            Attribute<Class,Type> attribute);
}
"A context representing serialization of many objects to a
 single output stream. The client is responsible for
 registering the objects to be serialized with the context,
 assigning them each a unique identifier. Then, the
 serialization library is responsible for iterating the
 registered objects in the context and persisting their
 [[deconstructed states|Deconstructed]] to the output
 stream."
interface SerializationContext
        satisfies {StatefulReference<Object>*} {
    "Create a reference to the given [[instance]] of
     [[Class]], assigning it the given [[identifier|id]]."
    throws (`class AssertionError`,
        "if there is already an instance with the given
         identifier")
    shared formal
    StatefulReference<Class> reference<Class>(Object id,
            Class instance);
}
"A context representing deserialization of many objects from
 a given input stream. The serialization library is
 responsible for processing the stream and registering the
 [[deconstructed states|Deconstructed]] of the objects with
 the context. Then, it may obtain a reference to a fully
 reconstructed object via [[StatefulReference.instance]],
 and return it to the client."
interface DeserializationContext
        satisfies {Reference<Object>*} {
    "Obtain a reference to the instance of [[Class]] with
     the given [[identifier|id]]."
    shared formal
    Reference<Class> reference<Class>(Object id);
}
Note: SerializationContext and DeserializationContext will be provided by the language module.
Questions:
Should the Attribute interface here be the same one we already have, even though in this case it will often be representing private attributes, or do we need a new one?
Sane?
I'm not sure, could you give some basic usage examples perhaps? For example I find it not immediately obvious why DeserializationContext.reference() would return a Reference<> which has a deserialize() method which returns a StatefulReference<> which itself is another Reference<>.
I’m somehow having a very hard time understanding this. One question: Where does any user code come into play? You say that “the” (not “default”?) De/SerializationContext implementations are provided by the language module; in addition, you have completely detached the serialization mechanism from the classes it serializes (they don’t need to have a serialize() method or something like that… which is probably good). So how does SerializationContext create a StatefulReference?
In fact, I don’t see where this ends at all. Deconstructed’s get returns a Type|Reference<Type>, and a Reference is deserialized with another Deconstructed – so it seems it’s Deconstructeds “all the way down.” If, for example, I want to serialize any data structure to a String – how do I do that?
The goal of this API is to flatten a graph of objects into a set of tuples of their attributes, or unflatten a set of tuples of attributes into objects.
Will you guys stop obsessing over how to write strings? We've been through this before. Writing strings is the easy part. Anyone can turn a bunch of tuples into a string. The hard part is deconstructing a graph of objects, or constructing one, while bypassing the initializers of the objects and visibility checks of the language.
I'm not even interested in strings per se. For me the most interesting kind of (de)serialization is from/to a database.
One question: Where does any user code come into play?
I also don't care about user code. This provides support for frameworks. For example, JSON libraries, ORM libraries, whatever.
One question: Where does any user code come into play?
I also don't care about user code. This provides support for frameworks. For example, JSON libraries, ORM libraries, whatever.
(Within ceylon-spec, I consider these user code as well.) It seems to me the intended use is
value toSerialize = theThingIWantToSerialize;
SerializationContext context = TheCeylonLanguageImplementationOfSerializationContext();
value deconstructed = context.reference(1, toSerialize).serialize();
// where do I put deconstructed?
// elsewhere
DeserializationContext deContext = TheCeylonLanguageImplementationOfDeserializationContext();
value deserialized = deContext.reference(1).deserialize(deconstructed).instance;
// where did I get deconstructed from?
Where did the JSON library come in?
Can you give me a usage example?
Yes, that's exactly right. This is code that occurs in your JSON library.
Ah, so
references what it gets from the Deconstructed until the returned Type is “native enough” that it can be serialized directly?
And get returns Reference<Type> iff the object was already referenced (so the JSON library would know its ID)?
I’m not sure how useful Object id is… if it really was an arbitrary object, in order to save something completely, I’d have to serialize the id as well, wouldn’t I? Type parameter perhaps? (And most people would use Integer or String.)
these interfaces aren’t supposed to be used by the “end user”
No, not really.
I’m not sure how useful Object id is… if it really was an arbitrary object, in order to save something completely, I’d have to serialize the id as well, wouldn’t I? Type parameter perhaps? (And most people would use Integer or String.)
In the flattened form you work with References to instances, not the instances themselves, since you might have a partial graph at any point in time.
And get returns Reference<Type> iff the object was already referenced (so the JSON library would know its ID)?
The language module doesn't care what you use for ids, so Object is fine here.
I’m not sure how useful Object id is
I was thinking the same. What's the use-case for serializing something that's basically a Map of id->object and then supporting random-access deserialization of individual objects from that Map? Supposedly they are interrelated so possibly you can't cherry-pick that easily.
To me serialization and deserialization seem to be one-shot operations. If you need to serialize a graph of objects you pass the "root" of that graph and the rest gets pulled in automatically. If there's no real root but you still want to serialize a bunch of objects you put them in a collection and serialize that. If you need to recognize them somehow you put them in a Map and serialize that.
I'm guessing you have an entirely different idea about all of this @gavinking but from looking at the API I can't guess what it is, I need more information before I can opine if this is a sane basis for our serialization.
Huh? How can you deserialize a graph of objects if you can't access an instance by id while reconstructing the graph? This thing has to support referential identity!
Speaking of difficult problems, this one has now been a pain for ten years in the JDK http://bugs.java.com/view_bug.do?bug_id=4957674 Just wondering if this problem of unstable hashcode could affect this interface, or if it is just a matter of implementation in this case.
Well I don't think my proposal is vulnerable to that problem, since I require the client code to assign an identifier to each instance. I never use its hashcode.
I don't think that problem has anything to do with client-assigned identifiers or not. It's about complex objects that need to do internal (re-)initialization based on incomplete data. Part of that could be prevented by first re-creating as much of the object graph as possible and then have a special initializer on each object do the rest of the work, I guess.
I'm not really sure about the DeserializationContext, it seems to have too little information to perform a deserialization. First, where do the IDs come from that you use to obtain a Reference? But then to get an actual object you have to pass it the Deconstructed related to it, but if that has any references to other objects how will it be able to reconstruct those references?
First, where do the IDs come from that you use to obtain a Reference?
From the serialized format.
But then to get an actual object you have to pass it the Deconstructed related to it, but if that has any references to other objects how will it be able to reconstruct those references?
You have their ids, and you obtain a reference from the deserialization context. That's why a Deconstructed holds values and references.
Please, just give an example how you see this work. Just something simple with steps how you see the round trip from object to DB/File/whatever and back, because I just fail to see the whole picture here.
You get a bunch of ids with related values that you read from some input stream. You turn your ids into references by calling DeserializationContext.reference(), you construct tuples (Deconstructeds) comprising ids of related objects and primitive values, and you call deserialize on the references one by one. At the end, you have a bunch of StatefulReferences, and you can turn any one of them into an object by calling instance on it, which reconstructs the part of the object graph that is referenced from that instance.
Serialization is the same thing in reverse. You register instances one by one with the SerializationContext, and then, when you're done registering, you can turn them into tuples by calling serialize().
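The two steps described above (register flattened states by id, then materialize instances on demand) can be illustrated in Java. This is only a sketch under stated assumptions: Person, PersonState and GraphReader are hypothetical names invented for the example, and Java, unlike Ceylon, allows creating a "blank" object first, which is what lets the cycle close here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical domain class for the illustration.
final class Person {
    String name;
    Person friend;   // may point back to another Person, forming a cycle
}

// Flattened state: one primitive value plus the id of a related object (or null).
final class PersonState {
    final String name;
    final Integer friendId;
    PersonState(String name, Integer friendId) {
        this.name = name;
        this.friendId = friendId;
    }
}

final class GraphReader {
    private final Map<Integer, PersonState> states = new HashMap<>();
    private final Map<Integer, Person> instances = new HashMap<>();

    // Phase 1: record the flattened state under its id; nothing is instantiated yet.
    void register(int id, PersonState state) {
        if (states.putIfAbsent(id, state) != null) {
            throw new IllegalStateException("duplicate id " + id);
        }
    }

    // Phase 2: build the instance and, on demand, everything reachable from it.
    Person instance(int id) {
        Person existing = instances.get(id);
        if (existing != null) return existing;
        PersonState state = states.get(id);
        if (state == null) throw new IllegalStateException("no state for id " + id);
        Person person = new Person();
        instances.put(id, person);   // publish before resolving references, so cycles terminate
        person.name = state.name;
        if (state.friendId != null) person.friend = instance(state.friendId);
        return person;
    }
}
```

Looking instances up by id while the graph is being rebuilt is what preserves referential identity: two states that name the same id end up pointing at the same object.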
Hello guys,
Gavin invited me to give some feedback about this API proposal.
I have tried to understand it by implementing a small prototype in Java. You can find the project here: https://github.com/fwolff/jeylon (Javanized Ceylon?!). This small prototype is very limited but it could serve as a concrete sample of what could be a working API / implementation. There is a Junit test case that shows a full (de)serialization process, handling both object and string references: https://github.com/fwolff/jeylon/blob/master/src/test/TestAlpha.java.
The prototype takes care of one of the biggest issues (features?) of Ceylon when deserializing objects: Ceylon doesn't allow the creation of "blank" objects, i.e. without all concrete properties passed to the constructor. Deserialization must then be a two-phase process, the first phase collecting all references and destructured properties, the second actually creating the graph of objects returned to the user.
However, I think a two-phase process isn't required during serialization: reference handling can be purely internal to the serializer library; it doesn't need to be exposed to the low-level Ceylon serialization API.
Based on my prototype and my current understanding of Ceylon (near zero) and this API, I would suggest simplifying it as follows:
"A reference to an instance of [[Class]], with a certain
 [[identifier|id]]."
interface Reference<Class> {
    "The unique identifier of the instance."
    shared formal
    Object id;
    "Associate the given [[state]] with the instance,
     returning a [[StatefulReference]]."
    shared formal
    StatefulReference<Class> deserialize(
            Deconstructed<Class> state);
}
interface StatefulReference<Class>
        satisfies Reference<Class> {
    "Get the instance. During deserialization, could force
     reconstruction."
    throws (`class AssertionError`,
        "if there is a problem reconstructing the object
         or any object it references")
    shared formal
    Class instance;
}
"The flattened state of an instance of [[Class]]."
interface Deconstructed<Class>
        satisfies {[Attribute<Class>,Anything]*} {
    "Get the value of the given attribute."
    throws (`class AssertionError`,
        "if the value is missing")
    shared formal
    Type|Reference<Type> get<Type>(
            Attribute<Class,Type> attribute);
}
interface SerializationContext {
    "Introspect the given [[instance]] and return its
     properties, so the serializer library can iterate on them and
     persist the values."
    shared formal
    Deconstructed<Class> deconstruct<Class>(Object instance);
}
interface DeserializationContext {
    "Obtain a reference to the instance of [[Class]] with
     the given [[identifier|id]]."
    shared formal
    Reference<Class> reference<Class>(Object id);
}
I'm pretty sure I'm missing many things here, but I hope this kind of concrete feedback can be helpful.
Note: this prototype cannot deal with circular references. I didn't try to emulate the late keyword in Java, so the Parent/Child model used in my test case isn't circular and the deserialization of a circular graph will eventually fail miserably.
Franck.
@fwolff So the big difference is that the SerializationContext returns the Deconstructed directly in one step?
So the issue with that is how does it go about deconstructing references that the object has to other objects? It has to have some way to figure out what the ids of the associated objects are. If they have not yet been registered with the serialization context, and assigned an id, you're going to have to have some per-class getId() strategy that you register with the context. Well, perhaps that's better; I'm not sure.
At serialization time, there are no references: the Deconstructed returned by the SerializationContext only contains the actual property values of the bean to be serialized. It is the serialization library's responsibility to figure out if and how it is going to persist references instead of the full state of the bean (e.g. a JSON library will certainly not persist references, while other libraries could).
In my prototype, the id strategy (which is very common) is a HashMap<String, Integer> for string references and an IdentityHashMap<Object, Integer> for objects (see https://github.com/fwolff/jeylon/blob/master/src/alpha/AlphaSerializer.java). There is no need to delegate to Ceylon the handling of such ids.
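To make the id strategy concrete, here is a minimal Java sketch of identity-based id assignment while flattening a graph. It is not the prototype's code: Node and IdAssigner are hypothetical names, and the "id:name->#ref" record format is invented purely for illustration. The only library feature relied on is java.util.IdentityHashMap, which compares keys with == rather than equals().

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical node type used only for this illustration.
final class Node {
    final String name;
    final List<Node> links = new ArrayList<>();
    Node(String name) { this.name = name; }
}

final class IdAssigner {
    // Keys are compared with ==, so equal-but-distinct nodes get distinct ids.
    private final Map<Node, Integer> ids = new IdentityHashMap<>();
    private final List<String> records = new ArrayList<>();

    // Returns the id of the node, flattening it first if it has not been seen.
    int idOf(Node node) {
        Integer known = ids.get(node);
        if (known != null) return known;
        int id = ids.size();
        ids.put(node, id);        // register before visiting links, so cycles terminate
        records.add(null);        // reserve the slot for this id
        StringBuilder record = new StringBuilder(id + ":" + node.name + "->");
        for (Node link : node.links) {
            record.append('#').append(idOf(link));
        }
        records.set(id, record.toString());
        return id;
    }

    // One flattened record per object, indexed by id.
    List<String> records() { return records; }
}
```

A library that never persists references (the JSON case mentioned above) would skip the map entirely and inline the linked state instead, which is why this bookkeeping can live in the serialization library rather than in the language module.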
So basically, the SerializationContext is just a Reflection / Introspector utility.
Sorry if I speak Java here, my knowledge of Ceylon is, as I said, near zero... Question: does Ceylon have something like HashMap / IdentityHashMap that could be used in a serialization API?
At serialization time, there is no references: the Deconstructed returned by the SerializationContext only contains the actual property values of the bean to be serialized. It is the serialization library responsibility to figure out if and how it is going to persist references instead of the full state of the bean (eg. a JSON library will certainly not persist references, while other libraries could).
Well OK, the point about JSON is well-taken. But I think this is still more a question about division of responsibilities, that is, what code has the responsibility for "linearizing" the object graph. I was assuming that this would be the job of the SerializationContext (which is why, incidentally, mine is iterable). But your point is that:
Interesting.
I have a different question: Is it useful that you can deserialize a StatefulReference? The following hierarchy would make more sense to me:
interface Reference<Class> of StatelessReference<Class> | StatefulReference<Class> {
    shared formal Object id;
}
interface StatelessReference<Class> satisfies Reference<Class> {
    shared formal StatefulReference<Class> deserialize(Deconstructed<Class> state);
}
interface StatefulReference<Class> satisfies Reference<Class> {
    shared formal Class instance;
    shared formal Deconstructed<Class> serialize();
    shared formal void reconstruct();
}
(EDIT: the important point being the separation of StatelessReference and StatefulReference, and that StatefulReference no longer has deserialize().)
@lucaswerkmeister: no, it doesn't make sense to me either. That's why my implementation throws an UnsupportedOperationException (https://github.com/fwolff/jeylon/blob/master/src/org/jeylon/serial/impl/StatefulReferenceImpl.java).
@lucaswerkmeister well, you already have its state as a tuple, why prevent them from getting at it?
WDYM? I don’t prevent any getting, I prevent you from stuffing even more state into a Reference that already has state.
Or do you mean that my StatefulReference lost serialize()? That’s just because I copied + adapted @fwolff’s code instead of yours (less scrolling).
Oh, ok, sure. Fine. I misunderstood.
Okay, I added it again (+reconstruct()) to avoid confusion.
I'm still trying to further simplify the API and I'm thinking about something like that for the deserialization (serialization isn't the main problem here):
"The flattened state of an instance of [[Class]]."
interface Deconstructed<Class>
        satisfies {[Attribute<Class>,Anything]*} {
    "Get the value of the given attribute."
    throws (`class AssertionError`,
        "if the value is missing")
    shared formal
    Type|Reference<Type> get<Type>(
            Attribute<Class,Type> attribute);
}
interface Reference<Class> {
    "The unique identifier of the instance."
    shared formal
    Object id;
    "The class of the instance."
    shared formal
    Class type;
    "The flattened state of the instance."
    shared formal
    Deconstructed<Class> state;
}
interface DeserializationContext {
    "Create and return a new reference with an empty state,
     after adding it to the context."
    throws (`class AssertionError`,
        "if there is already a reference with the given [[id]]")
    shared Reference<Class> add(Object id, String typeName);
    "Get the reference from the context with the
     given [[id]]."
    throws (`class AssertionError`,
        "if there is no reference with the given [[id]]")
    shared Reference<Class> get(Object id);
    "Resolve all references and return the
     first reference as a fully qualified object."
    throws (`class AssertionError`,
        "if any reference can't be resolved")
    shared Object /* Anything? */ resolve();
}
The idea is that the deserializer implementation is just filling the context with new references (id, className, state) and then, after reaching the end of the input stream, simply asking the context to create the graph of objects.
From my previous code, that would be something like:
public Object read() throws Exception {
    Object o = readNoInstance();
    // A top-level reference means the stream described an object graph: resolve it.
    if (o instanceof Reference)
        return context.resolve();
    return o;
}
...
private Object readObject() throws Exception {
    int type = in.readByte();
    if (type == REFERENCE_TYPE) {
        // Back-reference to an object already added to the context.
        int ref = in.readInt();
        return context.get(ref);
    }
    if (type == PLAIN_TYPE) {
        String className = (String)readNoInstance();
        Reference reference = context.add(referenceIndex++, className);
        int count = in.readInt();
        for (int i = 0; i < count; i++) {
            String name = (String)readNoInstance();
            Object value = readNoInstance();
            reference.deconstructed.add(new AttributeImpl(cls, name), value);
        }
        return reference;
    }
    throw new RuntimeException("Huh...");
}
It is then the job of the context to reconstruct and instantiate the whole graph of objects, with or without circular references, and the implementation of the serialization library would be much simpler.
What do you think? Does it make sense?
F.
@fwolff This looks less typesafe than my original version, doesn't it?
String typeName (!)
Deconstructed<Class> state; what if this has not been set yet? What happens? AssertionError? Null??
i.e. what I liked about Reference/StatefulReference is that they captured the state of the deserialization of an instance into the typesystem.
You could even do:
Reference<Foo> ref = dc.reference<Foo>(id);
if (is StatefulReference<Foo> ref) {
    //already had its state deserialized
    Deconstructed<Foo> tuple = ref.serialize(); //retrieve the previously registered state
}
else {
    //ref is an "empty" Reference<Foo>
}
So there was a whole nice protocol for interaction between the context and the client. I think you've lost that.
Typesafe or not, as long as a framework can serialise types it doesn't know about (Anything), then it should be fine.
P.S. This kind of thing:
shared Reference<Class> add(Object id, String typeName);
Is not usually right. Ceylon has reified generics, so you can write:
shared Reference<Clazz> add<Clazz>(Object id);
And inside the body of add(), Clazz is a real reified type that you can inspect. You can even do:
Map<[ClassDeclaration,Object],Reference<Object>> refs = .... ;
assert (is Class<Clazz> clazz = `Clazz`);
refs.put([clazz.declaration, id], ref);
Typesafe or not, as long as a framework can serialise types it doesn't know about (Anything), then it should be fine.
Ah yes in fact the calling code doesn't know what the type is at compile time, so it should be:
Reference<Clazz> reference<Clazz>(Object id, Class<Clazz> clazz);
Not sure if the variance of that is correct though.
Oh yes, it's correct.
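For comparison, the same "pass the class as a value" pattern is how this is usually handled in Java, where type arguments are erased at runtime. The sketch below is an illustration only: TypedRegistry is a hypothetical name, not part of the proposal or of the jeylon prototype. It shows a lookup by id whose result type is driven by a Class<T> token, analogous to the reference<Clazz>(Object id, Class<Clazz> clazz) signature above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry: Java erases type arguments, so the class is passed explicitly.
final class TypedRegistry {
    private final Map<Object, Object> byId = new HashMap<>();

    // Store an instance under a client-assigned id.
    <T> void put(Object id, T instance) {
        byId.put(id, instance);
    }

    // The Class<T> token both checks the stored value and gives a typed result.
    <T> T get(Object id, Class<T> type) {
        Object value = byId.get(id);
        if (value == null) throw new IllegalArgumentException("unknown id " + id);
        return type.cast(value);   // throws ClassCastException on a type mismatch
    }
}
```

The token is known only at runtime here, which matches the situation described above where the calling code doesn't know the type at compile time.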
Ah yes in fact the calling code doesn't know what the type is at compile time, so it should be:
Reference<Clazz> reference<Clazz>(Object id, Class<Clazz> clazz);
In fact, something that we don't currently allow in the language, but I can't think why not, is:
Type<T> type = .... ;
Reference<T> ref = context.reference<type>(id);
i.e. we should be able to pass a Type object as a type argument. (We would need a slightly more special syntax than what I show above though.)
We need to define:
Serializers and serialization.
I'm thinking something along the lines of this:
A class is serializable iff:
serializable,
serializable or shared, and
serializable type.
All superclasses and all subclasses of a serializable class must also be serializable, except for Object and Anything.
But I'm sure I'm missing some details.