asterite opened this issue 6 years ago
Another thing: right now the compiler complains in this case:

```crystal
class Foo
  property x : Int32
end
```

The error is:

```
instance variable '@x' of Foo was not initialized directly in all of the 'initialize' methods, rendering it nilable
```

That's cool!
However, once we include `JSON::Serializable`, that error is gone:

```crystal
require "json"

class Foo
  include JSON::Serializable

  property x : Int32
end

# No error!
```
The reason is that the include defines an `initialize` method that makes sure to initialize all instance vars. So there's already an `initialize` doing that, and the compiler's ability to check that `x` wasn't initialized is lost. In fact... I just found a bug:
```crystal
require "json"

class Foo
  include JSON::Serializable

  property x : Int32
end

p Foo.new # => #<Foo:0x104ff0ff0 @x=0>
```
This happens because of a compiler bug (what a surprise) related to these magic initializers that use `@type` inside them. There's a rule for making those `initialize` methods magically work, bypassing the usual checks, and it seems the checks are lost in other cases too. I coded that specifically with `JSON::Serializable` in mind, and now I regret it because it's a complex thing to check, and it's magical.
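For context, the kind of "magic initializer" meant here is an `initialize` whose body contains macro expressions over `@type`. This is a simplified sketch (not the actual `JSON::Serializable` code) of the shape the compiler rule exempts from the nilable check:

```crystal
require "json"

class Foo
  property x : Int32

  # A macro method: the body is expanded per-type using `@type`, so the
  # compiler can't verify in its first pass that `@x` is always assigned.
  # This is the initializer shape the current compiler rule exempts from
  # the "not initialized in all 'initialize' methods" check.
  def initialize(pull : JSON::PullParser)
    pull.read_object do |key|
      case key
      {% for ivar in @type.instance_vars %}
      when {{ivar.name.stringify}}
        @{{ivar.name}} = {{ivar.type}}.new(pull)
      {% end %}
      else
        pull.skip
      end
    end
    # Note: if "x" is missing from the input, `@x` is never assigned;
    # that is exactly the hole the exemption opens.
  end
end

p Foo.new(JSON::PullParser.new(%({"x": 5})))
```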
By removing that "feature", the only way we can achieve this functionality is the way I propose above. It can be with a `JSON::Serializable` module or without one (I prefer it without one: less typing, and it works out of the box).
Even if we somehow fix the above bug, when we do `Foo.new` we'll get an error saying no overload matches... there's only `new(JSON::PullParser)`. That including `JSON::Serializable` defines an `initialize` method for you is a bit annoying, because it now participates in overloads too! And because the lookup for `initialize` methods is a bit different than for regular methods (`initialize` is not inherited when the child type defines its own constructors) we have to do that `macro included`/`macro inherited` hack.
I think this change will simplify things a lot:

- deserialization goes through `from_json` and `initialize_from_json`
- no more of the `@type` thing inside macros inside `initialize`, which we'll be able to remove after this change (well, the way I propose is unsafe, because it uses `allocate`, but the generated method will raise if any of the required instance vars are not present, making it safe... kind of similar to how `Array` uses `Pointer` internally but it's still safe)

Thoughts again? :-)
Ah... and the downside is that if you are defining models for consuming an API, you'll actually have to write a bit more:

```crystal
require "json"

class SomeModel
  include JSON::Serializable

  property x : Int32
  property y : Int32
end

# Compiles
```
With the new way:

```crystal
require "json"

class SomeModel
  property x : Int32
  property y : Int32
end

# Doesn't compile: neither x nor y were initialized in an `initialize`
```
The way to fix it is to define a constructor:

```crystal
require "json"

class SomeModel
  property x : Int32
  property y : Int32

  # For example:
  def initialize(*, @x, @y)
  end
end

# Now compiles
```
If the model has a lot of properties, you might have to write a big constructor... Alternatively, define default values for some properties (if the API is guaranteed to return certain properties, giving them default values is acceptable because you know you'll get correct responses). Anyway, defining a constructor isn't that bad: you can just do `def initialize(*, put all required properties here)`.
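To illustrate the default-value route under the "new way" (a hypothetical model, no `include` needed):

```crystal
class SomeModel
  # Defaults count as initialization, so no `initialize` is needed
  # and nothing is rendered nilable.
  property x : Int32 = 0
  property y : Int32 = 0
end

m = SomeModel.new
p m.x # => 0
```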
So, yeah: in some cases you'll have to write more. You won't have to write `include JSON::Serializable`, but you will have to write a constructor. I think this is good, as it makes you think about how the class should be initialized outside of JSON. Imagine someone decides to use that class for some other reason, maybe in a test; parsing it from JSON all the time might not be ideal, so having a constructor is good.
Thinking on a broader scale: JSON/YAML are probably the most important serialization formats, but there are many similar use cases. Should this be the recommended approach for other mappings (XML, MessagePack, DB bindings, etc.) as well? I can't think of any necessary downsides, other than a bunch of methods being available on every object which will often not be useful unless the type is actually supposed to support serialization.
I think this is too magic. I'd like a way to do generic serialization/deserialization, but I don't think this is the way.
The way serde does it is by describing a "universal data model": everything maps to that data model, and that data model is then mapped to JSON or YAML or whatever. I think we should do the same. And Serde requires an explicit `#[derive(Serialize, Deserialize)]` to work, so why shouldn't we require explicit includes?
Regarding the constructor hack: maybe a better way to go is to allow a different way of declaring constructors that are not checked by the compiler, where it is the programmer's responsibility to comply with some sound contract. That way there is no hack in the compiler, but a convention.
I don't think serialization should be enabled for all types. I'm OK with adding serialization by default for records, for example.
Using the out-of-the-box serialization to avoid triggering #6194 is not worth it IMO.
@bcardiff

1. I don't like that - `self.allocate` is the unsafe way of declaring a constructor. We just need to automate adding finalizers so all constructors are always `obj = Class.allocate; obj.initialize(*args); obj`. Then that becomes API.
2. and 3. Absolutely agree

I'm thinking that `allocate` should add the GC finalizer, maybe. Then that problem will be solved.
I think Rust needs a Serializable attribute because they don't have reflection. With reflection you don't need an attribute.
Serde looks nice, but I can't quite understand how it works. It seems you have to define how a language type maps to a serde type, then how to serialize that type into the output, and maybe also the other way around. And then you have generic serialization attributes.
If we could go that way, I think it would be awesome: less repetition when a type is to be serialized into multiple formats.
I wonder about the efficiency, though. If you have an Array, do you have to map it to a serde array and then serialize that array? Maybe Rust can do that efficiently; I don't think we can, so it would double the number of allocations...
However, I still think that being able to serialize any type without including a specific module would be nice. Again, the reason other languages don't support this is that they lack reflection and the power to do it.
Alternatively, if it can be implemented without a compiler hack, that's another solution.
@asterite if you think of it at the `JSON::Emitter` and `JSON::PullParser` level, there's no efficiency problem. The "data model" is just a standardized interface to the pull parsers and emitters. So we'd have a `Serialization::PullParser` module with abstract methods, which `JSON::PullParser` would implement, and the same for emitters. Then you could just do `JSON.build do |emitter|; my_obj.serialize(emitter); end` and ditto for pull parsers. Then shortcut that up in `from_json` and `to_json` and you don't even really have a big breaking change.
And +1 to allocate adding the finalizer.
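To make the idea concrete, here is a minimal sketch of such a standardized interface. All names here (`Serialization::Emitter`, `ToyJSONEmitter`, the method names) are made up for illustration; nothing like this exists in the stdlib:

```crystal
# A format-agnostic emitter interface; JSON-like and YAML-like
# emitters would each implement these abstract methods.
module Serialization::Emitter
  abstract def scalar(value : Int32 | String)
  abstract def key(name : String)
  abstract def mapping(& : ->)
end

# A toy JSON implementation, writing into a String.
class ToyJSONEmitter
  include Serialization::Emitter

  getter result = ""

  def scalar(value : Int32 | String)
    @result += value.is_a?(String) ? value.inspect : value.to_s
  end

  def key(name : String)
    @result += "#{name.inspect}:"
  end

  def mapping(& : ->)
    @result += "{"
    yield
    @result += "}"
  end
end

# A type serializes itself against the interface, not against JSON:
emitter = ToyJSONEmitter.new
emitter.mapping do
  emitter.key "x"
  emitter.scalar 1
end
puts emitter.result # => {"x":1}
```

There is no intermediate data structure: the object writes straight into the emitter, so no extra allocations are needed.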
@RX14 I still don't understand how you would avoid that indirection and those extra allocations. I guess I'd have to see a proof-of-concept implementation to understand it :-)
I was wrong about `allocate` adding the GC finalizer. `new` invokes `allocate`, then `initialize`, then adds the finalizer. But if `initialize` raises then the finalizer must not be added. So I guess we need a method that does all of that at once, as @bcardiff suggests.
i think `include` is not that bad; this is the Ruby style too, where you extend a class with some new behavior by including a module into it. Adding it to Object would be more magical.
@asterite because there's no indirection. It's just a set of methods pull parsers and emitters implement.
Then, how about we keep `JSON::Serializable`, but deserialization doesn't happen through a constructor? That way we don't have to use the `{{@type}}` hack, which also currently breaks other constructors (as I showed above). And that way we don't need the `included` and `inherited` hacks either.
sounds cool, but how would it look? any example?
@kostya This commit from this branch has it working. As you can see:

- the `included`/`inherited` macros and the `dummy` parameter are all gone
- there's no `initialize`, because now JSON serialization doesn't count as an initializer (and it avoids the bug I mentioned above). I think this is good, but some may find it annoying
- `new` is replaced with `from_json`. To implement `from_json` we use `allocate` and `initialize_from_json`. It's "unsafe", but so was the old `new` method.

i think initialize is not a big problem because it can also be generated: https://github.com/kostya/auto_initialize
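A rough sketch of that shape (not the linked commit; `from_json_via_allocate` and the hand-written `initialize_from_json` below are illustrative stand-ins for the macro-generated versions):

```crystal
require "json"

class Point
  property x : Int32 = 0
  property y : Int32 = 0

  # Class-level entry point instead of a `new(JSON::PullParser)` overload,
  # so it doesn't participate in `new`/`initialize` overload lookup.
  def self.from_json_via_allocate(string_or_io) : self
    instance = allocate # unsafe: bypasses `initialize`
    pull = JSON::PullParser.new(string_or_io)
    instance.initialize_from_json(pull)
    instance
  end

  # Illustrative hand-written version; the real one would be generated
  # from the instance vars and raise on missing required properties.
  def initialize_from_json(pull : JSON::PullParser)
    pull.read_object do |key|
      case key
      when "x" then @x = pull.read_int.to_i32
      when "y" then @y = pull.read_int.to_i32
      else          pull.skip
      end
    end
  end
end

p Point.from_json_via_allocate(%({"x": 1, "y": 2})) # @x=1, @y=2
```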
@kostya Looks good! So simple :-)
Well, everyone (cc @RX14 @straight-shoota @ysbaddaden), let me know if you think this is a good direction (using `from_json` instead of `new`), and I can send a PR for JSON and YAML.
I'm not sure I like it. I'm not sure why we need to start adding all these fake constructors with this method. I also don't like that any other library using this approach now has to generate custom constructors too. So at the very least it should be coupled with a generic multi-format serialization interface. And making serialization implicit is a no-no: we should have to include something every time. And then there's the cost of everything changing around all the time. People do write custom `self.new(PullParser)` methods; now either those manually written methods have to be unsafe and write a `self.from_json` and an `initialize_from_json`, or they have to write a `self.from_json` alias for their existing `self.new`.
And what about default-initialized variables such as `@foo = ""` which were ignored in the parsing? Wouldn't they break with this new method?
And what are we trying to avoid here? Fixing a compiler bug? I don't get why we have to change things and write unsafe custom-named constructors.
@RX14 I already said that `JSON::Serializable` would stay, it just wouldn't define a constructor.
This is the main problem:

- `JSON.mapping` was a macro that defined a constructor along with some properties. The constructor could then be checked by the compiler to see if all instance vars were properly initialized.
- With `JSON::Serializable`, which uses compile-time reflection based on the instance vars, the compiler can't check that all instance vars are properly initialized anymore, at least not in the first pass (before instantiating methods), because now the `initialize` is a macro method. To work around that, I added a rule in the compiler to not check that `initialize` macro methods initialize everything properly. This is a hack I introduced and regret. I tried to make it work several times and I can't find a way to fix it. That's why the `Foo.new` above works when it shouldn't, and it's unsafe.
- We need the `included` and `inherited` macros to make it work with inheritance. This is a second hack.
- My proposal works without the `included`/`inherited` hacks, and also without `JSON::Serializable` creating a constructor for you, which means you'll have to define a proper constructor, like it has always been the case.

Alternatively, someone can try to fix that compiler hack, but the `included`/`inherited` hacks will still be there... and my bet is that this won't be fixed anytime soon (I wrote the code and I don't understand it anymore, so I doubt someone else will understand it quickly).
Also, serialization in other languages doesn't involve adding new constructors to types. It's a bit weird that a constructor is injected for you. What you want is just a way to create that object from JSON, and that doesn't have to necessarily be a constructor.
Anyway, this is just a minor thing. People are doing fine with some existing compiler bugs and segfaults, so this has no rush.
So the issue is:

- the compiler hack is needed for `JSON::Serializable` to work as it is now.
- the proposed replacement creates the object with `self.allocate`.

Okay, that's fine, but why does the new constructor need to be called `self.from_json`? Why not keep `self.new` and rename the `initialize` to `initialize_from_json` just the same? Is there anything special about methods called `self.new`? I thought there wasn't.
This way it's not a breaking change, just an implementation change.
Yes, if a type defines an `initialize` method, then the parent's `initialize` and `self.new` methods are not inherited. So this doesn't compile:

```crystal
class Foo
  # Let's assume this is the constructor that gets injected by JSON::Serializable
  def self.new(x, y)
  end

  def initialize(x : Int32)
  end
end

class Bar < Foo
  def initialize(x, y, z)
  end
end

Bar.new(1, 2) # This doesn't work
```
I'm not sure it's the right logic. It's hard because there's both `new` and `initialize`, whereas in many languages there's just `new` and the constructor is a fixed construct (there's no `self.new` in other languages). But, well, it's how it works.
I think I'll close this issue... maybe someone will have to fix the compiler issues/hacks regarding this and we'll be fine.
@asterite No, I really can't imagine how a constructor which is also able to inspect all the instance variables using macros could work. I think this should be removed.
I think that manually-defined `self.new` methods should be inherited normally; only the ones autogenerated from `def initialize` should not be inherited.
Besides, I don't know any other language which inherits global ("static") methods anyway. What's the use case?
And how can serialization make sense with inheritance anyway? I think it only makes sense for Java and C# to have "magic" serialization together with inheritance because they can always stick `null` in all of their variables. A subclass magically getting all its ivars serialized without asking is a bit weird to me.
I think I'm starting to dislike `JSON::Serializable` entirely. Perhaps the only problem with `JSON.mapping` was people treating these types like objects: adding ivars which aren't serialized, inheritance, and weird things. Maybe we should have just a `json_record` macro which is literally `JSON.mapping` but generates the whole class, and force people to use that. Perhaps that makes people stop adding behaviour and using inheritance with JSON stuff. If they need all of that, they can always wrap it. But then there's all that extra typing.
> Besides, I don't know any other language which inherits global ("static") methods anyway.

What do you mean? Ruby does it, PHP also.
We only do it because ruby does it. PHP is a joke.
@RX14 Java and Python do too... Please, check your facts before posting.
Oh, I thought that java didn't inherit static methods. Never mind then. My bad.
I'm closing this. The state of the art for JSON/YAML looks good so far, even though there are some corner cases. Sorry for the noise.
@asterite maybe wait to have other opinions from e.g @ysbaddaden and @bcardiff, before closing abruptly like you did?
Nah. I opened it, I can close it.
I mean, I'm no longer interested in this happening.
I'm definitely interested in fixing the bug. And I think reverting the compiler patch that made the current `JSON::Serializable` work, and then doing it manually using `def initialize_with_json`, is fine. It's doable now.
Then you should open a separate issue. The original issue was to have `from_json`/`to_json` available on all objects. And you are right that `initialize_with_json` is unsafe, so that shouldn't be done.
I like the idea of taking a step back from an immediate implementation. Let's think about this for some time. The custom annotations feature is quite young (and awesome!), let's experiment with that. We can explore alternate ways to implementing serialization - it couldn't hurt to look more deeply at other implementations. For now, the current way should be fine. I'm pretty sure we'll end up with some changes sooner or later, but time will tell.
@RX14
> Maybe we should have just a json_record macro which is literally JSON.mapping but it generates the whole class and force people to use that.

This sounds a bit like Java Beans. Might not be a bad idea. But maybe it is. 🤷♂️
It's definitely beneficial to question the current (and proposed) approaches and think about new ideas.
An aside (not exactly this topic) about Rust's Serde: its "data model" is an API interface to a fixed list of supported data types (29 in total) such as integers, floats, strings, arrays and hashes (but also structs, enums, tuples, ...).
There are two traits (interfaces) to be implemented: `Serialize` and `Deserialize`. They don't have to allocate more memory than needed; they're just an interface. It's similar to our `X::PullParser` and `X::Emitter`, but with a standard interface (e.g. `emitter.map` instead of `json.object` or `yaml.mapping` or ...).
Having a standard interface for (de)serialization would be as wonderful as having a standard interface for IO.
Going back to this topic: Rust's Serde can automatically generate the implementation from an explicit `#[derive(Serialize, Deserialize)]`, similar to `include JSON::Serializable`, but it doesn't have to generate a constructor, because Rust's structs have a universal initializer (e.g. `Point { x: 1, y: 2 }`).
Following the forum discussion, I got this basic working draft in this gist, which may or may not be the right approach.
At least, it shows an idea of a possible API.
```crystal
struct Point
  getter x : Int32

  @[Serialization(key: "YYY")]
  getter y : String

  def initialize(@x, @y)
  end
end

data = %({"x": 1, "YYY": "abc"})

point = Point.from(JSON, data)
puts point # => Point(@x=1, @y="abc")

{YAML, JSON}.each do |type|
  puts point.serialize type
  # => {"x":1,"YYY":"abc"}
  # ---
  # x: 1
  # YYY: abc
end
```
Or do we want to go through bytes for serializing, like Go's marshaling?
Having multiple serialization mechanisms (two, actually) isn't ideal, even less so when adding another one. Having a plan to keep at most one would be good to have before 1.0.
It's really inconvenient to write lots of `include JSON::Serializable` statements in the code base, especially for data objects.
Take some API: there can be like 15 data objects, and for every one you have to write `include JSON::Serializable`. It contributes zero meaning to the code and is just annoying boilerplate noise.
> It contributes zero meaning to the code

That's not true. It explicitly declares these types as serializable, which certainly does not apply to every type.
Automatically adding serialization methods to all types, including non-serializable ones, is questionable at the least. It taints the API with methods that are not good to use.
I understand the `record` macro is typically used for simple data types that can also be expected to be serializable. So I think we could make types created by `record` automatically serializable; this was suggested before in this discussion.
Record types themselves should generally be suitable for serialization. But that must also hold for the types of their instance variables, which simply can't be expected in the general case.
Maybe that's still okay, I'm not sure. There will be compile-time errors if some instance variable types are not serializable (although they might not be very user friendly). Some basic serialization methods are even defined on `Object` 🤷 (#5695).
```crystal
macro json_record(*args, **kwargs, &block)
  record({{*args}}, {{**kwargs}}) do
    include JSON::Serializable
    {{block && block.body}}
  end
end
```
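Usage of a `json_record` macro like the one above might look like this (hypothetical `Point` type; the macro is repeated so the snippet is self-contained):

```crystal
require "json"

# A record that also includes JSON::Serializable in its body.
macro json_record(*args, **kwargs, &block)
  record({{*args}}, {{**kwargs}}) do
    include JSON::Serializable
    {{block && block.body}}
  end
end

json_record Point, x : Int32, y : Int32

point = Point.from_json(%({"x": 1, "y": 2}))
p point.x       # => 1
p point.to_json # => {"x":1,"y":2}
```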
Auto-serialization is bad at least for security: imagine someone accidentally stores, for debug purposes, a reference from some object to an internal object (with passwords or something else). It would silently be serialized, without anyone ever noticing.
@kostya This is an interesting point, but I'm not sure I agree. Configuration data like passwords is very much serializable, and serialization is often employed to store such data. There may be some advantage if serialization is only explicitly made available, as that could make the developer aware of it. But I don't think that's very relevant. It's really a problem of protecting sensitive data inside the application, not of sensitive data being serializable, so there's not much security benefit in my opinion. Every object has `#to_s` and `#inspect`, and they could already be leaking information if the object is available in an unsafe scope.
There are legitimate reasons for serializing private data, so depending on serializability to indicate safety is insufficient.
@jhass Yes, that's an easy solution. Do you think it should be part of stdlib or a tool for user code?
I wouldn't be 100% against `json_record` and `yaml_record` in stdlib, but I also think it's simple enough to include in application code as needed. The latter also gives the potential to tailor it further to the application's domain (supporting multiple serializations, adding common helper methods, etc.).
When I code in Haskell it's so tedious to say that such-and-such type can be mapped to and from JSON (there are two typeclasses for that in Haskell, ugh).
However, if any object is serializable to JSON then you can accidentally pass an incorrect object to something that serializes it, and it will compile but won't work as expected. Of course you should test all code, or as much as possible, but marking things as serializable is more type safe than not doing it.
@straight-shoota `to_s` and `inspect` are known to be unsafe; no one would use them in APIs. But serialize should be safe: you get an object from the db and serialize it to the web.
@kostya I wouldn't be so sure that `#to_s` is generally considered unsafe; it's used in string interpolation, for example. And serialized data is certainly not necessarily safe to put anywhere. The db object could contain exactly the kind of data you mentioned in your previous comment (confidential credentials, for example).
@straight-shoota What @kostya is trying to tell you (excuse my mind-reading!) is that you don't want to unconditionally serialize all fields from a given `record` (which might come from the db, for example) having fields like `password`, `salt`, or personally identifiable information (PII).
Not insisting, just a personal opinion; feel free to ignore.
The fear that some non-serializable object will be serialized reminds me of a talk by DHH, where he says, about the fear of method redefinition: "People say that methods shouldn't be allowed to be redefined. What if a programmer redefines Array#map to shoot fireworks and someone calls it unknowingly?"
When was the last time you saw a serious bug caused by serializing an object that wasn't designed to be serialized? I can't remember such a case in my Ruby, JavaScript or Java practice.
The only serious concern is security, for things like passwords. But as someone mentioned, objects containing passwords are usually serializable anyway. And if this concern is real, such objects could be explicitly marked as non-serializable, to avoid being visible in dumps, logs etc.
Even though `JSON::Serializable` is a big improvement over `JSON.mapping`, having to include `JSON::Serializable` in every type feels like a burden. In many other languages, for example C#, Go, and I guess Java, serialization works out of the box through runtime reflection. We can do the same in Crystal, with compile-time reflection.

The way to do this is: we define `Object.from_json(string_or_io)` on every Object. What this does is create an instance with `allocate` (unsafe), create a `JSON::PullParser` and pass it to an instance method named `initialize_from_json`. This method is very similar to the `initialize(pull_parser)` we have. Because this method will raise if the JSON source lacks a required property (a property without a default value, and not nilable), it's totally safe.

The benefit is not having to include `JSON::Serializable` in every object that needs to be serialized. You could define a plain type and it will work right away. Serialization can still be controlled via attributes.
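A sketch of what that usage could look like under the proposal (hypothetical `Location` type; this does not compile with the current stdlib, it illustrates the proposed behavior):

```crystal
require "json"

# No `include JSON::Serializable` under the proposal:
class Location
  property lat : Float64 = 0.0
  property lng : Float64 = 0.0
end

# `Object.from_json` would allocate, then call `initialize_from_json`:
loc = Location.from_json(%({"lat": 1.5, "lng": 2.5}))
loc.lat # would be 1.5 under the proposal
```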
Take for example https://github.com/crystal-lang/crystal/pull/6308. It's nice, but having to manually include the `JSON::Serializable` module feels redundant, especially for records and simple types.

Another benefit is that no module is included, which means #6194 won't trigger (for this case).

Yet another benefit: imagine you define some simple types in your library. If people want to serialize them, they can do it right away, no need to reopen the types and include `JSON::Serializable` in case the author forgot.

The diff to achieve this is very simple. ~~Except there's a compiler bug that prevents this from working (I can work on that)~~ I have it working locally.

I'm pretty sure many will say it might be "dangerous" to allow serialization on every type. I don't find it so. And given that there's precedent in basically every other language out there, I think it's fine.
Thoughts?