JuliaIO / JSON.jl

JSON parsing and printing

type hints #96

Open kyle-pena-nlp opened 9 years ago

kyle-pena-nlp commented 9 years ago

You may want to implement some kind of type hinting system to preserve polymorphism when serializing/deserializing.

Consider the following scenario:

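# The abstract supertype must be declared before SketchPad refers to it
abstract type Shape end

struct SketchPad
    Shapes::Vector{Shape}   # a vector of any concrete Shape subtype
end

struct Circle <: Shape
    Radius::Int
end

# Square and HorizontalLine have identical field layouts
struct Square <: Shape
    Length::Int
end

struct HorizontalLine <: Shape
    Length::Int
end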

Serializing an instance of SketchPad would work fine, but when deserializing, the types Square and HorizontalLine are ambiguous: both serialize to an object with a single Length field.

The solution that others have adopted is to place a "type hint" field as the first field when serializing. For example:

"Shapes": [ { "__type": "Circle", ... }, { "__type": "Square", ... } ]
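The hint scheme above can be sketched in current Julia syntax. To stay dependency-free, this minimal sketch works on `Dict{String,Any}` values (the kind of data `JSON.parse` returns) rather than on JSON text. The `__type` key follows the example; `SHAPE_TYPES`, `to_plain_dict`, `to_hinted_dict`, and `from_hinted_dict` are hypothetical names, not part of JSON.jl.

```julia
abstract type Shape end

struct Circle <: Shape
    Radius::Int
end

struct Square <: Shape
    Length::Int
end

struct HorizontalLine <: Shape
    Length::Int
end

# Hypothetical registry: type-hint string => concrete type.
# An explicit whitelist avoids instantiating arbitrary types named in the input.
const SHAPE_TYPES = Dict{String,DataType}(
    "Circle" => Circle,
    "Square" => Square,
    "HorizontalLine" => HorizontalLine,
)

# Plain field-by-field conversion: Square(3) and HorizontalLine(3) become identical Dicts.
function to_plain_dict(s::Shape)
    return Dict{String,Any}(string(f) => getfield(s, f) for f in fieldnames(typeof(s)))
end

# Same conversion plus a "__type" entry naming the concrete type.
function to_hinted_dict(s::Shape)
    d = to_plain_dict(s)
    d["__type"] = string(nameof(typeof(s)))
    return d
end

# Look up the concrete type from the hint, then pass the stored fields
# to its constructor in field-declaration order.
function from_hinted_dict(d::AbstractDict)
    T = SHAPE_TYPES[d["__type"]]
    return T((d[string(f)] for f in fieldnames(T))...)
end

shapes = Shape[Square(3), HorizontalLine(3)]
restored = Shape[from_hinted_dict(to_hinted_dict(s)) for s in shapes]
```

Here `to_plain_dict(Square(3)) == to_plain_dict(HorizontalLine(3))` is `true`, which is exactly the ambiguity described in the issue; with the hint present, `restored == shapes` holds.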

I haven't thought through the design aspects of this, but I can tell you that supporting polymorphism in json deserialization is a pretty critical need in LOB applications.

[formatting – @StefanKarpinski]

jiahao commented 9 years ago

As with most open source projects, we'd happily take a pull request!

kyle-pena-nlp commented 9 years ago

Cool, I'll give it a shot

StefanKarpinski commented 9 years ago

Hmm. I don't really know about this. If this is what you want, I think that JSON is not the right format. I especially don't think it belongs in the JSON package. One of the best things about JSON is how simple and limited it is. YAML gets into this kind of thing, and IMO, it's kind of confusing.

timholy commented 9 years ago

If you're not specifically looking for JSON, check out HDF5.jl.

StefanKarpinski commented 9 years ago

Or maybe https://github.com/dcjones/YAML.jl since YAML does have support for this kind of thing.

kyle-pena-nlp commented 9 years ago

Thanks for the feedback on the idea. Understood on the desire to keep JSON pure/simple. Should I invest time in an implementation? The consensus seems to be to not use JSON for this kind of thing.

StefanKarpinski commented 9 years ago

Maybe as an add-on package? JSONTypes.jl or something like that?

kyle-pena-nlp commented 9 years ago

That sounds like a good idea, @StefanKarpinski. Thanks for the formatting fix earlier, btw; you caught me in a moment of laziness.

StefanKarpinski commented 9 years ago

No worries. I do it reflexively at this point.

yeesian commented 9 years ago

@kyle-pena-nlp, I currently do the parsing manually, but it would be great if you could make it work for https://github.com/JuliaGeo/GeoJSON.jl!

kyle-pena-nlp commented 9 years ago

Thanks yeesian, you are spurring me on to make this happen.

yeesian commented 9 years ago

If you're going to give it a shot, I don't think it's sufficiently different (unlike YAML) to deserve a package of its own; I'd suggest submitting a PR instead, as jiahao mentioned.

kyle-pena-nlp commented 9 years ago

Understood; however, I may write a separate package anyhow. Here's why:


struct Foo
    Bar::Int
    DerivedBar::Int
    # Inner constructor enforces the invariant DerivedBar == 2 * Bar
    Foo(bar::Int) = new(bar, 2 * bar)
end

{
  "$type": "Foo",
  "Bar": 3,
  "DerivedBar": 7
}

What should the parser do? (a) Should it ignore DerivedBar and try to auto-match the "Bar": 3 entry with the constructor argument bar::Int? (b) Should it attempt to invoke new() directly and bypass the entire issue? (c) Should we allow the programmer to specify what the parser should do? (d) Should we disallow parsing types like this?

(a) is magical; (b) may violate invariants the programmer has enforced in their constructor; (c) may be burdensome for the programmer if it is needed often; and (d) is a fairly large restriction.
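To make the trade-off between a field-wise default and option (c) concrete, here is a hedged sketch built around a hypothetical hook function, `construct_from_dict` (not a JSON.jl API), applied to an already-parsed `Dict`. The generic fallback passes stored fields to the constructor in declaration order, which works for types that keep Julia's default constructor; `Foo` overrides it, which is option (c). Note that the payload's `DerivedBar` of 7 is not `2 * 3`, so it could not have been produced by `Foo`'s constructor.

```julia
struct Foo
    Bar::Int
    DerivedBar::Int
    # Inner constructor: the only way to build a Foo, enforcing DerivedBar == 2 * Bar.
    Foo(bar::Int) = new(bar, 2 * bar)
end

# Hypothetical hook. Fallback: pass the stored fields positionally, in declaration order.
# For Foo this would call Foo(3, 7), which throws a MethodError (there is no two-argument constructor).
construct_from_dict(::Type{T}, d::AbstractDict) where {T} =
    T((d[string(f)] for f in fieldnames(T))...)

# Option (c): Foo's author states which keys feed the constructor; other keys are ignored.
construct_from_dict(::Type{Foo}, d::AbstractDict) = Foo(d["Bar"])

parsed = Dict{String,Any}("\$type" => "Foo", "Bar" => 3, "DerivedBar" => 7)
foo = construct_from_dict(Foo, parsed)   # DerivedBar is recomputed as 6; the stored 7 is dropped
```

Multiple dispatch picks the more specific `Type{Foo}` method automatically, so the per-type override costs one method definition, and types without custom constructors need nothing.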

Json.NET (a fairly robust JSON serialization library for .NET) prefers (a) and allows (c), but the exact behaviour depends on a mix of configuration and the signature of the type. I don't think we want that kind of unpredictability here. Thoughts?

On another note there is also the question of modules and type names. Should type names be fully qualified or not? Or perhaps this behaviour should be configurable?
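The qualification question shows up as soon as two types share a name in different modules. This sketch uses Base's `nameof`, `parentmodule`, and `fullname`; the modules `Geometry` and `Annotations` and the helpers `bare_name` and `qualified_name` are hypothetical.

```julia
module Geometry
struct Circle
    Radius::Int
end
end

module Annotations
# Same bare name as Geometry.Circle, but an unrelated type.
struct Circle
    Radius::Int
end
end

# Unqualified hint: short and readable, but collides across modules.
bare_name(T::Type) = string(nameof(T))

# Fully qualified hint built from the module path, e.g. "Main.Geometry.Circle" when run as a script.
qualified_name(T::Type) = join((fullname(parentmodule(T))..., nameof(T)), ".")

println(bare_name(Geometry.Circle), " vs ", bare_name(Annotations.Circle))
println(qualified_name(Geometry.Circle), " vs ", qualified_name(Annotations.Circle))
```

Qualified names avoid collisions but tie the payload to the producer's module layout, so moving or renaming a type breaks previously written documents; bare names have the opposite trade-off.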

yeesian commented 9 years ago

If that juice isn't worth the squeeze, it may be simple enough to submit a PR which augments the serializer with an option to emit the $type field.

I'm leaning towards this approach, maybe augmented with simple options for the parser to decide what to do with missing/null values, and maybe a choice of attributes to include/omit. I also agree with timholy and Stefan about using HDF5 or YAML if you're looking into more complex schemas for serialization, unless you have a use case that HDF5 or YAML doesn't support.

Quantisan commented 9 years ago

@kyle-pena-nlp there are ser/de libraries for structured data, such as Protocol Buffers, Thrift, and Avro, that preserve types across systems. You could achieve this by implementing a Julia library for one of them, and it would then work with any system that uses that format.

There are probably JSON-compatible ser/de libraries too, but I don't know any off the top of my head. MessagePack and Transit are examples, but they can't do custom objects.

kmsquire commented 9 years ago

https://github.com/tanmaykm/ProtoBuf.jl
https://github.com/tanmaykm/Thrift.jl
https://github.com/kmsquire/MsgPack.jl

kyle-pena-nlp commented 9 years ago

@yeesian @Quantisan @kmsquire - Thanks to all for the discussion / suggestions.

Unfortunately, I was in one of those situations where I needed to integrate some work done in Julia with an existing .NET system, which accepted type-hinted JSON as input. Given the crunch I was in, it wasn't practical to rewrite the .NET code. If I had the opportunity to build the system from scratch, I likely would have gone with a different data format entirely, like one of the ones listed above.

What I ended up doing was writing a very quick and dirty JSON serializer: nothing of a quality I could share with the community, but enough to get the job at hand done. However, I'll keep an eye on the thread to see if anyone else finds themselves in a situation similar to mine!