Open cfilipov opened 4 years ago
I've transferred this issue to this repo, since this one contains the dhall-to-json tool.
You can configure the encoding of unions with the Nesting type documented in Dhall.JSON.
Please ask further, if the documentation doesn't answer your questions!
I had to try this for myself and ended up with this:
let Shape =
      < rectangle : { length : Double, width : Double }
      | circle : { radius : Double }
      >
let Geometry : Type = { title : Text, shapes : List Shape }
let geometry
    : Geometry
    = { title = "My Shapes"
      , shapes =
        [ Shape.rectangle { length = 1.0, width = 5.0 }
        , Shape.circle { radius = 6.0 }
        ]
      }
let geometryToJSON =
      let Nesting = https://prelude.dhall-lang.org/v10.0.0/JSON/Nesting
      let shapeToJSON =
              λ(s : Shape)
            → { nesting = Nesting.Inline, field = "type", contents = s }
      in    λ(g : Geometry)
          →   g
            ⫽ { shapes =
                  https://prelude.dhall-lang.org/v10.0.0/List/map
                    Shape
                    { nesting : Nesting, field : Text, contents : Shape }
                    shapeToJSON
                    g.shapes
              }
in  geometryToJSON geometry
Thank you!
Wow, that's a lot of work! I'm imagining what this would look like with a data model that makes heavy use of this pattern. Wish there was a more ergonomic way to do this.
I'm thinking something like typescript's string literal type for discriminated unions would make this more explicit and less magical:
let Shape =
      < rectangle : { length : Double, width : Double, type: "rectangle" }
      | circle : { radius : Double, type: "circle" }
      >
@cfilipov: What TypeScript calls string literal types Dhall calls enums (or unions with empty alternatives)
For example, this TypeScript type:
type Easing = "ease-in" | "ease-out" | "ease-in-out";
... corresponds to this Dhall type:
let Easing = < ease-in | ease-out | ease-in-out >
... but in this case there is no need to do that because you already have a union with alternatives that have the right name.
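To make the enum correspondence concrete, here is a minimal sketch. The `render` helper and the final record are illustrative names that do not appear in this thread. Since dhall-to-json renders an empty alternative as a JSON string, `Easing.ease-out` should come out as "ease-out" without any helper; `render` only shows how to get the same text inside Dhall.

```dhall
-- An enum: a union whose alternatives carry no payload
let Easing = < ease-in | ease-out | ease-in-out >

-- Hypothetical helper: `merge` needs exactly one handler per alternative
let render =
      λ(e : Easing) →
        merge
          { ease-in = "ease-in"
          , ease-out = "ease-out"
          , ease-in-out = "ease-in-out"
          }
          e

in  { easing = Easing.ease-out, label = render Easing.ease-out }
```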
I think the right solution here is perhaps to provide support for controlling nesting behavior globally (i.e. via a command line switch)
> I think the right solution here is perhaps to provide support for controlling nesting behavior globally (i.e. via a command line switch)
That seems like a good idea. We could have alternative options --nesting-inline and --nesting-nested FIELDNAME to control the nesting and an option --tag-field FIELDNAME to control the fieldname for the tag. E.g.
$ dhall-to-json --nesting-nested=value --tag-field=name <<< "< Left : Natural | Right : Natural>.Left 2"
{
  "name": "Left",
  "value": 2
}
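For comparison, the existing in-code mechanism can already express the nested encoding that these proposed flags would select. This is only a sketch: it assumes Nesting is defined locally with the same shape as the Prelude's JSON/Nesting, and `Either` is an illustrative name.

```dhall
-- Same shape as the Prelude's JSON/Nesting, defined locally to avoid an import
let Nesting = < Inline | Nested : Text >

-- Illustrative name for the union used in the command above
let Either = < Left : Natural | Right : Natural >

in  { field = "name"
    , nesting = Nesting.Nested "value"
    , contents = Either.Left 2
    }
```

With nested tagging, dhall-to-json should render this as an object whose "name" is "Left" and whose "value" is 2, since the payload goes under its own key and does not need to be a record.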
Personally I'm not a fan of this being a command line flag. While I feel the current syntax is a bit heavy and "magical", I prefer it over a flag.
The reason is, a command line flag is out of band and it seems wrong to have two different results depending on how you run the program.
@cfilipov I think we can implement this in a way where the current way of specifying the nesting in the code continues to work.
I think the flags could be useful in any case, if only so you can quickly see what the various nesting options look like.
@Gabriel439 You mention that dhall union is essentially the same as what typescript calls string literal types and I see that from your example, but I think there is more to it than that.
_(Aside: In fact, in typescript you can also have numerical literals. Essentially, it seems like you can specify a specific value of a field in a record's type (or it's just a type that only has one value), not sure what you call that formally.)_
The important difference is that TypeScript allows me to use a type discriminator in a more ad-hoc way (which feels more ergonomic, for what that's worth).
For example, here's a definition in typescript:
type Circle = {
  type: "circle";
  radius: number;
}
To do this in dhall I would have to do it this way from what I can tell:
let CircleType = < Circle >
let Circle = {
  radius : Double,
  type : CircleType
}
So in the end, my preferred way to get the result I wanted from my original question is this:
let RectangleType = < Rectangle >
let CircleType = < Circle >
let Shape =
      < Rectangle : { length : Double, width : Double, type : RectangleType }
      | Circle : { radius : Double, type : CircleType }
      >
let Geometry = { title : Text, shapes : List Shape }
let geometry
    : Geometry
    = { title = "My Shapes"
      , shapes =
        [ Shape.Rectangle
            { length = 1.0, width = 5.0, type = RectangleType.Rectangle }
        , Shape.Circle { radius = 6.0, type = CircleType.Circle }
        ]
      }
in  geometry
This might not be the idiomatic way to do this in dhall, but it feels right to me. It is more explicit than the command line flag: you can tell by reading the code what the output will look like. It's also a lot less code than the solution @sjakobi proposed. To me this is more understandable, and it doesn't invoke the magic incantation of wrapping values in some special type. Is there any reason I shouldn't do it this way?
@cfilipov That seems totally reasonable if you don't mind duplicating the differentiation between rectangles and circles.
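One way to keep that duplication in a single place is to hide it behind small constructor functions. The following is only a sketch: `rectangle` and `circle` are hypothetical helper names, not something proposed in this thread.

```dhall
let RectangleType = < Rectangle >

let CircleType = < Circle >

let Shape =
      < Rectangle : { length : Double, width : Double, type : RectangleType }
      | Circle : { radius : Double, type : CircleType }
      >

-- Hypothetical helpers: each one fills in the matching `type` tag
let rectangle =
      λ(length : Double) →
      λ(width : Double) →
        Shape.Rectangle
          { length = length, width = width, type = RectangleType.Rectangle }

let circle =
      λ(radius : Double) →
        Shape.Circle { radius = radius, type = CircleType.Circle }

in  { title = "My Shapes", shapes = [ rectangle 1.0 5.0, circle 6.0 ] }
```

The tag is still stated twice in the type definition, but call sites no longer repeat it.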
Personally, I would vastly prefer the command-line switch, as the other options unnecessarily clutter one's actual configuration files. If you're purely using dhall to generate JSON and not "natively," then I suppose it's okay to have to jump through hoops to produce the JSON one wants. But for the case of using dhall itself, with JSON only as a fallback for endpoints/clients that lack a dhall library, it's just messy to have to include two useless fields (and have the whole thing wrapped in an unnecessary record) just on the off-chance that one might need to access the files via a JSON intermediate at some point. Command-line switches represent an ideal fall-back for just such cases.
> The reason is, a command line flag is out of band and it seems wrong to have two different results depending on how you run the program.
As for the above concern mentioned by @cfilipov of having two different results depending on how the program is run, dhall-to-json already has multiple flags that result in different output, e.g. --noMaps, --omitEmpty, etc., all of which alter the JSON output based on a command-line flag. Unless I'm missing something specific in this case, which is entirely possible?
It’s usually a bad idea to make the semantics of source change if it’s invoked differently.
@Profpatsch: It's not that bad, for two reasons:
1) It's actually not a change to the Dhall semantics. In other words, the Dhall interpretation passes (import-resolution/type-checking/evaluation) are unaffected. This only affects the final conversion from Dhall to JSON
2) We're slowly gravitating to making certain options standard instead of configurable as we figure out best practices. For example, --omitNull recently switched to becoming the default and may become standard in the future
> It's actually not a change to the Dhall semantics. In other words, the Dhall interpretation passes (import-resolution/type-checking/evaluation) are unaffected. This only affects the final conversion from Dhall to JSON
I think this is important. Dhall fundamentally supports things that JSON does not, so it is necessarily a translation from dhall to JSON, not a transcription. Like any translation, there are going to be choices made in how to represent "foreign" constructs in the destination language. It seems like these are best dealt with by instructing (via flags) the translator (dhall-to-json) how to go about it, rather than rewriting the source text itself, since that information is simply not relevant to the source.
Long story short, the fact that JSON lacks union types is a shortcoming of JSON, not a shortcoming of Dhall; the choice of how to overcome that shortcoming should be made only when one finds it necessary to switch to the deficient representation rather than baggage toted around in the language that supports it (which then clutters up dealing with the data in dhall itself).
Do we agree then that something like the --nesting-{inline,nested} and --tag-field options I proposed in https://github.com/dhall-lang/dhall-haskell/issues/1383#issuecomment-538670891 is the way to go?
If we want to move away from defining union constructor translation via the Nesting type, we should maybe find better names for these options…
One shortcoming of a command line switch is that it either enables or disables the tag field/nesting for all occurrences. I have definitely run across JSON where both styles are used. But of course the aforementioned alternative approaches can be used.
@sjakobi: I think we should support the global command-line options and also preserve the existing functionality, too
The general rule of thumb I use for backwards compatibility is that it's cheap to support if it's not part of the standard (i.e. it's specific to the Haskell implementation) since other/new implementations don't have to support it
Would the option names --union-nesting-{inline,nested} and --union-tag-field instead of just --nesting-inline and --tag-field be better? They'd be slightly clearer IMHO.
@sjakobi: Yeah, that seems reasonable to me
I'm working on implementing this as another Expr-to-Expr pass, along the lines of convertToHomogenousMaps.
Now I'm wondering what to do with unions that contain non-record alternatives when the --union-tag-inline option is enabled, for example
let Example = < A | B : { b : Text } | C : Natural >
dhall-to-json currently translates an inline-tagged Example.A to { "<field>": "A" } and an (Example.B { b = "foo" }) to { "<field>": "B", "b": "foo" }.
In the case of (Example.C 42) it throws an error, because it has no field name that it could use for the 42.
For the new pass, I see these options:
1) Check the union types and error out if we find any non-record alternative.
2) Check the union types, and refuse to tag any Example.
   [ Example.A, Example.B { b = "foo" }, Example.C 42 ]
   =>
   [ "A", { "b": "foo" }, 42 ]
3) Tag Example.A and Example.B, but keep Example.C as a union value:
   [ Example.A, Example.B { b = "foo" }, Example.C 42 ]
   =>
   [ { "<field>": "A" }, { "<field>": "B", "b": "foo" }, 42 ]
   A downside to this approach is that the result of the pass is ill-typed. This shouldn't cause any problems right now, but might make it harder to implement subsequent changes down the line. (I believe convertToHomogenousMaps can produce ill-typed terms too.)
4) Just tag everything, and let any Example.C get caught later on when converting to JSON. This is very simple to implement, but the resulting errors might be a bit confusing.
5) Tag any Example.A and Example.B, but error out in the case of Example.C. The error messages should be slightly better than with (4).
To sidestep the issue, we could alternatively make a new rule that creates a field name for Example.C. We could default to the constructor name ("C") and add an option to override this default…
Thoughts?
@sjakobi: I would go with (1)
My reasoning is that:
@Gabriel439 That makes sense to me.
I wonder whether I should implement the union type check in dhallToJSON, so the check also applies when the new nesting options are not enabled. That would change the behaviour for code that uses Nesting.Inline directly.
@sjakobi: Is it possible to apply the check at the point where the conversion takes place (i.e. for the conversion function to return an Either)?
Do you mean the Dhall-to-JSON conversion or the union-to-"tagged"-record conversion?
dhallToJSON already returns an Either. The new pass will return an Either depending on whether it handles the union type check itself; alternatively it can defer the check to dhallToJSON.
@sjakobi: I mean the union-to-tagged-record conversion
Ok. Then we will keep the current behaviour, where a manually tagged Example.A or Example.B { b = "foo" } is converted to JSON, for now. It seems a bit inconsistent but that's a separate issue.
Isn't option 1 just going to force a return to pre-v7 unions, where everything just had to be tagged with empty records? Or am I not understanding something?
Sorry, I should probably have been a bit more precise in my description of option (1): We'll error out if there are any non-empty non-record alternatives.
So < Foo >.Foo is fine (and encoded as {"<tagField>": "Foo"}), and < Foo : { bar : Bool } > is also fine. But < Foo : Bool >.Foo True is a problem because there's no "natural" field name for the Bool. It's unclear how to encode it: {"<tagField>": "Foo", ???: true}
If users run into the error, I think they will just wrap a singleton record around the alternative.
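A sketch of that workaround, written with the existing in-code Nesting mechanism: the Bool payload is wrapped in a one-field record so that it has a field name to be inlined under. The field name `value` and the union name `Wrapped` are hypothetical choices, and Nesting is assumed to be defined locally with the same shape as the Prelude's JSON/Nesting.

```dhall
let Nesting = < Inline | Nested : Text >

-- `value` is a hypothetical field name chosen for the wrapper record
let Wrapped = < Foo : { value : Bool } >

in  { field = "tagField"
    , nesting = Nesting.Inline
    , contents = Wrapped.Foo { value = True }
    }
```

With inline tagging this should produce an object carrying both "tagField" (set to "Foo") and "value" (set to true).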
Gotcha, but it wouldn't error on a union like
let relationshipStatus =
      < Single
      | Married : Person
      | Divorced
      | Widowed
      | Engaged : Person
      >
(where Person is some record) you're saying. That seems fine to me; I was being dumb and misinterpreting, haha!
Yep, you understood that correctly! I was also surprised by this little complexity.
I just wanted to chip in my two cents.
The issue of encoding non-empty non-record alternatives could be side-stepped by changing how alternatives are encoded. Rather than having a tagging property on each alternative, you could encode each alternative as a record with a single property. So for instance, I imagine that values of type Example = < A | B : { b : Text } | C : Natural > would be encoded like:
Example.A to { "A": null }
Example.B { b = "foo" } to { "B": { "b": "foo" } }
Example.C 42 to { "C": 42 }
Incidentally this is similar to how Aeson generically encodes objects with the ObjectWithSingleField option.
One effect of this, obviously, is that the structure of the JSON you generate will be deeper. This may or may not be desirable.
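Until something like this is built in, the single-property encoding can be approximated by hand with the Prelude's JSON type. This is only a sketch under some assumptions: it imports the unpinned, current Prelude JSON package (older releases such as v10.0.0 may not provide every helper used here, e.g. JSON.natural), and `encode` is a hypothetical name.

```dhall
-- Assumption: unpinned import of the current Prelude JSON package
let JSON = https://prelude.dhall-lang.org/JSON/package.dhall

let Example = < A | B : { b : Text } | C : Natural >

-- Hypothetical helper: wraps each alternative in an object with a single key
let encode
    : Example → JSON.Type
    = λ(e : Example) →
        merge
          { A = JSON.object (toMap { A = JSON.null })
          , B =
              λ(r : { b : Text }) →
                JSON.object
                  (toMap { B = JSON.object (toMap { b = JSON.string r.b }) })
          , C = λ(n : Natural) → JSON.object (toMap { C = JSON.natural n })
          }
          e

in  [ encode Example.A
    , encode (Example.B { b = "foo" })
    , encode (Example.C 42)
    ]
```

The cost is the same one raised earlier in the thread: the serialization logic lives in the Dhall source rather than in the converter.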
My other idea was regarding the discussion of the command line flag affecting translation globally. It might be desirable to work around this by allowing pragmas in Dhall. I don't know if there's precedent for this in Dhall. I envisage that these pragmas could be annotated inline in the Dhall source file, or placed in separate files. Example:
{-# SumEncoding Example ObjectWithSingleField #-}
let Example = < A | B : { b : Text } | C : Natural >
It would also be possible to specify this per-type on the command line with e.g. --sum-encoding Example=ObjectWithSingleField. I'm not sure how nice of a user-experience that would be, though.
@fredefox: I like the idea of providing a command-line way to customize the behavior on a per-type-name basis. That would help us avoid having to make language-level changes to affect the encoding.
Defining this on a per-type basis would be perfect. Also, support for ObjectWithSingleField-style encoding would be nice. I just stumbled over a concrete example of a JSON format where this is the chosen encoding.
After working with dhall (and converting to json) for a bit, I noticed that a command line flag would be hella useful.
There are two major ways of encoding, both of which were mentioned in this thread:
1) < A : x | B : y >.A somex → { "type/kind/whatever": "A", "payload/value/whatever": somex }
2) < A : x | B : y >.A somex → { "A": somex }
Both are valid, both are used in practice, so we should probably support both with command line options.
I would like to propose a generalization (which could be provided in addition to the specializations above):
The user supplies a dhall function of type
\(unionKey: Text) -> \(unionValue: Prelude.JSON.Object) -> Prelude.JSON.Object
so that they can define the structure transformation to how they need it.
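As a rough illustration, both encodings listed above could be written as functions of roughly that shape. This sketch makes several assumptions: it uses JSON.Type from the current, unpinned Prelude in place of the Prelude.JSON.Object named in the proposal, and `Encoder`, `tagged`, `singleField`, and the "type"/"payload" keys are all hypothetical names.

```dhall
-- Assumption: unpinned import of the current Prelude JSON package
let JSON = https://prelude.dhall-lang.org/JSON/package.dhall

-- Hypothetical alias for the shape of a user-supplied transformation
let Encoder = Text → JSON.Type → JSON.Type

-- Encoding 1: a tag field next to a payload field
let tagged
    : Encoder
    = λ(unionKey : Text) →
      λ(unionValue : JSON.Type) →
        JSON.object
          (toMap { type = JSON.string unionKey, payload = unionValue })

-- Encoding 2: an object whose single key is the alternative's name
let singleField
    : Encoder
    = λ(unionKey : Text) →
      λ(unionValue : JSON.Type) →
        JSON.object [ { mapKey = unionKey, mapValue = unionValue } ]

in  [ tagged "A" (JSON.natural 1), singleField "A" (JSON.natural 1) ]
```

How dhall-to-json would accept such a function (a flag, a separate file) is left open here, as it is in the proposal.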
I think it would be great to have a simple flag for dhall-to-json type discriminators. I'm not privy to all of the theoretical or implementation discussion details - but I agree with some others here that it would be nice to not have to have explicit "serialization" logic in my dhall structuring itself (keep the dhall "pure").
While I think the simple flag is the best bang-for-your-buck solution now, it could be interesting to provide a more generic serialization through a separate config: dhall-to-json --file my-pure.dhall --output-format how-to-serialize.dhall.
@sjakobi: I see that you have assigned this to yourself. Are you still working on this? If not, I can try my hand at it.
I have found another reason why I’m not super into the in-band representation: it changes the semantics of the dhall code itself, e.g. the Nesting enum is burned for other use (although accidental conversion might seem unlikely).
Compare that to how setting --no-maps vs not setting it changes the interpretation, but out of band.
I'm trying to generate some json that looks like this:
It seems natural to model this with unions:
However, this does not include the type discriminator field:
How would I achieve this?