Data transformers - Githubissues

schani commented 6 years ago

There should be a portable way of specifying transformations on quicktype data, and quicktype should generate code for them in all supported languages. A wide range of transformations are possible - see jq for some inspiration. We'll have to start small, if only because supporting the target languages will be a lot of work.

As a first step I suggest modifying individual properties by applying a simple function to them. In many cases that will also modify their types.

Things that would be easy to solve with this:

Default values for optionals, of which #41 is a special case.
Parsing strings into non-JSON types. Special cases are #138 and #221.

For the time being the functions will be pre-defined, but we'll eventually have to allow users to implement their own functions in their target languages.
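To make the first step concrete, here is a minimal TypeScript sketch of applying simple pre-defined functions to individual properties. All names (PropertyFn, withDefault, stringToDate, RawMeeting, Meeting) are made up for illustration and are not part of quicktype. Note how each function changes the property's type: the optional string becomes a plain string, and the string becomes a Date.

```typescript
// Hypothetical names throughout; none of this is quicktype's API.
type PropertyFn<In, Out> = (value: In) => Out;

// Default value for an optional: (T | null | undefined) -> T.
function withDefault<T>(fallback: T): PropertyFn<T | null | undefined, T> {
    return (value) => (value === null || value === undefined ? fallback : value);
}

// Parsing a string into a non-JSON type: string -> Date.
const stringToDate: PropertyFn<string, Date> = (value) => {
    const date = new Date(value);
    if (isNaN(date.getTime())) {
        throw new Error("Not a date: " + value);
    }
    return date;
};

// The shape as it appears in JSON, and the shape after the transformations.
interface RawMeeting { title?: string | null; start: string }
interface Meeting { title: string; start: Date }

function decodeMeeting(raw: RawMeeting): Meeting {
    return {
        title: withDefault<string>("untitled")(raw.title),
        start: stringToDate(raw.start),
    };
}
```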

schani commented 6 years ago

Actually, we could even handle date/time via this, which would make implementing it in other languages much less work.

schani commented 6 years ago

We can represent this in the IR with two new entities. One is a new kind of type, a TransformedType, the other one describes the semantics of the transformation, and is called a Transformation.

Transformers are defined recursively. A transformer can be either

a built-in transformer function, such as StringToDate, potentially taking arguments such as a date format.
for compound types, a set of transformers, one for each child of the transformed type.

Each transformed type has

the JSON type that's transformed.
the type that it's transformed to.
a transformer

We need to take care to make transformers bidirectional, so that data can be serialized as well as deserialized.
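As a rough illustration of the built-in case only, a bidirectional transformer could be modelled as a pair of functions, with the transformed type carrying the three parts listed above. This is a sketch under assumptions: the interface and field names are hypothetical, and the StringToDate shown here only handles ISO 8601 strings rather than taking a date format argument.

```typescript
// Hypothetical sketch; these names are illustrative, not quicktype's IR.
interface Transformer<Json, Target> {
    decode(json: Json): Target;  // deserialization direction
    encode(value: Target): Json; // serialization direction
}

// The three parts of a transformed type.
interface TransformedType<Json, Target> {
    jsonType: string;   // the JSON type that's transformed
    targetType: string; // the type that it's transformed to
    transformer: Transformer<Json, Target>;
}

// A built-in transformer function (ISO 8601 only in this sketch).
function StringToDate(): Transformer<string, Date> {
    return {
        decode: (json) => new Date(json),
        encode: (value) => value.toISOString(),
    };
}

const fooDate: TransformedType<string, Date> = {
    jsonType: "string",
    targetType: "date",
    transformer: StringToDate(),
};

// Round trip: decoding and then encoding gives back the original string.
const decoded = fooDate.transformer.decode("2018-01-15T10:00:00.000Z");
const reencoded = fooDate.transformer.encode(decoded);
```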

Note that the same transformations can be expressed in multiple ways. For example, let's say we have

class Foo {
    date: String -> Date (StringToDate)
}

The meaning here is that the type of Foo.date is a TransformedType which is transformed from string to date via the StringToDate transformer function. This is semantically equivalent to

class {
    date: String
}
  ->
class Foo {
    date: Date
}
    ( { date: StringToDate} )

Here the outer type is a TransformedType that transforms a class with a string member date into class Foo, whose member date is a Date, using a transformer that applies the StringToDate transformer function to the date property.

This is actually helpful, since some programming languages might not support all different ways of expressing a transformation. Swift, for example, doesn't seem to allow attaching a custom decoder to a single class property, like the first example demands. Instead, we would have to implement a custom decoder for the whole class, which corresponds to the second example. We can let the IR pick the correct way of expressing the transformation so the renderers don't have to do any heavy lifting.
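One way the IR could move between the two equivalent forms is a rewrite pass that lifts property-level transformations up to the enclosing class. The following is only a sketch: the IRType shape, the string-named transformers, and liftPropertyTransformers are assumptions made for illustration, not quicktype's actual IR.

```typescript
// Hypothetical, heavily simplified IR; transformers are referred to by name.
type TransformerSpec = string | { [property: string]: TransformerSpec };

type IRType =
    | { kind: "primitive"; name: string }
    | { kind: "class"; name: string | null; properties: { [name: string]: IRType } }
    | { kind: "transformed"; source: IRType; target: IRType; transformer: TransformerSpec };

// Rewrites a class with transformed properties (first form) into a
// transformed class (second form). Other types are returned unchanged.
function liftPropertyTransformers(type: IRType): IRType {
    if (type.kind !== "class") return type;
    const sourceProps: { [name: string]: IRType } = {};
    const targetProps: { [name: string]: IRType } = {};
    const transformers: { [name: string]: TransformerSpec } = {};
    for (const name of Object.keys(type.properties)) {
        const prop = type.properties[name];
        if (prop.kind === "transformed") {
            sourceProps[name] = prop.source;
            targetProps[name] = prop.target;
            transformers[name] = prop.transformer;
        } else {
            sourceProps[name] = prop;
            targetProps[name] = prop;
        }
    }
    if (Object.keys(transformers).length === 0) return type;
    return {
        kind: "transformed",
        source: { kind: "class", name: null, properties: sourceProps },
        target: { kind: "class", name: type.name, properties: targetProps },
        transformer: transformers,
    };
}

// class Foo { date: String -> Date (StringToDate) }
const foo: IRType = {
    kind: "class",
    name: "Foo",
    properties: {
        date: {
            kind: "transformed",
            source: { kind: "primitive", name: "string" },
            target: { kind: "primitive", name: "date" },
            transformer: "StringToDate",
        },
    },
};
const lifted = liftPropertyTransformers(foo);
```

A renderer for a language like Swift could then ask for the lifted form, while others keep the per-property form.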

Once we get into decoding unions we will have to be able to express more complex transformer arguments, which will include decision trees and IR types.

schani commented 6 years ago

Come to think of it, maybe a transformer should always be a transformer function, and we'll have transformer functions that do the job of transforming compound types by transforming their elements/properties. That means we'll have to be able to pass more complex arguments to transformer functions right away, including other transformers. The second example above then would be something like

class {
    date: String
}
  ->
class Foo {
    date: Date
}
    (TransformClass({ date: StringToDate}))

meaning that we transform the class type with the transformer function TransformClass, with the argument { date: StringToDate }, which tells it to apply the transformer StringToDate to the property date.
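In code, this amounts to TransformClass being a higher-order function that takes other transformers as its argument. A minimal one-directional sketch follows; the TransformerFn type and the runtime checks are assumptions, and only decoding is shown.

```typescript
// Hypothetical sketch: every transformer is just a function on untyped values.
type TransformerFn = (input: unknown) => unknown;

const StringToDate: TransformerFn = (input) => {
    if (typeof input !== "string") throw new Error("StringToDate expects a string");
    return new Date(input);
};

// Takes a transformer per property and returns a transformer for the whole class.
// Properties without a transformer are copied unchanged.
function TransformClass(propertyTransformers: { [property: string]: TransformerFn }): TransformerFn {
    return (input) => {
        if (typeof input !== "object" || input === null) {
            throw new Error("TransformClass expects an object");
        }
        const source = input as { [property: string]: unknown };
        const result: { [property: string]: unknown } = {};
        for (const name of Object.keys(source)) {
            const hasTransformer = Object.prototype.hasOwnProperty.call(propertyTransformers, name);
            result[name] = hasTransformer ? propertyTransformers[name](source[name]) : source[name];
        }
        return result;
    };
}

// (TransformClass({ date: StringToDate }))
const decodeFoo = TransformClass({ date: StringToDate });
```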

schani commented 6 years ago

Another thing transformers can be useful for: Let's say we add tuples to our IR (#48), but some target languages (Go comes to mind) don't support tuples. We'd also like to bootstrap them quickly, so a way of supporting them without adapting all the renderers right away would be nice.

We start out with this tuple in the IR:

[string, number, boolean]

We can change this into this transformed type:

Array<string | number | boolean>
  ->
class {
    first: string;
    second: number;
    third: boolean;
}
    (TransformTuple([first, second, third]))

Of course we still have to implement the transformer for the target language. Maybe we will find a way to make composable transformers so this becomes easier. Maybe something like this:

MakeClass({
    first: IndexArray(0) >> CastUnion,
    second: IndexArray(1) >> CastUnion,
    third: IndexArray(2) >> CastUnion
})
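A rough TypeScript rendering of the composable version might look like this. TypeScript has no >> operator for functions, so a hypothetical pipe helper stands in for it. CastUnion here takes the expected member as an explicit argument, which is an assumption of this sketch; in the IR that information would presumably come from the target type.

```typescript
// Hypothetical sketch of composable transformers; not quicktype's API.
type Transformer<A, B> = (input: A) => B;
type TupleItem = string | number | boolean;

// Stand-in for the >> operator: run `first`, then `second`.
function pipe<A, B, C>(first: Transformer<A, B>, second: Transformer<B, C>): Transformer<A, C> {
    return (input) => second(first(input));
}

function IndexArray(index: number): Transformer<TupleItem[], TupleItem> {
    return (array) => {
        if (index >= array.length) throw new Error("Missing element " + index);
        return array[index];
    };
}

// Checks that the union value is the expected member, failing otherwise.
function CastUnion(expected: "string" | "number" | "boolean"): Transformer<TupleItem, TupleItem> {
    return (value) => {
        if (typeof value !== expected) {
            throw new Error("Expected " + expected + ", got " + typeof value);
        }
        return value;
    };
}

// Builds an object by running one transformer per field on the same input.
function MakeClass(
    fields: { [name: string]: Transformer<TupleItem[], TupleItem> }
): Transformer<TupleItem[], { [name: string]: TupleItem }> {
    return (input) => {
        const result: { [name: string]: TupleItem } = {};
        for (const name of Object.keys(fields)) {
            result[name] = fields[name](input);
        }
        return result;
    };
}

const tupleToClass = MakeClass({
    first: pipe(IndexArray(0), CastUnion("string")),
    second: pipe(IndexArray(1), CastUnion("number")),
    third: pipe(IndexArray(2), CastUnion("boolean")),
});
```

With this, tupleToClass(["a", 1, true]) produces an object with first, second and third set, and an array whose elements have the wrong types is rejected.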

schani commented 6 years ago

Come to think of it some more, TransformClass from above is just a special case of MakeClass if we also add a GetProperty transformer:

MakeClass({
    date: GetProperty(date) >> StringToDate
})

schani commented 6 years ago

Issues with generating unions and intersections with primitive transformed types

There are three non-trivial cases to distinguish when forming unions and intersections in the presence of transformed types:

1. There's a type T in the union/intersection as well as a transformed type whose source type is T. Example: string and stringified number.
2. There is more than one transformed type with the same source type T present in the union/intersection. Example: stringified number and stringified bool.
3. There is a type T in the union/intersection as well as more than one transformed type whose source type is T. Example: string, stringified number, and stringified bool.

Unions

The easy way out is to always go down to type T. The hard way requires having either failable transformers or predicates for when transformers will apply. In that case the generated code will first have to try the transformers, and if none of them can apply, use the base type (or fail, if the base type isn't in the union). We should have a defined order in which the transformers will be tried.
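The "hard way" could be sketched as follows, assuming failable transformers that report whether they apply. The Failable type, the two example transformers, and decodeUnion are hypothetical names. The array order is the defined order in which transformers are tried.

```typescript
// Hypothetical sketch of failable transformers; not quicktype's API.
type Failable<A, B> = (input: A) => { ok: true; value: B } | { ok: false };

const stringifiedNumber: Failable<string, number> = (s) => {
    const n = Number(s);
    // Number("") is 0, so blank strings are rejected explicitly.
    return s.trim() !== "" && !isNaN(n) ? { ok: true, value: n } : { ok: false };
};

const stringifiedBool: Failable<string, boolean> = (s) =>
    s === "true" ? { ok: true, value: true }
    : s === "false" ? { ok: true, value: false }
    : { ok: false };

// Tries the transformers in order. If none applies, falls back to the base
// type when it is part of the union, and fails otherwise.
function decodeUnion<A, B>(transformers: Failable<A, B>[], input: A, baseTypeInUnion: boolean): A | B {
    for (const transformer of transformers) {
        const result = transformer(input);
        if (result.ok) return result.value;
    }
    if (baseTypeInUnion) return input;
    throw new Error("No transformer applies and the base type is not in the union");
}

// Case 3 from the list above: string, stringified number, stringified bool.
const order: Failable<string, number | boolean>[] = [stringifiedNumber, stringifiedBool];
const a = decodeUnion(order, "42", true);    // 42
const b = decodeUnion(order, "true", true);  // true
const c = decodeUnion(order, "hello", true); // "hello"
```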

Intersections

For intersections, we can disregard the base type T in all cases, so cases 2 and 3 are the same. Case 1 is trivial, since only the transformed type is left, so that's what the resulting type is. The other cases, where there is more than one transformed type, are a question of how hard we want to work. In some instances we'll have to hard-code the intersection type. For example, any intersection between date, time, and date-time is always date-time.
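A small sketch of these rules for string-based types, with the date/time rule hard-coded. The kind names and intersectStringKinds are hypothetical, and returning undefined for combinations without a rule is just one possible choice.

```typescript
// Hypothetical kind names; "string" is the base type T.
type StringKind = "string" | "date" | "time" | "date-time" | "stringified-number" | "stringified-bool";

const dateFamily: StringKind[] = ["date", "time", "date-time"];

// Returns the intersection of the given kinds, or undefined if no rule is known.
function intersectStringKinds(kinds: StringKind[]): StringKind | undefined {
    // The base type can be disregarded; also drop duplicates.
    const transformed: StringKind[] = [];
    for (const kind of kinds) {
        if (kind !== "string" && transformed.indexOf(kind) < 0) transformed.push(kind);
    }
    if (transformed.length === 0) return "string";
    // Case 1: only one transformed type is left.
    if (transformed.length === 1) return transformed[0];
    // Hard-coded rule: any mix of date, time and date-time is date-time.
    if (transformed.every((kind) => dateFamily.indexOf(kind) >= 0)) return "date-time";
    return undefined;
}
```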

schani commented 6 years ago

Having started on the implementation I now believe that it's easier to not introduce a new type, but to make transformations attachable to types. The type is the source of the transformation, and the transformation specifies the target type. Most of the IR machinery can be adapted to this much more easily, I think.

For the union of multiple transformations, I think this can be nicely solved by having a transformation that combines multiple transformations in sequence, trying them one after the other. We might even want to bring enums into this, so that we can handle them at the same level as date-time, for example. One might want to have a union of date-time and the string "never", for example. The target type of such a transformation is the union of the target types of the individual transformations, of course.
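Here is a sketch of that design for the date-time or "never" example. The Transformation interface, enumCase, and firstOf are hypothetical names, and target types are represented as plain lists of member names for brevity. The transformation is attached to the source type string, and the combined target type is the union of the individual targets.

```typescript
// Hypothetical sketch; not quicktype's API.
interface Transformation<Source, Target> {
    targetType: string[]; // names of the members of the target type
    tryApply(input: Source): { ok: true; value: Target } | { ok: false };
}

const parseDateTime: Transformation<string, Date> = {
    targetType: ["date-time"],
    tryApply: (s) => {
        const date = new Date(s);
        return isNaN(date.getTime()) ? { ok: false } : { ok: true, value: date };
    },
};

// An enum case handled at the same level as date-time.
function enumCase<V extends string>(name: V): Transformation<string, V> {
    return {
        targetType: [JSON.stringify(name)],
        tryApply: (s) => (s === name ? { ok: true, value: name } : { ok: false }),
    };
}

// Combines two transformations in sequence, trying `a` before `b`.
function firstOf<S, A, B>(a: Transformation<S, A>, b: Transformation<S, B>): Transformation<S, A | B> {
    return {
        targetType: a.targetType.concat(b.targetType),
        tryApply: (input) => {
            const first = a.tryApply(input);
            return first.ok ? first : b.tryApply(input);
        },
    };
}

// The source type with its attached transformation.
const neverOrDateTime = {
    sourceType: "string",
    transformation: firstOf(enumCase("never"), parseDateTime),
};
```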

schani commented 6 years ago

Not all transformations will be bidirectional. For example, default-value maps from T | null to T. If you serialize again and encounter a T with the default value, do you leave it be or transform it to null? In this case you'll probably always choose to leave it be, but other cases might be less clear.
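The loss of information can be seen in a tiny sketch with both serialization policies side by side. The default value "unknown" and the function names are made up for illustration.

```typescript
// Hypothetical default-value transformation: (string | null) -> string.
const DEFAULT_VALUE = "unknown";

const decodeWithDefault = (json: string | null): string =>
    json === null ? DEFAULT_VALUE : json;

// Policy A: leave the value be when serializing.
const encodeKeep = (value: string): string | null => value;

// Policy B: turn the default value back into null when serializing.
const encodeCollapse = (value: string): string | null =>
    value === DEFAULT_VALUE ? null : value;

// Policy A loses the original null: null -> "unknown" -> "unknown".
const roundTripA = encodeKeep(decodeWithDefault(null));

// Policy B loses an explicit default: "unknown" -> "unknown" -> null.
const roundTripB = encodeCollapse(decodeWithDefault("unknown"));
```

Neither policy round-trips every input, which is why the transformation cannot be fully bidirectional.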

schani commented 6 years ago

Bidirectionality and computed properties

We can rescue bidirectionality by making the properties where we would lose it computed properties, as opposed to transforming the values at serialization time. As an example, let's say we have this type:

class Foo {
    bar: string = "Bear"
}

The semantics are that if bar is missing, or null or whatever, it defaults to "Bear". Instead of generating this C# class:

class Foo {
    [JsonProperty("bar", Default = "Bear") or whatever]
    public string Bar { get; set; }
}

We could generate

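class Foo {
    // Json.NET (de)serializes private members that carry [JsonProperty].
    [JsonProperty("bar", NullValueHandling = NullValueHandling.Include)]
    private string InternalBar { get; set; }

    // Computed property: kept out of the JSON so only InternalBar is written.
    [JsonIgnore]
    public string Bar {
        get {
            return InternalBar ?? "Bear";
        }
        set {
            InternalBar = value;
        }
    }
}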

Downsides:

Computed properties might be awkward in some languages.
They are slower, unless we retain the results, in which case they use more memory.
They do need the transformed type to be a property in most languages. As in, if the top-level type is a string, you can't make it into a computed property because it's not contained in anything.

Computed properties might be useful for purposes other than preserving bidirectionality.
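For comparison, the same idea in TypeScript might look like the sketch below. The fromJSON and toJSON helpers are assumptions of this example, not generated quicktype code. Because the raw value is stored untouched, a null in the input is still a null in the output.

```typescript
// Hypothetical sketch of a computed property with a default value.
class Foo {
    // The value exactly as it appeared in the JSON (missing is treated as null).
    private internalBar: string | null;

    constructor(internalBar: string | null) {
        this.internalBar = internalBar;
    }

    // The default is applied on read, not at deserialization time.
    get bar(): string {
        return this.internalBar === null ? "Bear" : this.internalBar;
    }

    set bar(value: string) {
        this.internalBar = value;
    }

    static fromJSON(json: { bar?: string | null }): Foo {
        return new Foo(json.bar === undefined ? null : json.bar);
    }

    // Called by JSON.stringify; writes the stored value, not the computed one.
    toJSON(): { bar: string | null } {
        return { bar: this.internalBar };
    }
}

const fooFromNull = Foo.fromJSON({ bar: null });
const seenByUser = fooFromNull.bar;              // "Bear"
const serialized = JSON.stringify(fooFromNull);  // '{"bar":null}'
```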

schani commented 6 years ago

This is implemented in #845

glideapps / quicktype

Data transformers #466
