elm / compiler

Compiler for Elm, a functional language for reliable webapps.
https://elm-lang.org/
BSD 3-Clause "New" or "Revised" License
7.51k stars 660 forks source link

Feature Request: Arbitrary (First-order) Datatypes in Ports #490

Closed maxsnew closed 9 years ago

maxsnew commented 10 years ago

This seems pretty simple to implement:

data Foo a = Bar String
           | Baz (Foo a) Int
x = Bar "bleh"
y = Baz x 7

then x and y could be serialized to the following javascript (field names could easily be something else):

var x = { _ctor: "Bar", _0: "bleh" }
var y = { _ctor: "Baz", _0: x, _1: 7 }

The only problem I see with this is that the serialization of Maybe would be inconsistent with this.

Right now it's actually impossible to serialize a Sum type. Even if we're unsure about doing this in general, we definitely need to at least support the Either type. Currently I'm doing the following to get around this limitation

type Either' a b = (Maybe a, Maybe b)

where I ignore the 2nd value if the first one is non-null, but this is obviously suboptimal.

johnpmayer commented 10 years ago

Totally support this and I've already given it some thought, so just some comments:

First, in https://github.com/johnpmayer/Elm/blob/master/compiler/Generate/JavaScript/Ports.hs, the code path currently only knows that it's dealing with a Foo, but it doesn't know that Foo is made up of Bars and Bazs. This "ADT shape information" isn't available to this part of the program (at least it wasn't obvious to me where that would be found), and it will have to be made available. I defer to Evan on the actual implementation challenges of wiring through this information.

Personally, I don't like the fact the null equates with Maybe. I would remove that compatibility in favor of arbitrary ADT serialization.

As far as the serialization scheme, I noticed that you essentially chose the native representation of Elm types. I agree that this is a nice way to do it; browsers are pretty efficient at JSON.parse and JSON.toString, so it leads to an implementation that is both simple and fast. However, I would drop the {_ctor,_0,_1,_2} pattern in favor of {_ctor,_params}, where _params is a list of arguments, as this is compatible with existing code in Haskell's Data.Aeson.TH. If _params is a list, then ADTs are simply sum types with product types, and our basic product types (tuples) are already implemented as lists, so that's a nice property.

On Wed, Feb 5, 2014 at 6:51 PM, Max S. New notifications@github.com wrote:

This seems pretty simple to implement:

data Foo a = Bar String | Baz (Foo a) Intx = Bar "bleh"y = Baz x 7

then x and y could be serialized to the following javascript (field names could easily be something else):

var x = { _ctor: "Bar", _0: "bleh" }var y = { _ctor: "Baz", _0: x, _1: 7 }

The only problem I see with this is that the serialization of Maybe would be inconsistent with this.

Right now it's actually impossible to serialize a Sum type. Even if we're unsure about doing this in general, we definitely need to at least support the Either type. Currently I'm doing the following to get around this limitation

type Either' a b = (Maybe a, Maybe b)

where I ignore the 2nd value if the first one is non-null, but this is obviously suboptimal.

Reply to this email directly or view it on GitHubhttps://github.com/evancz/Elm/issues/490 .

johnpmayer commented 10 years ago

My overall feeling on this are that, because elm is a strict language, all non-lambda types should have default serialization built in, and thus could all be used in ports.

johnpmayer commented 10 years ago

One small problem I see in your example: Without type annotations, the type variable a, in both the type of x and the type of y, is indeterminate. I point this out mainly to say that, if there are free variables in a type, then that type should not be serializable.

maxsnew commented 10 years ago

Oh woops, I meant to use the a. Given

data Foo a = Bar a
           | Baz (Foo a) Int

This should work:

port out : Signal (Foo String)
port out = constant <| Bar "Work!"

producing

{ _ctor: "Bar", _0: "Work!" }

but this should obviously be an error:

port out : Signal ([a] -> a)
port out = contant head

This is already handled to some extent since Maybe/List are serializable.

evancz commented 10 years ago

I'd be more inclined to use the _0, _1, etc style. I currently think it is easier to use (and should generally have better performance). Either way, I think the leading underscore for _ctor and _params is not adding any value, and I believe it is ctor in the compiler now?

Let's do some examples to see on how it looks:

switch(value.ctor) { ... }
switch(value.constructor) { ... }  // collides with built-in JS name?

value._0

// I feel like this is not a thing that is named, but
value.params[0]
value.values[0]

// or maybe
value.v0
value.p0
value.a0
value[0] // you can use numbers as field names! :O

No idea!

As it is, you don't need to have a full list of constructors to do any of this. Although, it could be very valuable to have such a list because generated JS could switch on integers instead of strings, which should be significantly faster:

switch (x.ctor) {
case "Just":
case "Nothing":
}

// possibly faster version
switch (x.ctor) {
case 0:
case 1:
}

I think this kind of optimization is not a good idea right now. No one will notice speed gains in this realm, and it'll make things harder to maintain and make generated code harder to read.

JohnNilsson commented 10 years ago

Why not just use an array, using the first element for ctor? ["Baz", x, 7] On Feb 6, 2014 2:35 AM, "John P Mayer, Jr" notifications@github.com wrote:

Totally support this and I've already given it some thought, so just some comments:

First, in

https://github.com/johnpmayer/Elm/blob/master/compiler/Generate/JavaScript/Ports.hs , the code path currently only knows that it's dealing with a Foo, but it doesn't know that Foo is made up of Bars and Bazs. This "ADT shape information" isn't available to this part of the program (at least it wasn't obvious to me where that would be found), and it will have to be made available. I defer to Evan on the actual implementation challenges of wiring through this information.

Personally, I don't like the fact the null equates with Maybe. I would remove that compatibility in favor of arbitrary ADT serialization.

As far as the serialization scheme, I noticed that you essentially chose the native representation of Elm types. I agree that this is a nice way to do it; browsers are pretty efficient at JSON.parse and JSON.toString, so it leads to an implementation that is both simple and fast. However, I would drop the {_ctor,_0,_1,_2} pattern in favor of {_ctor,_params}, where _params is a list of arguments, as this is compatible with existing code in Haskell's Data.Aeson.TH. If _params is a list, then ADTs are simply sum types with product types, and our basic product types (tuples) are already implemented as lists, so that's a nice property.

On Wed, Feb 5, 2014 at 6:51 PM, Max S. New notifications@github.com wrote:

This seems pretty simple to implement:

data Foo a = Bar String | Baz (Foo a) Intx = Bar "bleh"y = Baz x 7

then x and y could be serialized to the following javascript (field names could easily be something else):

var x = { _ctor: "Bar", _0: "bleh" }var y = { _ctor: "Baz", _0: x, _1: 7 }

The only problem I see with this is that the serialization of Maybe would be inconsistent with this.

Right now it's actually impossible to serialize a Sum type. Even if we're unsure about doing this in general, we definitely need to at least support the Either type. Currently I'm doing the following to get around this limitation

type Either' a b = (Maybe a, Maybe b)

where I ignore the 2nd value if the first one is non-null, but this is obviously suboptimal.

Reply to this email directly or view it on GitHub< https://github.com/evancz/Elm/issues/490> .

Reply to this email directly or view it on GitHubhttps://github.com/evancz/Elm/issues/490#issuecomment-34283540 .

evancz commented 10 years ago

@JohnNilsson, that's actually how things were represented early on in the generated code :) I think it was slower, but I don't think that's the point of this API.

switch(foo[0]) {
case "Baz":
  return foo[1] + foo[2];
case "Bar":
}

Any opinions on the looks of this? (with your JS cap on)

evancz commented 10 years ago

Maybe we should have a more realistic example and do a bunch of gists?

maxsnew commented 10 years ago

Here is the code for the program I was writing: https://gist.github.com/maxsnew/8855176 .

johnpmayer commented 10 years ago

I'm really just hoping for ADT compatibility with Data.Aeson.TH, which automatically generates JSON serialization in Haskell. I know this isn't a priority for everybody, but I feel like it's certainly something to consider, at least as prior art.

Let's say we have:

data Foo = Bar Int String x = Bar 5 "baz"

Elm internals currently look like { "ctor" : "Bar", "_0" : 5, "_1" : "baz" }

Data.Aeson.TH has three modes for ADTs.

TaggedObject looks like { "tag" : "Bar", "contents" : [ 5, "baz" ] }

ObjectWithSingleField looks like { "Bar" : [ 5, "baz" ] }

TwoElemArray looks liks [ "Bar", [ 5, "baz" ] ]

On Thu, Feb 6, 2014 at 7:15 PM, Max S. New notifications@github.com wrote:

Here is the code for the program I was writing: https://gist.github.com/maxsnew/8855176 .

Reply to this email directly or view it on GitHubhttps://github.com/evancz/Elm/issues/490#issuecomment-34391435 .

evancz commented 10 years ago

Cool, @maxsnew that examples helps me :) I like the names "tag" and "contents" a lot! That is much more intuitive for me when I think of them as JS data structures.

I think the ObjectWithSingleField always seems like a cool and great idea until you actually try to use it in JS and find that it's a decent pain in the ass (this is the key problem with the type format in generated documentation).

So I would currently lean towards the TaggedObject style.

evancz commented 10 years ago

As for the issue of "should Maybe Int represent the set of all integers and null or be serialized to this format?"

It may make sense to have a type alias type Nullable a = Maybe a that has the different behavior. I'm not sure though. I think this is a particular case where it makes sense to have a special case. Ports are about writing natural and colloquial code in both Elm and JS, not forcing an approach on JS people.

johnpmayer commented 10 years ago

Yes I agree that TaggedObject is the nicest pattern. I don't think there would be any significant performance hit.

I've backtracked on my opinion of Maybe; I think it does make sense to have something to handle nulls in JS. Maybe is the natural candidate, and that's how Aeson handles null as well.

On Sat, Feb 8, 2014 at 10:19 AM, Evan Czaplicki notifications@github.comwrote:

As for the issue of "should Maybe Int represent the set of all integers and null or be serialized to this format?"

It may make sense to have a type alias type Nullable a = Maybe a that has the different behavior. I'm not sure though. I think this is a particular case where it makes sense to have a special case. Ports are about writing natural and colloquial code in both Elm and JS, not forcing an approach on JS people.

Reply to this email directly or view it on GitHubhttps://github.com/evancz/Elm/issues/490#issuecomment-34546381 .

evancz commented 10 years ago

Okay, so the main goal of this issue is: Get conversion with the TaggedObject style.

The subgoal is: Optimize case expressions to use extra information to generate switches on integers.

I think there is a hack to do this :) When you generate constructors for an ADT, you also generate some helper structures in JS.

data Either a b = Left a | Right b

would generate the normal Left and Right constructor, but also something like this:

var Either = { Left : 0, Right : 1, schemas : [ ... ] };

And from there you use Either.Left and Either.Right to do switches on integers while maintaining readability in the source code. I don't think that's an important issue, but some people do. We just have to hope that inlining integers is faster than comparing strings in switches. You can also use the schemas to convert from JS to Elm.

jprider63 commented 10 years ago

Hi,

I was wondering what the status of this feature is. I made a post on the mailing list that is related to this. I can potentially contribute towards this if it would help accelerate this feature.

By the way, I've benchmarked switching on strings vs. inlining integers, and surprisingly strings seems to do better. I ran this in chrome 35 and strings are twice as fast:

Average string time: 0.0010951002826914192 Average constructor time: 0.002199900755658746

var benchmark = function ( f) {
        var t0 = performance.now(); //new Date().getTime();
        f();
        var tf = performance.now(); //new Date().getTime();

        // console.log( tf);
        // console.log( t0);

        return tf - t0;
}

var match_string = function( str) {
        return ( function() {
                switch ( str) {
                        case "ConstructorA":
                                break;
                        case "ConstructorB":
                                break;
                        case "ConstructorC":
                                break;
                        case "ConstructorD":
                                break;
                        default:
                                break;
                };
        });
}

var ADT = { ConstructorA: 0, ConstructorB: 1, ConstructorC: 2, ConstructorD: 3}
var match_constructor = function( cstor) {
        return ( function() {
                switch ( cstor) {
                        case ADT.ConstructorA:
                                break;
                        case ADT.ConstructorB:
                                break;
                        case ADT.ConstructorC:
                                break;
                        case ADT.ConstructorD:
                                break;
                        default:
                                break;
                }
        });
}

var sum_string = 0.0;
var sum_cstor = 0.0;

var n = 10000;

for ( var i = 0; i < n; i++) {
        var rand = Math.floor(Math.random() * 4);
        var rand_string;
        var rand_cstor;
        switch ( rand) {
                case 0:
                        rand_string = "ConstructorA";
                        rand_cstor = ADT.ConstructorA;
                        break;
                case 1:
                        rand_string = "ConstructorB";
                        rand_cstor = ADT.ConstructorB;
                        break;
                case 2:
                        rand_string = "ConstructorC";
                        rand_cstor = ADT.ConstructorC;
                        break;
                case 3:
                        rand_string = "ConstructorD";
                        rand_cstor = ADT.ConstructorD;
                        break;
        }

        sum_string += benchmark( match_string( rand_string));
        sum_cstor += benchmark( match_constructor( rand_cstor));
}

console.log( "Average string time: " + (sum_string / n));
console.log( "Average constructor time: " + (sum_cstor / n));
evancz commented 10 years ago

Just talked with @michaelbjames about working on this feature. These benchmarks appear to confirm your results, which is very confusing to me. Perhaps people switch on strings more often? In any case, let's not do that part of this then.

You are welcome to take a look, but keep @michaelbjames up to date on what you find. I think it's a trickier to implement than it sounds.

deadfoxygrandpa commented 10 years ago

I remember making a jsperf test for this int vs string switching a long time ago, and I remember being surprised that strings were faster. It's apparently been this way for quite a while.

jprider63 commented 10 years ago

Ok, I'll try to get in contact with @michaelbjames about working on this.

thSoft commented 9 years ago

+1 What is the status of this? I would advise first implementing, then optimizing it. :)

thSoft commented 9 years ago

As of now, a workaround is to pass Json.Values through ports and Json.Decode/Encode them.

rtfeldman commented 9 years ago

I'm a bit late to the party here, but has anyone considered exposing (and mandating the use of) explicit constructor functions? For example:

Elm.Just(42) Elm.Nothing()

I see a few benefits to this:

  1. The intermediate representation is no longer exposed, so it can change between versions without breaking existing code.
  2. You can no longer accidentally use this API for dirty hacks; all the exposed constructors only permit doing things "the right way," so you'd have to deliberately avoid using them in order to do things the wrong way.
  3. This looks more like the equivalent Elm code.

Thoughts?

knuton commented 9 years ago

I pitched an idea to Evan last week which boils down to registering destructors (in-port) and constructors (out-port) for arbitrary representations of variant types. We agreed that I'd write it up here to get some feedback.

The main motivation is to arrive at a unified approach that subsumes currently port-compatible ADTs while allowing arbitrary data representations outside of Elm. This picks up on @rtfeldman's proposal.

I very much tried to think with my JS cap on: How might I want to deal with this data in a JS codebase? (Hence trying to avoid if (foo[0] === 'Bar') return foo[1];.)

Credo:

Ports are about writing natural and colloquial code in both Elm and JS, not forcing an approach on JS people.

Benefits:

I'll reuse the example from above:

data Foo a = Bar a
           | Baz (Foo a) Int

Let's assume that in my JS I represent those as (assuming Foo String)

{ kind: 'Bar', size: '40sqm' }
{ kind: 'Baz', bar: { ... }, rooms: 5 }

Other encodings abound, e.g. OOP style.

Destructing Outside Representations (in-ports)

port : Signal (Foo String)

The basic idea for conversion at an in-port with type T is to register a function that knows how to transform an incoming value when given access to all of the constructors of T. The trick is to only accept conversion results when they are constructed with our constructors. This allows us to assure that ultimately incoming values are of the right form.

So in this case, for whatever outside representation OuterFoo String of Foo String we need a function

   OuterFoo String
-> (String -> Foo String) -- Bar
-> (Foo String -> Int -> Foo String) -- Baz
-> Foo String

for example suppose the following function is registered for Foo

function fooIn(val, Bar, Baz) {
  switch (val.tag) {
    case 'Bar': return Bar(val.size);
    case 'Baz': return Baz(val.bar, val.rooms);
  }
}

Bar and Baz are not simple constructors, but are guarded -- this is the border policing:

What does Bar do?

What does Baz do?

What happens with the return value of fooIn?

Note that this construction process is recursive, that is, nested ADTs can be sent through ports.

Constructing Outside Representations (out-ports)

The idea is similar but now the converter is called with a tag (probably a string?) and the data, from which it constructs whichever representation desired.

Converter Registration

A default converter (e.g. from and to TaggedObject) can be used when no converter is registered for a type. Converters can be registered globally for all types and/or per type.

Because ADTs may appear nested inside ADTs or records, we need to do one of two things:

Builtins

The Elm runtime could make use of this approach to register a TaggedObject converter as default, while pre-registering special converters for builtin types such as Maybe, List, etc.


I have more details, but this is too long alreay. (Sorry!) What do you think?

evancz commented 9 years ago

I am trying to clean up the issues on these repos. My goal is for issues on repos like elm-compiler and elm-repl are about specific bugs. It is not working to manage proposals and stuff that requires changes to the language here, so I am devising a way of keeping track of all this stuff in a transparent way.

The idea is that it doesn't super matter where the proposal is opened, but we should collect related ideas in certain places so we can look at them as a collection of issues and find a coherent solution that addresses them all as much as possible. So I'm gonna close this and reference https://github.com/elm-lang/elm-plans/issues/17 which is collecting a bunch of port problems.

If there is more to say about this idea, we should continue on this thread. The fact that is closed is more about decluttering stuff!

thewoolleyman commented 8 years ago

Hello,

What is the status of this? https://github.com/elm-lang/elm-plans/issues/17 is still open with no status, but the repo is "deprecated"...

Thanks, -- Chad

dullbananas commented 4 years ago

@evancz what is the progress on this