Closed maxsnew closed 9 years ago
Totally support this and I've already given it some thought, so just some comments:
First, in https://github.com/johnpmayer/Elm/blob/master/compiler/Generate/JavaScript/Ports.hs, the code path currently only knows that it's dealing with a Foo, but it doesn't know that Foo is made up of Bars and Bazs. This "ADT shape information" isn't available to this part of the program (at least it wasn't obvious to me where that would be found), and it will have to be made available. I defer to Evan on the actual implementation challenges of wiring through this information.
Personally, I don't like the fact the null equates with Maybe. I would remove that compatibility in favor of arbitrary ADT serialization.
As far as the serialization scheme, I noticed that you essentially chose the native representation of Elm types. I agree that this is a nice way to do it; browsers are pretty efficient at JSON.parse and JSON.toString, so it leads to an implementation that is both simple and fast. However, I would drop the {_ctor,_0,_1,_2} pattern in favor of {_ctor,_params}, where _params is a list of arguments, as this is compatible with existing code in Haskell's Data.Aeson.TH. If _params is a list, then ADTs are simply sum types with product types, and our basic product types (tuples) are already implemented as lists, so that's a nice property.
On Wed, Feb 5, 2014 at 6:51 PM, Max S. New notifications@github.com wrote:
This seems pretty simple to implement:
data Foo a = Bar String | Baz (Foo a) Intx = Bar "bleh"y = Baz x 7
then x and y could be serialized to the following javascript (field names could easily be something else):
var x = { _ctor: "Bar", _0: "bleh" }var y = { _ctor: "Baz", _0: x, _1: 7 }
The only problem I see with this is that the serialization of Maybe would be inconsistent with this.
Right now it's actually impossible to serialize a Sum type. Even if we're unsure about doing this in general, we definitely need to at least support the Either type. Currently I'm doing the following to get around this limitation
type Either' a b = (Maybe a, Maybe b)
where I ignore the 2nd value if the first one is non-null, but this is obviously suboptimal.
Reply to this email directly or view it on GitHubhttps://github.com/evancz/Elm/issues/490 .
My overall feeling on this are that, because elm is a strict language, all non-lambda types should have default serialization built in, and thus could all be used in ports.
One small problem I see in your example: Without type annotations, the type variable a, in both the type of x and the type of y, is indeterminate. I point this out mainly to say that, if there are free variables in a type, then that type should not be serializable.
Oh woops, I meant to use the a
.
Given
data Foo a = Bar a
| Baz (Foo a) Int
This should work:
port out : Signal (Foo String)
port out = constant <| Bar "Work!"
producing
{ _ctor: "Bar", _0: "Work!" }
but this should obviously be an error:
port out : Signal ([a] -> a)
port out = contant head
This is already handled to some extent since Maybe/List are serializable.
I'd be more inclined to use the _0
, _1
, etc style. I currently think it is easier to use (and should generally have better performance). Either way, I think the leading underscore for _ctor
and _params
is not adding any value, and I believe it is ctor
in the compiler now?
Let's do some examples to see on how it looks:
switch(value.ctor) { ... }
switch(value.constructor) { ... } // collides with built-in JS name?
value._0
// I feel like this is not a thing that is named, but
value.params[0]
value.values[0]
// or maybe
value.v0
value.p0
value.a0
value[0] // you can use numbers as field names! :O
No idea!
As it is, you don't need to have a full list of constructors to do any of this. Although, it could be very valuable to have such a list because generated JS could switch on integers instead of strings, which should be significantly faster:
switch (x.ctor) {
case "Just":
case "Nothing":
}
// possibly faster version
switch (x.ctor) {
case 0:
case 1:
}
I think this kind of optimization is not a good idea right now. No one will notice speed gains in this realm, and it'll make things harder to maintain and make generated code harder to read.
Why not just use an array, using the first element for ctor? ["Baz", x, 7] On Feb 6, 2014 2:35 AM, "John P Mayer, Jr" notifications@github.com wrote:
Totally support this and I've already given it some thought, so just some comments:
First, in
https://github.com/johnpmayer/Elm/blob/master/compiler/Generate/JavaScript/Ports.hs , the code path currently only knows that it's dealing with a Foo, but it doesn't know that Foo is made up of Bars and Bazs. This "ADT shape information" isn't available to this part of the program (at least it wasn't obvious to me where that would be found), and it will have to be made available. I defer to Evan on the actual implementation challenges of wiring through this information.
Personally, I don't like the fact the null equates with Maybe. I would remove that compatibility in favor of arbitrary ADT serialization.
As far as the serialization scheme, I noticed that you essentially chose the native representation of Elm types. I agree that this is a nice way to do it; browsers are pretty efficient at JSON.parse and JSON.toString, so it leads to an implementation that is both simple and fast. However, I would drop the {_ctor,_0,_1,_2} pattern in favor of {_ctor,_params}, where _params is a list of arguments, as this is compatible with existing code in Haskell's Data.Aeson.TH. If _params is a list, then ADTs are simply sum types with product types, and our basic product types (tuples) are already implemented as lists, so that's a nice property.
On Wed, Feb 5, 2014 at 6:51 PM, Max S. New notifications@github.com wrote:
This seems pretty simple to implement:
data Foo a = Bar String | Baz (Foo a) Intx = Bar "bleh"y = Baz x 7
then x and y could be serialized to the following javascript (field names could easily be something else):
var x = { _ctor: "Bar", _0: "bleh" }var y = { _ctor: "Baz", _0: x, _1: 7 }
The only problem I see with this is that the serialization of Maybe would be inconsistent with this.
Right now it's actually impossible to serialize a Sum type. Even if we're unsure about doing this in general, we definitely need to at least support the Either type. Currently I'm doing the following to get around this limitation
type Either' a b = (Maybe a, Maybe b)
where I ignore the 2nd value if the first one is non-null, but this is obviously suboptimal.
Reply to this email directly or view it on GitHub< https://github.com/evancz/Elm/issues/490> .
Reply to this email directly or view it on GitHubhttps://github.com/evancz/Elm/issues/490#issuecomment-34283540 .
@JohnNilsson, that's actually how things were represented early on in the generated code :) I think it was slower, but I don't think that's the point of this API.
switch(foo[0]) {
case "Baz":
return foo[1] + foo[2];
case "Bar":
}
Any opinions on the looks of this? (with your JS cap on)
Maybe we should have a more realistic example and do a bunch of gists?
Here is the code for the program I was writing: https://gist.github.com/maxsnew/8855176 .
I'm really just hoping for ADT compatibility with Data.Aeson.TH, which automatically generates JSON serialization in Haskell. I know this isn't a priority for everybody, but I feel like it's certainly something to consider, at least as prior art.
Let's say we have:
data Foo = Bar Int String x = Bar 5 "baz"
Elm internals currently look like { "ctor" : "Bar", "_0" : 5, "_1" : "baz" }
Data.Aeson.TH has three modes for ADTs.
TaggedObject looks like { "tag" : "Bar", "contents" : [ 5, "baz" ] }
ObjectWithSingleField looks like { "Bar" : [ 5, "baz" ] }
TwoElemArray looks liks [ "Bar", [ 5, "baz" ] ]
On Thu, Feb 6, 2014 at 7:15 PM, Max S. New notifications@github.com wrote:
Here is the code for the program I was writing: https://gist.github.com/maxsnew/8855176 .
Reply to this email directly or view it on GitHubhttps://github.com/evancz/Elm/issues/490#issuecomment-34391435 .
Cool, @maxsnew that examples helps me :) I like the names "tag" and "contents" a lot! That is much more intuitive for me when I think of them as JS data structures.
I think the ObjectWithSingleField always seems like a cool and great idea until you actually try to use it in JS and find that it's a decent pain in the ass (this is the key problem with the type format in generated documentation).
So I would currently lean towards the TaggedObject style.
As for the issue of "should Maybe Int
represent the set of all integers and null
or be serialized to this format?"
It may make sense to have a type alias type Nullable a = Maybe a
that has the different behavior. I'm not sure though. I think this is a particular case where it makes sense to have a special case. Ports are about writing natural and colloquial code in both Elm and JS, not forcing an approach on JS people.
Yes I agree that TaggedObject is the nicest pattern. I don't think there would be any significant performance hit.
I've backtracked on my opinion of Maybe; I think it does make sense to have something to handle nulls in JS. Maybe is the natural candidate, and that's how Aeson handles null as well.
On Sat, Feb 8, 2014 at 10:19 AM, Evan Czaplicki notifications@github.comwrote:
As for the issue of "should Maybe Int represent the set of all integers and null or be serialized to this format?"
It may make sense to have a type alias type Nullable a = Maybe a that has the different behavior. I'm not sure though. I think this is a particular case where it makes sense to have a special case. Ports are about writing natural and colloquial code in both Elm and JS, not forcing an approach on JS people.
Reply to this email directly or view it on GitHubhttps://github.com/evancz/Elm/issues/490#issuecomment-34546381 .
Okay, so the main goal of this issue is: Get conversion with the TaggedObject style.
The subgoal is: Optimize case expressions to use extra information to generate switches on integers.
I think there is a hack to do this :) When you generate constructors for an ADT, you also generate some helper structures in JS.
data Either a b = Left a | Right b
would generate the normal Left
and Right
constructor, but also something like this:
var Either = { Left : 0, Right : 1, schemas : [ ... ] };
And from there you use Either.Left
and Either.Right
to do switches on integers while maintaining readability in the source code. I don't think that's an important issue, but some people do. We just have to hope that inlining integers is faster than comparing strings in switches. You can also use the schemas to convert from JS to Elm.
Hi,
I was wondering what the status of this feature is. I made a post on the mailing list that is related to this. I can potentially contribute towards this if it would help accelerate this feature.
By the way, I've benchmarked switching on strings vs. inlining integers, and surprisingly strings seems to do better. I ran this in chrome 35 and strings are twice as fast:
Average string time: 0.0010951002826914192 Average constructor time: 0.002199900755658746
var benchmark = function ( f) {
var t0 = performance.now(); //new Date().getTime();
f();
var tf = performance.now(); //new Date().getTime();
// console.log( tf);
// console.log( t0);
return tf - t0;
}
var match_string = function( str) {
return ( function() {
switch ( str) {
case "ConstructorA":
break;
case "ConstructorB":
break;
case "ConstructorC":
break;
case "ConstructorD":
break;
default:
break;
};
});
}
var ADT = { ConstructorA: 0, ConstructorB: 1, ConstructorC: 2, ConstructorD: 3}
var match_constructor = function( cstor) {
return ( function() {
switch ( cstor) {
case ADT.ConstructorA:
break;
case ADT.ConstructorB:
break;
case ADT.ConstructorC:
break;
case ADT.ConstructorD:
break;
default:
break;
}
});
}
var sum_string = 0.0;
var sum_cstor = 0.0;
var n = 10000;
for ( var i = 0; i < n; i++) {
var rand = Math.floor(Math.random() * 4);
var rand_string;
var rand_cstor;
switch ( rand) {
case 0:
rand_string = "ConstructorA";
rand_cstor = ADT.ConstructorA;
break;
case 1:
rand_string = "ConstructorB";
rand_cstor = ADT.ConstructorB;
break;
case 2:
rand_string = "ConstructorC";
rand_cstor = ADT.ConstructorC;
break;
case 3:
rand_string = "ConstructorD";
rand_cstor = ADT.ConstructorD;
break;
}
sum_string += benchmark( match_string( rand_string));
sum_cstor += benchmark( match_constructor( rand_cstor));
}
console.log( "Average string time: " + (sum_string / n));
console.log( "Average constructor time: " + (sum_cstor / n));
Just talked with @michaelbjames about working on this feature. These benchmarks appear to confirm your results, which is very confusing to me. Perhaps people switch on strings more often? In any case, let's not do that part of this then.
You are welcome to take a look, but keep @michaelbjames up to date on what you find. I think it's a trickier to implement than it sounds.
I remember making a jsperf test for this int vs string switching a long time ago, and I remember being surprised that strings were faster. It's apparently been this way for quite a while.
Ok, I'll try to get in contact with @michaelbjames about working on this.
+1 What is the status of this? I would advise first implementing, then optimizing it. :)
As of now, a workaround is to pass Json.Values through ports and Json.Decode/Encode them.
I'm a bit late to the party here, but has anyone considered exposing (and mandating the use of) explicit constructor functions? For example:
Elm.Just(42)
Elm.Nothing()
I see a few benefits to this:
Thoughts?
I pitched an idea to Evan last week which boils down to registering destructors (in-port) and constructors (out-port) for arbitrary representations of variant types. We agreed that I'd write it up here to get some feedback.
The main motivation is to arrive at a unified approach that subsumes currently port-compatible ADTs while allowing arbitrary data representations outside of Elm. This picks up on @rtfeldman's proposal.
I very much tried to think with my JS cap on: How might I want to deal with this data in a JS codebase? (Hence trying to avoid if (foo[0] === 'Bar') return foo[1];
.)
Credo:
Ports are about writing natural and colloquial code in both Elm and JS, not forcing an approach on JS people.
Benefits:
Maybe
fits into this approach naturallyData.Aeson.TH
if you wantI'll reuse the example from above:
data Foo a = Bar a
| Baz (Foo a) Int
Let's assume that in my JS I represent those as (assuming Foo String
)
{ kind: 'Bar', size: '40sqm' }
{ kind: 'Baz', bar: { ... }, rooms: 5 }
Other encodings abound, e.g. OOP style.
port : Signal (Foo String)
The basic idea for conversion at an in-port with type T
is to register a function that knows how to transform an incoming value when given access to all of the constructors of T
. The trick is to only accept conversion results when they are constructed with our constructors. This allows us to assure that ultimately incoming values are of the right form.
So in this case, for whatever outside representation OuterFoo String
of Foo String
we need a function
OuterFoo String
-> (String -> Foo String) -- Bar
-> (Foo String -> Int -> Foo String) -- Baz
-> Foo String
for example suppose the following function is registered for Foo
function fooIn(val, Bar, Baz) {
switch (val.tag) {
case 'Bar': return Bar(val.size);
case 'Baz': return Baz(val.bar, val.rooms);
}
}
Bar
and Baz
are not simple constructors, but are guarded -- this is the border policing:
What does Bar
do?
val.size
is a string, construct a Foo String
What does Baz
do?
Foo
, i.e. fooIn
fooIn(val.bar)
returns and val.rooms
is a number, construct a Foo String
What happens with the return value of fooIn
?
Bar
or Baz
, throw if notNote that this construction process is recursive, that is, nested ADTs can be sent through ports.
The idea is similar but now the converter is called with a tag (probably a string?) and the data, from which it constructs whichever representation desired.
A default converter (e.g. from and to TaggedObject) can be used when no converter is registered for a type. Converters can be registered globally for all types and/or per type.
Because ADTs may appear nested inside ADTs or records, we need to do one of two things:
The Elm runtime could make use of this approach to register a TaggedObject converter as default, while pre-registering special converters for builtin types such as Maybe
, List
, etc.
I have more details, but this is too long alreay. (Sorry!) What do you think?
I am trying to clean up the issues on these repos. My goal is for issues on repos like elm-compiler
and elm-repl
are about specific bugs. It is not working to manage proposals and stuff that requires changes to the language here, so I am devising a way of keeping track of all this stuff in a transparent way.
The idea is that it doesn't super matter where the proposal is opened, but we should collect related ideas in certain places so we can look at them as a collection of issues and find a coherent solution that addresses them all as much as possible. So I'm gonna close this and reference https://github.com/elm-lang/elm-plans/issues/17 which is collecting a bunch of port problems.
If there is more to say about this idea, we should continue on this thread. The fact that is closed is more about decluttering stuff!
Hello,
What is the status of this? https://github.com/elm-lang/elm-plans/issues/17 is still open with no status, but the repo is "deprecated"...
Thanks, -- Chad
@evancz what is the progress on this
This seems pretty simple to implement:
then
x
andy
could be serialized to the following javascript (field names could easily be something else):The only problem I see with this is that the serialization of
Maybe
would be inconsistent with this.Right now it's actually impossible to serialize a Sum type. Even if we're unsure about doing this in general, we definitely need to at least support the
Either
type. Currently I'm doing the following to get around this limitationwhere I ignore the 2nd value if the first one is non-null, but this is obviously suboptimal.