crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0
19.47k stars 1.62k forks source link

Array/Tuple decomposition #132

Open jhass opened 10 years ago

jhass commented 10 years ago

I understand that implementing this is complex and has many implications, this issue is mainly intended to start discussion and, if wanted to track it as a long term goal.

Crystal already supports multiple assignment. It would be great to expand this to array/tuple decomposition.

a, b = array[0], array[1]
v0, v1, ..., vn = array[0], array[1], ..., array[n]
array = [1, 2]; a, b = array[0], array[1]
# should become
a, b = array
v0, v1, ..., vn = array
a, b = [1, 2]

a, b = array.[0], array[1..-1]
v0, v1, ..., vn = array[0], array[1], ..., array[n..-1]
v0, v1, ..., vi, vj, ..., vn = array[0], array[1], ..., array[i..(n-j)], ..., array[n]
# should become
a, *b = array
v0, v1, ..., *vn = array
v0, v1, ..., *vi, vj, ..., vn = array

This is the most useful part, effectively obtaining the list head and tail without modifying it. At least Erlang too provides a construct for this, but I think most functional languages do. The next level would be decomposing a nested structure:

(b, c), (d, e) = [[1, 2], [3, 4]]
a, (b, c), d = [1, [2, 3], 4]
a, (b, (c, *d)), *e, f = [1, [2, [3, 4, 5]], 6, 7, 8] # a = 1, b = 2, c = 3, d = [4, 5], e = [6, 7], f = 8

Ruby allows this for arbitrary objects that respond to #to_ary, allowing similar behavior, maybe #to_t, could be another goal.

A last goal would be allowing this in method definitions and block, proc and lambda arguments.

Starting with a simple implementation, maybe only allowing it for tuples for example, would already add a benefit in my opinion.

asterite commented 10 years ago

Right now tuple/array unpacking is very simple:

# this
a, b, c = expression

# becomes this:
tmp = expression
a = tmp[0]
b = tmp[1]
c = tmp[2]

I mean it: the compiler just transforms the source code before applying any semantic analysis to it.

We could add support for *, like in Ruby:

# this
a, b, *c, d = expression

# becomes this
tmp = expression
a = tmp[0]
b = tmp[1]
c = tmp[2 .. -2]
d = tmp[-1]

Here we already have two small problems: first, this will work out of the box with arrays but not with tuples. For tuples we would need to understand negative indexes. Then, we also need to support ranges known at compile time (like 2 .. -2) that would return another tuple (I guess a range at runtime is unacceptable). Still, all of this can be implemented.

But I wonder how useful is this. I personally never used this in code I wrote.

The thing is, dynamic languages abuse arrays and tuples. For example this (one of your examples):

ary = [1, [2, 3], 4]
a, (b, c), d = ary

In Ruby it's just fine. What happens in Crystal? First, ary has type Array(Int32 | Array(Int32)). The above code would be transformed to this one:

ary = [1, [2, 3], 4]
tmp = ary
a = ary[0]
b = ary[1][0]
c = ary[1][1]
d = ary[2]

That, out of the box, doesn't compile, because ary[1] can either be an Int32 or an Array(Int32), and Int32 doesn't have a [] method. So this would have to be implemented in a totally different way: unpacking happens at runtime and if the object doesn't respond to [], at runtime, throw some exception.

Another thing: you talk about the head and tail of a list. In functional languages lists are represented as immutable linked lists, so getting the head and the tail is something very cheap and fast. In Ruby arrays can be shared, so when you slice them you get a shared view until you modify the copy (if I'm not wrong). That can also be fast and cheap. In Crystal getting the tail of a list is allocating memory for a new array and copying the data (I don't think we'll want to complicate things with shared arrays). Then, in functional languages that's the only thing you can do with a list: access the head and the tail. So that's why it's useful: because you have no other way.

I believe arrays of arrays can be common, but are mostly used to hack something quickly, or used in some benchmarks or short code to demonstrate how concise or cute a language is. This is all fine with dynamic languages, but in static languages it's harder.

About Erlang... sorry, I don't like Erlang at all, so here comes a rant.

In Erlang the only real types you have are lists and tuples. Yes, you have records, but they are just tuples in the end, so much that every time you have to deal with one you have to tell Erlang so (X#yes_im_this_record.foo). Then you have lists. And that's all. So, of course you will define all your data to be tuples of lists of tuples of lists of tuples. So, of course, to deal with these data structures they have pattern matching, where you can select nested elements with a "short" syntax.

Want to do an http call? Go ahead, we return you a tuple where the second element is a tuple whose first element is a tuple. Then you can pattern match that. Where do you get the error if you wrote something wrong? At runtime. How do you just get a specific element of that tuple if you don't care about the rest? You put ugly underscores. How someone reading your code understands what each element of that tuple is? You hope that the programmer put meaningful names to the variables, because otherwise you'll have to check the docs.

In languages where you have named types, you would have an HTTPResponse class with a method status, body, etc. No tuple unpacking is needed. The code is clearer. If you make a mistake you get a compile-time error (if it's compiled!). With named types I see very little use for tuples/array unpacking.

Tuples are good for returning multiple values at once. For example:

struct Number
  def divmod(number)
    {self / number, self % number}
  end
end

Then unpacking is good to unpack the return value without needing an intermediate variable:

div, mod = 10.divmod(3)

Unpacking is also good with, for example, String#split:

a, b = "foo,bar".split ","

But that's it. For me, more complicated uses deserve a named type because otherwise the code becomes really cryptic.

Here finishes the rant about Erlang. Note that I do like Erlang because of it's efficiency and philosophy. I just don't like the syntax and the very limited types you have. Elixir is an improvement, but I really don't like to just prefix every method with the type I'm working with (List.delete(list, item... do I really have to remind you, language, that my variable is a list everytime I want to do something with a list?).

All of the above is just my personal opinion. I know @waj likes Erlang, lists, tuples and pattern matching more than me. @bcardiff also likes functional languages and this kind of stuff a lot, so they might like more your point of view. We can leave this ticket open for further discussion... who knows what will happen in the end? :-)

jhass commented 10 years ago

In Ruby arrays can be shared, so when you slice them you get a shared view until you modify the copy.

That would be new to me, I thought the main reason for introducing Enumerator::Lazy1 would be reducing the amount of (large) arrays that are allocated.

Talking about Enumerator, more complex chains here are my main usage for advanced decomposition, that is, in block arguments:

hash.group_by {|k, v| v.foo }.each.with_index do |(k, v), i|
  ...
end

I very much like the functional powers Ruby provides here, they are very useful for small, data transforming scripts. If your data structures evolve in something too complex you then can just start to introduce entity classes and the like. This together allows evolving from fast prototypes to more complex applications that don't need to be designed upfront. So yes, I like those nested data structures and being able to decompose them is essential to work with them. Not letting them get too deep or wide is a question of the programmers discipline, not of the language, in my opinion.

Looking at the Enumerator example, supporting that advanced decomposition just for tuples (at first), would already provide a useful addition.

PS: I never really got the hang of Erlang either, it just happened to be another example I knew ;P

asterite commented 10 years ago

@MrZYX hey, I always wondered how to iterate a hash while having the index! That looks nice.

Yes, nested tuple unpacking is something that could work, because in each position you might have a different type and the compiler is aware of that (the same doesn't apply for arrays).

Right now Hash is not Enumerable. I guess the enumerated type would be Tuple(K, V), but then we would need to unpack tuples in block arguments, which is what @kostya also wants (and me too).

And3s commented 10 years ago

ruby: a, b = "a".split p b # nil

crystal: a, b = "a".split !! Index out of bounds

I hope b has a nil value.

waj commented 10 years ago

If a or b could have nil value that means the type of both would be nilable (String? in this case). That makes working with those variables a little more inconvenient. Normally when you use multiple assignment you know upfront that the right hand value can fill all the variables. So I'd prefer to have the exception instead of having nilable typed variables.

jhass commented 10 years ago

Yes, a, b = 'a b c'.split is rare, but I actually came quite often across the need for a, *b = 'a b c'.split in Ruby, with a, *b = 'a'.split being a valid case.

colinmarc commented 9 years ago

+1, but I'll qualify: I think tuple unpacking is very different from array unpacking. The former is way more intuitive, imo.

It seems perfectly reasonable to error on this:

[[1, 2], [3, 4]].each { |a, b| ... }

But this should work:

[{1, 2}, {3, 4}].each { |a, b| ... }

Or maybe even better, make it explicit:

[{1, 2}, {3, 4}].each { |(a, b)| ... }

I have a hunch that tuple unpacking would be a bit easier to implement, also.

aemadrid commented 8 years ago

I like being able to do this:

{ a: [1,2], b: [3,4] }.each{|k,(v1,v2)| puts "#{k} : <#{v1}:#{v2}>" }

In other words, unpacking an array or tuple in a block it's great. Using temp vars or full representations seems like worse options.

bew commented 6 years ago

Let's revive this a bit :smiley:

First of all, I think this will never have any reasonable & safe way to work for things other than a Tuple (and NamedTuple ? maybe laaater..). With that said, I think it's important to have a nice safe & powerful way decomposition for Tuples.

I'd personally would prefer to remove the a, b = foo syntax and replace it with a tuple-only decomposition syntax: {a, b} = foo with foo being a tuple (or a method returning a tuple).

The use-case I have in mind is that sometimes we only need the first return value of a function returning 2+ return values. We can't write first = foo, as this would get the tuple of return values not the first return value, but we could easily write {first} = foo, and get what we want.

Note that a current workaround is to do `first, = foowhich will get then 'discard' the second return value and properly get the first one intofirst`, but I always think of it as a workaround, not a solution._


About deep decomposition, the example from above already works for tuples:

tmp = {1, {2, 3}, 4}
a = tmp[0]
b = tmp[1][0]
c = tmp[1][1]
d = tmp[2]

So it would only need implementation (and no impossible-thinking about types being unknown about Array with union types, etc..) Using the above tuple decomposition syntax, this would work: {a, {b, c}, d} = {1, {2, 3}, 4}.

As Tuple's size and types are known at compile-time, {a, *rest} = foo kind of decomposition would be trivial, and still be completely type-safe (!!).

As @asterite said, it's pretty ugly to use this syntax for mass variable initializing, but I think it can get really handy for specialized framework and DSLs!


I think this tuple-only decomposition syntax using curly brackets {a, b} = foo is better than simple a, b = foo, as it shows explicitly that it is to be used with tuples, and what means Tuple means type-safe, and I think that is very important for Crystal.

It might be argued that a, b, c = "foo bar baz".split is too important (simple decomposition from an Array), but this example could be 'safely' written as {a, b} = "foo bar".split(limit: 2) (and would also spare a useless Array allocation!).

Edit: a split(limit: 2) can't work right now, as we can't do instance-macros: if split could be one, the macro could read the number and generate a different method based on the passed number, that can return a Tuple (→ safe 'array') with the result. I think that would be a very nice use case for instance-macros though!

HertzDevil commented 3 years ago

One layer of Tuple decomposition is possible with methods:

def destruct(car, cdar, *cddr)
  {car, cdar, cddr}
end

tup = {3, 4, 5, 'a', 7}
a, b, c = destruct(*tup) # note the splat here
a         # => 3
b         # => 4
c         # => {5, 'a', 7}
typeof(c) # => Tuple(Int32, Char, Int32)

However, this only works for splats at the trailing position, because all parameters following the splat parameter must be named parameters. With a bit of hackery the same can be done using yield splats:

def yield_each(values)
  yield *values
end

tup = {3, 4, 5, 'a', 7}
a, b, c = yield_each(tup) { |x, *y, z| {x, y, z} }
a         # => 3
b         # => {4, 5, 'a'}
c         # => 7
typeof(b) # => Tuple(Int32, Int32, Char)

At this point it makes more sense to have splats on the LHS of multi-assign expressions, especially if that same syntactic form is supported on Ruby. But this should only be done if the splat variable is assigned the same tuple and not an Array of elements.

Extra layers of decomposition can probably be done in the parser instead. This is how nested (a, b) block parameters already work; in fact I'm not quite sure why such a separation exists between those parser transformations and Crystal::LiteralExpander.

HertzDevil commented 3 years ago

Also a naive implementation of the range indexers will not work when a splat variable is present. Consider

a, b, *c, d, e = [42, 'a', ""]

# the above is equivalent to:
__temp = [42, 'a', ""]
a = __temp[0]     # => 42
b = __temp[1]     # => 'a'
c = __temp[2..-3] # => []
d = __temp[-2]    # => 'a'
e = __temp[-1]    # => ""

For comparison, the same code in Ruby will try to assign all the non-splat variables before filling the splat:

(0..7).each do |i|
  a, b, *c, d, e = (1..i).to_a
  p [a, b, c, d, e]
end

# [nil, nil, [], nil, nil]
# [1, nil, [], nil, nil]
# [1, 2, [], nil, nil]
# [1, 2, [], 3, nil]
# [1, 2, [], 3, 4]
# [1, 2, [3], 4, 5]
# [1, 2, [3, 4], 5, 6]
# [1, 2, [3, 4, 5], 6, 7]

I'm not sure about the exact expansion here, but honestly I find the behaviour for < 4 elements very misleading. So I'm against making these extra variables nilable in Crystal.

Note that the case for Tuple will be rejected by the compiler if splat into block parameters, but this happens at a later stage during semantic analysis:

def yield_each(values)
  yield *values
end

tup = {42, 'a', ""}
yield_each(tup) { |a, b, *c, d, e| } # Error: too many block arguments (given 4+, expected maximum 3)

For non-Tuple assignments it seems reasonable to inject raise IndexError.new(...) unless __temp.size >= 4 to the syntactic transform. But once again, this cannot be done by the literal expander alone if we want to omit this raise for Tuple expressions.

asterite commented 3 years ago

I think checking on size is a great idea. Doing it for tuples would work too: the if will always be false so the raise will never triggers.

HertzDevil commented 3 years ago

I replicated Ruby's multi-assign behaviour in Crystal for runtime arrays: https://play.crystal-lang.org/#/r/af2b

If we translate this to single assignments it would look like:

x1, x2, ..., xm, *glob, y1, y2, ..., yn = exp

# the above is equivalent to:
__temp = exp

x1 = __temp[0]
x2 = __temp[1]
# ...
xm = __temp[m - 1]

__temp2 = __temp[m..]
__size = __temp2.size

if __size >= n
  glob = __temp2[...-n]
  y1 = __temp2[-n]
  y2 = __temp2[-(n - 1)]
  # ...
  yn = __temp2[-1]
else
  glob = __temp2[...-__size] # always empty
  y1 = __temp2[-__size]
  y2 = __temp2[-(__size - 1)]
  # ...
  y__size = __temp2[-1]
  # the remaining `y` variables become nilable
end

Another example, with manually expanded assignments for each size: https://play.crystal-lang.org/#/r/af2o

In Crystal that O(n^2) expansion seems to be the only way to replicate this behaviour with assignments; more to the reason that we should just raise if __temp.size < m + n.

The dynamic post-splat assignment behaviour makes sense in Ruby, if you consider that multi-assignment occurs inside Ruby's stack-based VM so non-nil elements are pushed one by one. Their implementation is around here and here.

By the way, in Ruby pattern matching against what they call a "find pattern" also fails if the array is too small:

case [1, 2, 3]
in [a, b, *c, d, e]
  # match failure
in [a, *, b] # Crystal would use `*_` for this instead
  a # => 1
  b # => 3
end
asterite commented 3 years ago

Nice!

I don't think we should ever produce nil in these cases, but instead raise an exception. nil will most likely lead to compile errors that are very hard to track.