crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0
19.41k stars 1.62k forks source link

Add `Enumerable#compact` #11968

Open didactic-drunk opened 2 years ago

didactic-drunk commented 2 years ago

Feature Request

Crystal splats always use Tuple with each. It would be nice to map/reduce/etc them by including Enumerable.This would DRY up code in stdlib which uses each to make a temporary Array from the Tuple.

# With Enumerable
def foo(*args)
  args.compact.map { ... }
end
# Without Enumerable
def foo(*args)
  ary = [] of Type
  args.each do |arg|
    ary << arg if arg
  end
  ary.map { ... }
end
Blacksmoke16 commented 2 years ago

To be clear Tuple already does include Enumerable via Indexable. But compact isn't a method defined outside of Array and Hash. You could use compact_map tho, which is defined on Enumerable. E.g.

def foo(*args)
  args.compact_map { |v| v + 1 if v }
end

pp(foo(1, nil, 450)) # => [2, 451]

The key thing to point out is it returns an Array and not another Tuple ofc.

asterite commented 2 years ago

I think we should add compact to Enumerable. It's probably a mistake that it's not defined there.

straight-shoota commented 2 years ago

We need to decide whether we want a custom implementation of Tuple#compact that returns a Tuple instance. The problem is with nilable types in the tuple. The number of resulting types explodes exponentially as explained in https://github.com/crystal-lang/crystal/pull/12080#issuecomment-1142642075.

So I think it's doubtful whether that even makes sense.

The standard inherited implementation from Enumerable#compact returning an Array might be more useful. Although it might also be unexpected.

I'm not sure what's the best solution here.

asterite commented 2 years ago

I don't think there's any use case for Tuple.compact returning a Tuple.

yxhuvud commented 2 years ago

On the other hand, perhaps we could consider Tuple a (very) special case of Enumerable and handle it separately.

I'd find it natural that Set{1,nil}.compact and Deque{1,nil}.compact both return their own type with Nil stripped from the generic argument rather than an Array. That would also match the Hash behavior. Tuple (and potentially other Value types where it wouldn't make sense) could have its own definition as the default one doesn't make sense for it.

straight-shoota commented 2 years ago

Yes, it certainly makes sense for implementing types to override the implementation and return an instance of themselves when this is possible.

But for Tuple this is difficult because it results in a type explosion. For other types the type becomes smaller (Nil is removed).

I don't think it makes much sense to return a tuple value when it's not even clear how many elements it has and which type at each position. That's just outside the purpose of Tuple and seems more like a StaticArray (that's another type which could use a custom implementation btw.).

asterite commented 2 years ago

I'm starting to regret a bit having so many collection types, and types in general. I really like Ruby's simplicity: Array for lists of things (and just Array) and Hash for key-value pairs (and just Hash). Having all these types is extremely confusing and tiring. I wonder if we can do better here... 🤔

yxhuvud commented 2 years ago

I like having many collection types but there is definitely a progress towards more middle implementation layers, which definitely add complexity and make things harder to understand.

straight-shoota commented 2 years ago

@asterite If it was only for stdlib, yeah I suppose we could be done with that. But there is nevertheless a need for other dedicated data structures for solving specific problems. And we're designing stdlib APIs to be usable with custom collection implementations. That's requires more middle layers and adds complexity. It probably doesn't help that we're building this on the Ruby APIs which are not really designed for this. And I personally feel the naming of collection abstractions with *able mixins isn't great. It makes it unnecessarily hard to grasp the concept. I think it would be more clear if Indexable and Indexable::Mutable were named ReadOnlyList and List, for example (I don't think such a rename would be realistic, it's more about if we were starting from scratch).

ftarulla commented 2 years ago

Hey everyone! 👋 I was wondering if there is a path that I should take so that PR #12080 can be approved.

After reading the previous comments and talking to @straight-shoota, it seems that a possible way could be to:

  1. remove Tuple#compact (as it goes against the essence of this collection).
  2. Implement Set#compact and Deque#compact so they're polymorphic (with respect to this message) with other collections (mainly Array and Hash that already have custom implementations) avoiding the new default implementation Enumerable#compact that returns an Array.

What do you think?

asterite commented 2 years ago

No. compact should always return Array. No other collection overrides methods from Enumerable to return non-Array types (except Tuple in map, not sure if there's another case)

straight-shoota commented 2 years ago

That's not true. Hash#compact would be an example of an existing #compact implementation that returns the specific type.

Other evidences are #map implementations of Slice and StaticArray. It's true that this is not implemented all over the place. #select and #reject are missing except on Hash, for example. But I would see that as an unintended omission.

Array is the type used in the default implementation. But when I have a collection of a concrete type that makes sense as result for a map operation, I expect that collection type to be used as a result. For example, when I have a Set of strings and want those strings to be uppercase, I would not expect set.map &.upcase to turn my Set into an Array.

asterite commented 2 years ago

I would not expect set.map &.upcase to turn my Set into an Array.

Okay, but that's what happens. Even in Ruby.

asterite commented 2 years ago

Put another way: I think it's a big burden to reimplement all of Enumerable's methods to return the container type. What are we going to do, implement select, reject, map, map_with_index, etc., for every container? What about:

set = Set{1, 2, 3, 4, 5}
set.map { |x| x // 2 }

Is that Set{0, 1}? But we are mapping every element in the set, it's a bit strange that we don't get one element back for each original element. That's not what map means.

I think that's why in general, or, well, always, in Ruby you get Array back. It's the most universal data structure when you are doing these operations. And once you start doing them, you don't really care about the container's properties.

Just my opinion, though!

HertzDevil commented 2 years ago

For that matter, Set[1, 2, 3, 4, 5].map! { 1 } does turn the receiver into a one-element set in Ruby, and it is reasonable to suggest that #map would more or less behave like dup.map!, at least as far as value identity is concerned.

This is where the similarity ends. In Ruby you can do dup.map! even if map isn't defined or polymorphic, because there are no generics (they are in TypeProf and I have no idea what happens there); in Crystal the former preserves the generic type arguments and the latter doesn't.

didactic-drunk commented 2 years ago

Put another way: I think it's a big burden to reimplement all of Enumerable's methods to return the container type. What are we going to do, implement select, reject, map, map_with_index, etc., for every container?

What about returning a chainable Enumerable until to_a, to_h, to_set is called?

No intermediate Array's:

ary = [1, 2, 3]
ary.map { ... }.select { ... }.map { ... }.each { ... }
HertzDevil commented 2 years ago

Isn't that what Iterator is for?

Even then this only applies to types that have a sensible conversion method. We can do Set#compact:

set = Set{1, 2, nil, 3.as(Int32?), nil.as(Int32?)}
x = Set{*set.each.compact_map(&.itself)} # why isn't `Iterator#compact` a thing?
# or: `set.each.compact_map(&.itself).to_set`

x         # => Set{1, 2, 3}
typeof(x) # => Set(Int32)

but not Tuple#compact or Slice#compact in this manner, because this approach requires an insertable container.

HertzDevil commented 2 years ago

Yeah if Enumerable#compact didn't exist then neither would Iterator#compact. We should decide whether the latter means the same thing as compact_map(&.itself), because it is a breaking change if we add it in a version after #12080 is merged.

And it goes without saying that other Enumerable methods such as #map and #select are already polymorphic on Iterator. It is the one subtype of Enumerable that strives to be polymorphic even if the container subtypes aren't.

beta-ziliani commented 2 years ago

Sorry, I don't follow @HertzDevil. Why merging #12080 would make Iterator#compact = compact_map(&.itself) a breaking change? As I see it, both Enumerable and Iterator could (and should) define #compact = compact_map(&.itself), in the first case returning the Array and in the second self.

ftarulla commented 2 years ago

Hey everyone! 👋 After reading the comments here and talking to @beta-ziliani and @straight-shoota, I'll be following this path:

  1. Remove Tuple#compact (as it goes against the essence of this collection).
  2. Specify Enumerable as the returning type for Enumerable#compact, hiding the method's implementation and leaving room for future custom implementations redefining to a more specific returned type (although from what I could read in the docs I think covariant method return type is still not supported 🤔).

The user of this API will have to explicitly convert to an Array (using Enumerable#to_a) before trying to do something like:

e = Set{1, 2, 3, nil}.compact
puts e[0]

Although the example above would work, this should not be done as it would depend on the implementation instead of the method's definition.

We should do:

e = Set{1, 2, 3, nil}.compact
arr = e.to_a # explicitly convert to `Array`
puts arr[0]

What do you think?

straight-shoota commented 1 year ago

I added a couple of enhancements to the PR.

If T is not nilable, we can just dup the collection as a performance optimization. I applied the same optimization also to Hash#compact (which operates a bit differently), and all specialized implementations. Specialized implementations for Slice, Deque and Set always return their self type (minus Nil in the element type, of course). I don't think any other implementations of Enumerable in stdlib needs a specialized implementation because their element types are usually not nilable, thus the inherited implementation of Enumerable#compact just becomes #dup.