chanzuckerberg / sorbet-coerce

A type coercion lib works with Sorbet's static type checker and type definitions
https://rubygems.org/gems/sorbet-coerce
MIT License
29 stars 13 forks source link

Union Type Confusion #71

Open billy1kaplan opened 2 years ago

billy1kaplan commented 2 years ago

Describe the bug: I would expect the following to be a "coercable" union type:

class Demo
  extend T::Sig

  class Test < T::Struct
    const :key, String
    const :value, String
  end

  sig { void }
  def self.help
    TypeCoerce[T.any(String, Test)].new.from(
      {
        "key" => "Hello",
        "value" => "World",
      },
    )
  end
end

Actual result:

TypeError: T.let: Expected type T.any(Demo::Test, String), got type Hash with value {"key"=>"Hello", "value"=>"World" }

This does seem to work with the other half of the union type (i.e. for a String) where we're able to return a String directly:

sig { returns(String) }
def self.help
  TypeCoerce[T.any(String, Test)].new.from(
    "Hi",
  )
end

Expected behavior:

Naively, I would expect the above to return the Test struct with the properly initialized values. Is this a known limitation? And would this be a possible feature to support?

Versions:

billy1kaplan commented 2 years ago

This might be a bit trickier than I thought, e.g. which of these is coerced to? Maybe the user would need to supply the type of the struct in order for something like this to work:

class Test < T::Struct
  const :key, String
  const :value, String
end

class Test1 < T::Struct
  const :key, String
  const :value, String
  const :other, String
end
mattxwang commented 2 years ago

Hi @billy1kaplan, thanks for submitting an issue! I'll write out this response with the caveat that I'm only interning at @chanzuckerberg and am not the original creator of this repo.

First, to get it out of the way, I can replicate this locally:

class TestClass < T::Struct
  const :a, String
end
TypeCoerce[T.any(String, TestClass)].new.from({
  a: 'hello'
})
# TypeError: T.let: Expected type T.any(String, TestClass), got type Hash with value {:a=>"hello"}

However, this is actually a bit complicated! If you're alright with it, join me on a deeper dive!

Naively, I would expect the above to return the Test struct with the properly initialized values. Is this a known limitation?

Ah, this is - it just hasn't been documented particularly well. If you peek at the code for converting a Union:

https://github.com/chanzuckerberg/sorbet-coerce/blob/f6f66597bac2e9ded809047246369cf917085499/lib/sorbet-coerce/converter.rb#L61-L83

The only "explicitly" supported unions are of types T.nilable and T::Boolean; prior to version 0.6.0 (i.e. before #53 and #54 was merged in), this was clearer from a specific error message. It 'forwards' other unions / leaves them "as-is". I don't think we've done a great job of documenting this, though I'm happy to take that up.

In particular, the way the code is written:

So, in your case, it picks the first type and converts to String (failing). That's also why when you switch the convert value to something Stringable, it works! Unfortunately, I don't think switching the order of the classes works - it seems like they get sorted somewhere (I imagine by sorbet).

Note that this still fails if the union is part of the struct:

class TestClass < T::Struct
  const :a, String
end
class WithUnsupportedComplexUnion < T::Struct
  const :union, T.any(String, TestClass)
end
TypeCoerce[WithUnsupportedComplexUnion].new.from({
  union: {
    a: 'hello'
  }
}) 
# TypeError: Parameter 'union': Can't set WithUnsupportedComplexUnion.union to {:a=>"hello"} (instance of Hash) - need a T.any(String, TestClass)

What I can definitely do is:

And would this be a possible feature to support?

That's a great question. Right now, I don't know, though I'm happy to dig in a bit more if I have some free time. Of course, a PR is welcome if you find a way to make this work (even only for situations like your first example; the second can be "undefined behaviour" to some extent). The first step would be to relax the assumption (in the above code) that unions are only nilable or booleans, and add some more control flow logic. The else case currently assumes it's a nilable type, and clearly that's wrong in this case.

I'm also not sure if T.any is a first-class type and is intended for this purpose!

I'm certainly no expert in this / safe_type (an underlying gem we use) / sorbet-runtime, and I could be completely wrong. @donaldong and @datbth implemented this original feature - if they have any thoughts, feel free to chime in!

donaldong commented 2 years ago

@mattxwang thanks!

The only "explicitly" supported unions are of types T.nilable and T::Boolean

+1. However this does not match what the README page says: T.any(<supported type>, ...).

And would this be a possible feature to support?

I haven't looked at the code for a while, but it feels doable to me.

With

class Test < T::Struct
  const :key, String
  const :value, String
end

class Test1 < T::Struct
  const :key, String
  const :value, String
  const :other, T.nilable(String)
end

should we expect the following?

  TypeCoerce[T.any(String, Test)].new.from(
    "Hi",
  ) # => "Hi"

Can be casted to String already, return String.

  TypeCoerce[T.any(Test1, Test)].new.from(
          {
        "key" => "Hello",
        "value" => "World",
      },
  ) # => Test1

Cast to Test1 first, because that's the first valid try

  TypeCoerce[T.any(Test, Test1)].new.from(
          {
        "key" => "Hello",
        "value" => "World",
      },
  ) # => Test

Cast to Test first, because that's the first valid try

Similarly, TypeCoerce[T.any(String, Test)] will convert the hash to String, and TypeCoerce[T.any(Test, String)] will try to convert to T::Struct first if valid

mattxwang commented 2 years ago

Thanks @donaldong for the clarification! I think what you proposed makes sense. It does seem like the types in T.any are sorted, so the first try will be consistent - which may be a good thing.

However this does not match what the README page says: T.any(, ...).

Should I update the docs? I think that was merged in by #54.


@billy1kaplan any thoughts? If you'd like to contribute, I think this is a great opportunity to do so (and I can review your code, etc. if it's in the next 5 weeks). If not, no worries - I can take a stab at it.

billy1kaplan commented 2 years ago

@donaldong @mattxwang thank you both for the detailed replies!

I'm happy to take a shot at this hopefully within the next week or so. My understanding of a possible approach would be to try coercing to the types defined by T.any(...) in the order that they appear, returning the first successful match if one exists or raising a coercion error if not.

mattxwang commented 2 years ago

I'm happy to take a shot at this hopefully within the next week or so.

Great! Let me know how I can support you - happy to lend a hand however I can.

My understanding of a possible approach would be to try coercing to the types defined by T.any(...) in the order that they appear, returning the first successful match if one exists or raising a coercion error if not.

That sounds right to me!

leifg commented 2 years ago

My understanding of a possible approach would be to try coercing to the types defined by T.any(...) in the order that they appear, returning the first successful match if one exists or raising a coercion error if not.

We implemented a monkey patch for this based on version 0.5.0. It caught the "old" ArgumentError. Not a great approach but it worked.

Now that it is a TypeError nested Union types don't seem to work so great anymore.

I also submitted #72 that illustrates the error.

datbth commented 2 years ago

I think automatically trying to coerce union types is indeed convenient.
However, it comes with these costs:

  1. Performance. Try coercing and rescuing through every type within the union type is very costly.
  2. This is a significant behavioral change and might break the existing usages due to new unexpected coercion.

The performance cost is the main concern to me.
If we are going to support this, I think we should at least make it togglable.

mattxwang commented 2 years ago

Thanks @datbth. @billy1kaplan has done some work in #73, I think it's reasonable to add a toggle (since this is a breaking change). Do you have any additional context on the original PRs that you wrote with Donald?

(for context: wrapping up an internship at CZI and doing some maintenance for some infra repos on the side)

datbth commented 2 years ago

Context of #53

Okay let me try to recall the context of https://github.com/chanzuckerberg/sorbet-coerce/pull/53.
I believe I was doing a nested coercion like this:

class Owner < T::Struct
  const :name, String
  const :pet_id, T.any(Integer, String)
end

owner_data = {
  name: 'Alice',
  pet_id: 'abcdef',
}
TypeCoerce[Owner].new.from(owner_data)

Then I got the error 'the only supported union types are T.nilable and T::Boolean' in both of these cases:

  1. When pet_id is already an Integer or String
  2. When pet_id is of an invalid type, not an Integer nor String

I don't remember exactly how nested Owner was or how complex pet_id was, but I believe they were pretty simple (like in the above example).

So by implementing https://github.com/chanzuckerberg/sorbet-coerce/pull/53, I was trying to achieve:

  1. Passing case (pet_id is of a valid type already): the coercion should keep it as-is
  2. Non-passing case (pet_id is of an invalid type): the coercion should raise an informative error about the expected type

Some additional comments on coercing union types

I myself would try to stay away from actually coercing union types because:

  1. Performance
  2. The behavior is tricky (and too magical?) and might cause unexpected results

One example for the tricky behavior issue:

class Cat < T::Struct
  const :name
end

class Dog < T::Struct
  const :name
end

class Owner < T::Struct
  const :name, String
  const :pet, T.any(Cat, Dog)
end

data = {
  name: 'Alice',
  pet: {
    type: 'dog',
    name: 'Goofy',
  }
}

TypeCoerce[Owner].new.from(data)

In this case, even though the pet data is coercible into Cat, I would prefer raising an error. Because the error helps me detect the poor handling of the pet type info.

For the case where the user explicitly tries to coerce a union type (e.g. the union type is the root type) like TypeCoerce[T.any(Cat, Dog)].new.from(pet_data) then it might make sense to carry out the automatic coercion.
But I myself have not ran into the need for this. I would always do a check on a type info, wherever it is.

mattxwang commented 2 years ago

That is great context, thank you @datbth. I don't have too much context into the vision of this gem (we still use it at CZI, but much of that is out of scope of my direct work). I can do two things: