Closed: bitwise-aiden closed this issue 4 months ago.
Another option may be to have scopes for when the compilers are run, the main three being Tapioca -> Gems -> Project. I have yet to think of an instance where it is beneficial to run a compiler before another, so having this hierarchy could solve the problem just as well. The reasoning being that:
With this setup we can make sure that if a project wanted to override the signatures of methods generated in either of the first two then it would be more than capable of doing so.
For the different signature use case is it enough to run custom compilers after default ones?
> For the different signature use case is it enough to run custom compilers after default ones?
Yes, I think providing the way to run them after would likely hit 99+% of use cases. Separating them out would hopefully fix that.
I like your third idea -- seems like the least amount of work and covers the most use cases!
> I like your third idea -- seems like the least amount of work and covers the most use cases!
Me too, feels like the best approach regardless.
I'd like to question the premise of the issue, though. What is a real life example where 2 compilers want to type the same method in different ways?
I'd be wary of solutionizing before we understand if the problem is really a problem.
The main reason I have is for a compiler local to a project needing to override the result of the compiler in Tapioca because there are some additional special cases inserted by the project that aren't known / should not be handled by Tapioca.
As a concrete example: overriding something like the `#find` method of ActiveRecord to take an extra type that isn't taken by default. What approach would you take for addressing that?
This should allow you to solve that: https://github.com/Shopify/tapioca/pull/1967
I also added a DSL CLI test that shows how to use the extensibility point inside a custom "compiler".
Although it fixes this one example, I do think that there is very likely a need for others to override behaviours. I think the layered tapioca > gems > project ordering is probably a good quality of life feature so fixes don't have to be upstreamed.
> Although it fixes this one example, I do think that there is very likely a need for others to override behaviours. I think the layered tapioca > gems > project ordering is probably a good quality of life feature so fixes don't have to be upstreamed.
None of those should require RBI generation time modification. All modifications should be doable at load time when compilers are being loaded.
Btw, for the `find` signature, I am actually pretty unhappy with the fact that we've limited the set of id types artificially, when it is obvious that that isn't enough. So, I generally think this is a shortcoming of the compiler itself.
I remember that we had a problem with overloads not working as expected when one of them was `T.untyped`, but I think if we are careful about the overload sig ordering, we can still use `T.untyped`:
```ruby
# typed: __STDLIB_INTERNAL

class Foo
  extend T::Sig

  sig { params(a: T::Array[T.untyped]).returns(Integer) }
  sig { params(a: T.untyped).returns(String) }
  def self.foo(a)
    T.unsafe(nil)
  end
end

T.reveal_type Foo.foo(1)           # Revealed type: String
T.reveal_type Foo.foo([1, "a"])    # Revealed type: Integer
T.reveal_type Foo.foo([[1, "a"]])  # Revealed type: Integer
```
seems to work just fine and is a more "correct" signature for the method.
@KaanOzkan do you remember why we went the way of specifying all these id types?
Ah, found the problem: https://github.com/Shopify/tapioca/pull/1799#discussion_r1511362645
```ruby
# typed: __STDLIB_INTERNAL

class Foo
  extend T::Sig

  sig { params(a: T::Array[T.untyped]).returns(T::Enumerable[Foo]) }
  sig { params(a: T.untyped).returns(Foo) }
  def self.foo(a)
    T.unsafe(nil)
  end
end

array_of_untyped = T.let(T.unsafe(nil), T::Array[T.untyped])
untyped = T.let(T.unsafe(nil), T.untyped)

T.reveal_type Foo.foo(untyped)           # Revealed type: T::Enumerable[Foo]
T.reveal_type Foo.foo(array_of_untyped)  # Revealed type: T::Enumerable[Foo]
```
What happens if you change the order of the signatures?
> None of those should require RBI generation time modification. All modifications should be doable at load time when compilers are being loaded.
Exactly and that is the issue that I'm having, we load the compilers, order them alphabetically, then run them. What I'm proposing is that we load them, order them by context AND alphabetically, then run them.
Something like:
```ruby
TAPIOCA = 0
GEMS = 1
LOCAL = 2

Runtime::Reflection.descendants_of(Compiler).sort_by do |compiler|
  context = context_for_compiler(compiler)
  [context, T.must(compiler.name)]
end
```
> What happens if you change the order of the signatures?
The main issue I was having is that two compilers generate two separate methods and can't impact the order of the sigs, so whichever one is read last will be the one whose sigs are applied, e.g.
```ruby
class SomeModel
  module CommonRelationMethods
    sig { params(args: SomeIdType).returns(SomeModel) }
    def find(args = nil, &block); end

    sig { params(args: T.untyped).returns(T.untyped) }
    def find(args = nil, &block); end
  end
end
```
The sig with `SomeIdType` will never be used.
> Exactly and that is the issue that I'm having, we load the compilers, order them alphabetically, then run them.
What I was trying to say, but couldn't word properly, was that the compiler run order should be immaterial for properly constructed DSL compilers. That's why I am slightly opposed to implementing a certain kind of run order for compilers.
The main behaviour of DSL compilers should have been settled at load time, before we get to running. When we run the compilers, their behaviours should not depend on each other or on any runtime ordering (which implies the same thing).
> What happens if you change the order of the signatures?
I am assuming that was a question about the order of `T::Array[T.untyped]` and `T.untyped` in the signatures that I sent, @bitwise-aiden.
That doesn't change much; the call still matches the first signature that matches:
```ruby
# typed: __STDLIB_INTERNAL

class Foo
  extend T::Sig

  sig { params(a: T.untyped).returns(Foo) }
  sig { params(a: T::Array[T.untyped]).returns(T::Enumerable[Foo]) }
  def self.foo(a)
    T.unsafe(nil)
  end
end

array_of_untyped = T.let(T.unsafe(nil), T::Array[T.untyped])
untyped = T.let(T.unsafe(nil), T.untyped)

T.reveal_type Foo.foo(untyped)           # Revealed type: Foo
T.reveal_type Foo.foo(array_of_untyped)  # Revealed type: Foo
```
and I think I actually understand the problem now.

The argument being `T.untyped` will obviously match the first signature that is listed, regardless of what types the signature has. The ordering of signatures in the previous post is the correct one, which works properly for typed arguments, but then untyped cases match the one with the array argument (listed as the first sig), which returns an enumerable of `Foo`, and that is not the correct assumption for the majority of cases.
So, to make the default behaviour with `T.untyped` be the expected common behaviour of returning a `Foo`, we end up listing out all the expected types to remove `T.untyped` from the signature, which allows us to flip the order of the signatures so that the first one can be the one without the array, and thus the common usage of the method.
> The main behaviour of DSL compilers should have been settled at load time, before we get to running. When we run the compilers, their behaviours should not depend on each other or on any runtime ordering (which implies the same thing).
Okay, yeah. I'm with you on that. I think there is a rich set of additional tooling that would help to create signatures across compilers, but with what we have I think there is a bit of a shortcoming.
> I think there is a rich set of additional tooling that would help to create signatures across compilers, but with what we have I think there is a bit of a shortcoming.
That might be true, but I think we should address that as a problem on its own and not as a problem of DSL compiler ordering.
For example, we could create a `create_or_modify_method` helper for DSL compilers, which they could use exclusively and which has some kind of conflict resolution semantics built in. That might be another way for DSL compilers to work collaboratively.
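To make the idea concrete, here is a rough sketch of what such a helper could look like. Everything in it is hypothetical: the `MethodNode` struct, the `prepend:` flag, and the shape of `scope` are invented for illustration and do not match Tapioca's actual RBI tree API.

```ruby
# Hypothetical create_or_modify_method helper: if the method already
# exists in the scope, the new sig is added as an extra overload
# instead of clobbering the method, so two compilers can contribute
# sigs to the same method without caring about run order.
MethodNode = Struct.new(:name, :sigs)

def create_or_modify_method(scope, name, sig, prepend: false)
  node = scope.find { |m| m.name == name }
  if node
    # conflict resolution: the caller chooses whether its overload
    # should be tried before or after the existing ones
    prepend ? node.sigs.unshift(sig) : node.sigs.push(sig)
  else
    scope << MethodNode.new(name, [sig])
  end
end

scope = []
create_or_modify_method(scope, :find, "returns(T.untyped)")
create_or_modify_method(scope, :find, "returns(SomeModel)", prepend: true)
# scope now holds a single :find node with both sigs,
# the prepended one first
```

Since sig order matters to Sorbet's overload matching (as shown above), letting the second compiler choose prepend vs. append is what makes the run order immaterial.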
Basically, I'd like to keep the solution space wide enough to allow those kinds of solutions without committing to any ordering guarantees.
> For example, we could create a `create_or_modify_method` helper for DSL compilers, which they could use exclusively and which has some kind of conflict resolution semantics built in. That might be another way for DSL compilers to work collaboratively.
+1 for this. I was definitely wanting for this over the weekend.
I am closing this issue, since the specific ask is, in my opinion, not something we want and the root cause for needing this was addressed in #1967.
If you would like to open another issue with concrete examples of DSL compilers needing cross-compiler collaboration, we could discuss the possible approaches in there.
Description
At present when compilers for DSL generation are loaded they are alphabetized and run in that order:
https://github.com/Shopify/tapioca/blob/8678e2ab7b853ccbf3f092ccf553e58773eaa38b/lib/tapioca/dsl/pipeline.rb#L115-L123
What this effectively means is that if two compilers create the same method with different signatures, whichever compiler runs last will be the one to define the signature used. If someone wanted to write a different signature than a Tapioca-defined one, their only recourse would be to name their compiler something that comes after `Tapioca::Dsl::Compilers::*` alphabetically.

Having the ability to push the compiler forward or back in the run order would make it possible to have explicit control over the order, rather than relying on discovering the above code and then knowing to name the compiler something that puts it in the right place. Naming is already hard enough; name golf to get your compiler to run in order, although fun, seems a little counterintuitive.
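As a minimal illustration of the constraint (the compiler names below are hypothetical):

```ruby
# The pipeline sorts compiler classes by name, so which sig "wins" a
# collision is decided purely by how the class names happen to sort.
compilers = [
  "Tapioca::Dsl::Compilers::ActiveRecordFind",
  "MyProject::Compilers::ActiveRecordFind",
]

run_order = compilers.sort
# "MyProject::..." sorts before "Tapioca::...", so the project
# compiler runs first and Tapioca's signature overwrites its output
```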
Solutions
Order value
One solution would be to have an `order` value: an integer representing where the compiler is expected to run. The code above could then be updated to order by this value first, then alphabetically, to ensure a stable run order. The order value could be an integer between 0 and 100, defaulting to 50 if not set, to leave room on either side.

This would then allow projects to set the value for their compilers based on where they are needed, and even to override ones from their gems or Tapioca to ensure the order they need.
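A sketch of how the ordering could work, under the assumptions of this proposal (the `order` class method, the 0-100 range, and the compiler names are all hypothetical, not Tapioca API):

```ruby
# Sort compilers by an optional `order` value first, then by name as
# a stable tie-break, so naming no longer dictates the run order.
class Compiler
  def self.order
    50 # default sits mid-range, leaving room on either side
  end
end

class AaaProjectCompiler < Compiler
  def self.order
    90 # run late so its sigs win collisions, despite its name
  end
end

class ZzzGemCompiler < Compiler; end

sorted = [AaaProjectCompiler, ZzzGemCompiler].sort_by do |compiler|
  [compiler.order, compiler.name]
end
# ZzzGemCompiler (order 50) now runs before AaaProjectCompiler (90),
# even though it sorts last alphabetically
```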
Method override flags
Given that the heart of the problem is that a method needs to be defined after an existing one in order to override its signature, maybe optional flags could be passed when generating a method to determine how it interacts with existing or future methods of the same name. It would have to handle future cases too, because of the aforementioned alphabetical ordering.
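One way the flag could behave, as a hedged sketch: the `Registry` class and the `override:` flag below are invented for illustration and are not part of Tapioca.

```ruby
# "Method override flags": each definition carries a flag saying
# whether it should win a name collision, so the outcome no longer
# depends on which compiler happened to run first.
class Registry
  def initialize
    @methods = {}
  end

  def create_method(name, sig:, override: false)
    existing = @methods[name]
    # store if the slot is empty, or if we are an override and the
    # existing definition is not; this also covers the "future
    # methods" case, because a later non-override call cannot
    # displace an already-stored override
    if existing.nil? || (override && !existing[:override])
      @methods[name] = { sig: sig, override: override }
    end
  end

  def sig_for(name)
    @methods[name][:sig]
  end
end

a = Registry.new
a.create_method(:find, sig: "default")
a.create_method(:find, sig: "custom", override: true)

b = Registry.new
b.create_method(:find, sig: "custom", override: true)
b.create_method(:find, sig: "default")
# both registries resolve :find to the "custom" sig,
# regardless of definition order
```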