Shopify / tapioca

The swiss army knife of RBI generation
MIT License
732 stars 122 forks source link

ActiveRecord relations' `each` method block argument is `T.untyped` #2034

Open marknuzz opened 2 weeks ago

marknuzz commented 2 weeks ago

Calling #each on a PrivateAssociationRelation, PrivateCollectionProxy, etc. will give an untyped block argument. However, #each_with_index will give a typed argument. This is difficult to work around (adding a line of code with a T.cast for each time #each is used, is not a workaround at all).

These are classes which are also valid Enumerables, but the each method on these classes will use the untyped ActiveRecord::Delegation methods from the rbi/gems path, and not a path defined in rbi/dsl, or even sorbet's T::Enumerable class. This causes the block argument to be untyped, which causes a lot of missed type checking!!

marknuzz commented 2 weeks ago

Adding these lines to my require.rb in the meantime:

# https://github.com/Shopify/tapioca/issues/2034
ActiveRecord::Delegation.undef_method(:each)
amomchilov commented 1 week ago

Hi Mark,

I think this list of delegated methods in ActiveRecord::Delegation is the culprit.

That module is included after Enumerable in ActiveRecord::Relation. So that (untyped) delegated method takes precedence over the inherited one from Enumerable and prevents the inheritance of the correct sig from Enumerable.

I think the fix here might be as easy as adding the correct sig to ActiveRecord::Relation in RBI central:

https://github.com/Shopify/rbi-central/blob/af4f13dd7023734deb4500018b294367a3ed1cae/rbi/annotations/activerecord.rbi#L198-L201

class ActiveRecord::Relation
  sig { returns(T::Boolean) }
  def blank?; end

+  # adapted from https://github.com/sorbet/sorbet/blob/64b1468/rbi/core/enumerable.rbi#L19-L27
+  sig do
+    abstract
+      .params(blk: T.proc.params(arg0: Elem).returns(BasicObject))
+      .returns(T.untyped)
+  end
+  sig { returns(T.self_type) }
+  def each(&blk); end
end

That would be a great first contribution. Would you like to give it a shot?

marknuzz commented 1 week ago

Sure, I'll give it a shot this week. I've been slammed with deadlines so I can't guarantee it but it sounds simple enough and may make it easier for me to contribute other fixes later.

My workaround ended up breaking the usage of the .each method with no arguments anyway, since there was no concrete implementation of .each in the RBI, and Sorbet's Enumerable each sig returns T.self_type if used with no block.

marknuzz commented 1 week ago

I think the second overload above instead of sig { returns(T.self_type) } would be sig { returns(T::Enumerator[Elem]) }

marknuzz commented 5 days ago

@amomchilov I have a PR open here to address this https://github.com/Shopify/rbi-central/pull/290

amomchilov commented 5 days ago

@marknuzz Approved and merged!

Could you please verify this locally, and confirm that it works before we close this?

marknuzz commented 5 days ago

@amomchilov Yes I have confirmed that the issue is fixed, working correctly for both the chained enumerator and non chained cases.

marknuzz commented 5 days ago

P.S. I would like to contribute to more type annotations but there's some questions I have. Is there a mailing list or group for discussion or should I ask about it in relevant issues/PRs like this https://github.com/Shopify/rbi-central/pull/166

danielveloso09 commented 3 days ago

Hi there.

This change raised us some errors, and just want to understand if it is or not expected. Til today tapioca static analysis was not checking "virtual" fields when using ActiveRecord.select, and than use it like this field is part of the original class, in this case Delayed::Job. See the example bellow to easily understand the use case.

Example with dummy code:

queue_counts = Delayed::Job.select('count(*) as job_count, field').group('field')
queue_counts.each do |queue_count|
  ...
  queue_count.job_count
  ...
end

Error given when running bundle exec srb tc:

Method job_count does not exist on Delayed::Backend::ActiveRecord::Job

To address this issue across multiple projects, we developed a generalized shim like this:

# typed: strict
class ActiveRecord::Relation
  sig { abstract.params(blk: T.proc.params(arg0: T.untyped).returns(BasicObject)).returns(T.untyped) }
  sig { abstract.returns(T::Enumerator[T.untyped]) }
  def each(&blk); end
end

Was this an expected output of these changes?

marknuzz commented 2 days ago

@danielveloso09 I don't think that this change should be blamed for the issue. If you use .to_a.each, wouldn't you get the same error even before the change?

Tapioca considers Delayed::Job::PrivateRelationGroupChain to be a T::Enumerable[Delayed::Job], and that was the case for a long time.

If your model is defined by your application, you can use the attribute class method to define a virtual attribute, like described in this issue

If the model is not defined by your application, you can add something like this to an initializer before running tapioca dsl (just make sure there's no name collisions, and I'm not 100% sure if this is the optimal solution, it's just an idea):

Delayed::Job.class_eval do
  attribute :job_count
  attribute :field
end

Let me know if that works or not

danielveloso09 commented 2 days ago

Maybe I was not clear with the examples I gave and even by mentioning virtual attributes 😅

Lets assume that in some specific Use Case your business logic requires some "complex" query to be done on top of an ActiveRecord model, and than you would need to iterate each record.

Example models:

class Table1 < ActiveRecord; end

Example of query you want to generate:

SELECT 
  count(*) as counter, 
  field, 
  STRING_AGG(name, ',') as names, 
  (CASE WHEN active == true THEN 'ACTIVE' ELSE 'INACTIVE' END) as status
FROM table1
GROUP BY field, status

Note: I know this query might not make much sense, but is just for the purpose of having an example

Example of ActiveRecord code to build the query:

results = Table1.select(
  'count(*) as counter', 
   'field', 
  "STRING_AGG(table2.name, ',') as names",
  "(CASE WHEN active == true THEN 'ACTIVE' ELSE 'INACTIVE' END) as status"
).group('field', 'status')

If we need to iterate the active record collection and use any of the fields built in the query:

...
results.each do |result|
  ...
  mapped_results[result.fielld] = {
    counter: result.counter,
    names: result.names,
    status: result.status
  }
  ...
end

With the current annotation this iteration would "complain" about counter/names/status not being methods of Table1 model. And normally I would not use virtual attribute for this specific use case.

Hope to have made it clear with these new examples. Assuming the above, is this annotation change expected to raise the error? Note that I am not saying that this annotation is wrong, just want to confirm if this is something expected or not! So we can plan or not changes on our end.

marknuzz commented 2 days ago

@danielveloso09 Thank you for the clarification. I will do my best to give my opinion:

  1. Anything that is updated from T.untyped to a concrete type, even when that type is correct, has the potential to cause a type error. If you have code that relies on the incorrect or missing typing for it to work without an error, then you can have an error if the type is fixed upstream. Updating annotations automatically and relying on those for a build to pass is not something I personally recommend doing because there could be edge cases like this. I'm not aware of any way that it is possible for Sorbet to automatically recognize when you use a one-off dynamically generated attribute. See here and here for documentation examples. This could be either a case of method_missing or a method defined on the singleton class, I don't recall off top of my head which approach Rails uses for this. But either way, there's no way for Sorbet to know about it unless the method is defined in the rbi, which you can't do for a one-off query with current tooling (unless you separately define the attribute explicitly as described above, but that would make the attribute defined in the global scope, which doesn't seem ideal if you only use it once).

  2. In your situation, if it is just a few places that this occurs, instead of shimming the each method to return T.untyped, you could reassign the block variable with a call to T.unsafe. You'll want the enumerated type to be the default, and if there's a dynamic query that has one-off dynamically-bound attributes, T.untyped seems appropriate. However, if there are many of these cases, you could consider shimming #each like you have done, it depends on what you decide is most appropriate for the situation.