Shopify / tapioca

The swiss army knife of RBI generation
MIT License
720 stars 120 forks source link

Suggestion: Implement `<Model>::Persisted` opaque type to solve column nilability confusion #1667

Open iMacTia opened 11 months ago

iMacTia commented 11 months ago

The Problem

The tapioca compiler for ActiveRecord models makes most columns nilable, ignoring database constraints. This is correct on paper, because when you call Model.new none of those would matter, until you eventually try to persist the record in the database.

In practice though, this is a common pitfall and cause for confusion, especially for those just getting started with Sorbet.

This is not a new problem, it has in fact been discussed in the past both here and in the Sorbet Slack workspace:

Existing Solutions

There are currently two possible paths to address this:

  1. The most common one is to just deal with this in the code and use T.must whenever Sorbet complains about nilability and we know the record has been persisted, so the column cannot be nil.
  2. The other, less known and currently undocumented solution is to define a StrongTypeGeneration module in the model to tell the compiler to generate the class following the database constraints (as explained in this comment). This will make the class work in most cases, but it won't allow you to call a non-nullable column on a newly instantiated model without throwing an error (e.g. Model.new.non_nilable_column)

The two solution seem mutually exclusive and present pros/cons to different cases, so developers will always need to weight in with additional T.must or T.unsafe to "correct" Sorbet misunderstandings. Ideally, we'd like to teach sorbet exactly what to expect.

A proposal for an alternative solution

I recently stumbled across the concept of "opaque types" thanks to an article from Jez.

After reading the article, I began wondering if it wouldn't be possible to change the compiler so that on top of the Model class, a Model::Persisted or PersistedModel class would be generated as well. The two classes would be identical and they'd only differ by the fact the the latter will have stricter constraints on columns, reflecting database constraints.

Then, methods signatures can be changed to return one type or the other. For example:

# app/models/user.rb
class User < ActiveRecord::Base
  # == Schema Information
  #
  # Table name: users
  #
  #  id                          :integer          not null, primary key
  #  email                       :string           default(""), not null
  #  encrypted_password          :string           default(""), not null
  #  created_at                  :datetime         not null
  #  updated_at                  :datetime         not null
end

# user.rbi
module CommonMethods
  sig { returns(PersistedUser) }
  def save; end

  sig { returns(PersistedUser) }
  def find_by; end

  sig { returns(PersistedUser) }
  def update; end
end

class User
  include CommonMethods

  sig { returns(T.nilable(Integer)) }
  def id; end

  sig { returns(T.nilable(String)) }
  def email; end

  ...
end

class PersistedUser # or User::Persisted ?
  include CommonMethods

  sig { returns(Integer) }
  def id; end

  sig { returns(String) }
  def email; end

  ...
end

cc @rafaelfranca @Morriar @KaanOzkan based on your past contributions to the topic 🙏

iMacTia commented 11 months ago

Gentle nudge on this one, I'm not pretending to see a PR to get this fixed, I merely want to kick off the discussion to understand if this makes sense or not and if you'd like to introduce it in tapioca, then I'm happy to take a stab at it myself if necessary 😄

KaanOzkan commented 10 months ago

The column can be nilable so we can't always use PersistedUser. In your example CommonMethods will always return PersistedUser. Also the common methods are defined under ActiveRecord RBIs and shared for all models.

Would something like this work? Having User and PersistedUser classes each with their own set of methods and developer could manually specify if an instance is persisted and use that type rest of the way.

# sorbet/rbi/dsl/persistent_user.rbi
class PersistedUser
  sig { returns(Integer) }
  def id; end

  sig { returns(String) }
  def email; end
end
# models/user.rb
class User < ApplicationRecord
  sig { returns(PersistedUser) }
  def save
    super
  end
end
iMacTia commented 10 months ago

The column can be nilable so we can't always use PersistedUser. In your example CommonMethods will always return PersistedUser. Also the common methods are defined under ActiveRecord RBIs and shared for all models.

I might have chosen the wrong names and created some confusion, but if I check the tapioca-generated RBIs for my project's models, they each have these methods specified (except for save, maybe because it actually returns a boolean?). So apologies if my example is not precise, but the idea is for all those methods (find, first, last, etc...) to return a PersistedUser rather than User. Obviously, new would return User since we'd expect those columns to be nilable, but that would be one of the very few cases where we'd deal with a standard User, I believe?

Would something like this work? Having User and PersistedUser classes each with their own set of methods and developer could manually specify if an instance is persisted and use that type rest of the way.

I don't think this would work, because then the two classes would be completely different and not interchangeable. Let's not forget that PersistedUser would not exist at runtime, so whenever we use it in a signature it should be able to represent also User.

iMacTia commented 10 months ago

Playing some more with this and I managed to get something to work in Sorbet's playground: see example

I had to abandon the idea of a Persisted::User in favour of Unpersisted::User: this is because when you deal with the union type, T.nilable is "contagious" so it makes sense to flip the proposal.

This effectively means we'd always deal with persisted records, except for when you call new or build, where you'd get an Unpersisted::<model> record (yes, I hate the name, I'm sure we'll find something better).

stathis-alexander commented 6 months ago

This is a really good idea and would be a huge improvement to Tapioca.

How would you implement save? This method returns a boolean but would need to narrow the type to User. As far as I understand, type narrowing methods don't exist in sorbet. I opened this issue over a year ago that has yet to be addressed.

In any case, I think this would be a huge improvement to Tapioca's models, allowing 100% type safety without sacrificing the convenience of not having to assert everywhere.

alexgraffeocohen commented 4 months ago

This is a really great idea! Would love to help however I can to try to make this more real, since it would definitely be a UX improvement. And I actually like the inversion to have Unpersisted::User because that's probably the less common flow, so most of the time, you wouldn't need to cast or worry about these alternate types.

Is there a path to land this sooner as we work it out if, instead of requiring the Unpersisted types to be returned by ActiveRecord APIs, we start by just including the new type definitions for folks and we can manually cast as needed for our use cases? That would still be an improvement over T.must all over the place. But have never contributed directly to Tapioca so open to whatever makes sense!

iMacTia commented 4 months ago

@stathis-alexander @alexgraffeocohen sorry I didn't mean to drop the ball on this, and I'm still very much interested in making this work. I'm just struggling to find the time at the moment 😅.

If any of you would like to push this forward or make any progress, here is what a possible roadmap could look like:

  1. Get a PoC working on the Sorbet playground <-- we're here, the one I shared in my last comment seems a good starting point
  2. Test in a real Rails application by manually creating the RBIs based on the PoC. This can be a sample app, doesn't need to be enterprise/production.
  3. Once we're confident with the RBIs, the next step is to write a tapioca compiler that will generate them automatically. NOTE: This might also require changes to the existing AR compilers to avoid conflicts on signatures.

How would you implement save?

I'm not sure I understand the need for narrowing the type. You don't need to pass a record as a parameter, so the signature would simply be sig { returns(T::Boolean) } 🤔

Is there a path to land this sooner as we work it out

I don't think there is. Sorbet requires these signatures to be statically evaluated, meaning if we want the same object to behave differently, we'll effectively need 2 separate objects.

amomchilov commented 4 months ago

As a temporary stop-gap solution, we were considering adding a new method like id!:

class ApplicationRecord
  sig { returns(Integer) } # _not_ nilable
  def id! = id or raise SomeError
end

Creating a module that overrides the existing id would be even better!

stathis-alexander commented 4 months ago

@iMacTia

I'm not sure I understand the need for narrowing the type. You don't need to pass a record as a parameter, so the signature would simply be sig { returns(T::Boolean) } 🤔

If I instantiate a new object:

my_object = MyObject.new(**parameters)
reveal_type(my_object) # should show unpersisted object type

it should by default have the Unpersisted type. Later, when I save that object, it needs to "narrow" into the persisted type.

my_object.save!
reveal_type(my_object) # should show persisted type

I don't believe sorbet has any support for this kind of thing at the moment.

iMacTia commented 4 months ago

@stathis-alexander Ah no I don't think that's possible at all! That would mean changing the type of the variable simply by calling a method on it. However, sorbet does allow re-assigning a variable (with some limitations, see error 7001), so you should be able to do the following:

my_object = MyObject.new(**parameters)
reveal_type(my_object) # Unpersisted::MyObject

my_object = my_object.save!
reveal_type(my_object) # MyObject
rafaelfranca commented 4 months ago
# create_table :comments, force: true do |t|
#   t.integer :post_id, null: false
# end
comment = Comment.find(1)
comment.post_id = nil
comment.valid?

This code is totally valid code and should still work and with what you are proposing it would not.

Worse yet, something like this would explode at runtime but pass type checking with what is being proposed here.

class Comment < ApplicationRecord
  before_validation :transform_post_id

  def transform_post_id
    self.post_id = self.post_id + 1
  end
end

comment = Comment.find(1)
comment.post_id = nil
comment.valid?

I don't think the type system should lie about the runtime behavior of the system.

iMacTia commented 4 months ago

This code is totally valid code and should still work and with what you are proposing it would not.

Fair, but how common would that use-case be? And what if it could be solved with a T.cast(comment, Unpersisted::MyObject)?

Worse yet, something like this would explode at runtime but pass type checking with what is being proposed here.

I'm not sure I follow this one, the last three lines are the same as the example above which we agreed would not pass type-checking, so I guess there's a typo in the code unless I'm missing something obvious 🤔

Anyway, I completely understand the "the type system shouldn't lie" position and honestly it's hard to argue against it. My point I guess is: is it right to cause friction on development for the sake of correctness? From my standpoint, I'd favour a solution that might sacrifice some correctness if it helps improving the experience of the majority of cases.

For example (which might relate to your second snippet above), I'd prefer dealing with an immutable object retrieved from the DB but that it gives me some guarantees, rather than having to deal with constant nullability. To achieve that, we could make all setters forbidden (e.g. post_id= or assign_attributes) on persisted object, and if you do need to make changes then type casting or even re-instantiation might be needed.

Now that I said that out loud, I wonder if this might pair well with readonly! 😲

stathis-alexander commented 4 months ago

Anyway, I completely understand the "the type system shouldn't lie" position

I think tapioca as a community has already taken the position that accuracy is more important than ergonomics, for better or worse.

I think the proper way to handle this dichotomy is to allow this as a configuration option. I think many would agree that the convenience here is worth the slightly less accurate typing, as evidenced by the number of questions, issues, etc that come up concerning the default nil-able types generated for active record models with non-nil validations (or non-null column constraints). Those of us who feel that way should be able to toggle tapioca into a more relaxed type system.

Ultimately, sorbet was built to enable higher developer productivity and improved developer experience.

iMacTia commented 4 months ago

Making this configurable would definitely make everyone happy, but as an OSS maintainer myself, I know that's the easiest way to get things harder to maintain. So happy to leave this open a bit more to facilitate discussion, this is not a bug anyway (and to that end, happy to turn this into a Discussion if you prefer)

Morriar commented 4 months ago

Rather than making another configuration option for this in Tapioca, can't you disable the default DSL compiler and provide your own with the behavior that suits you?

stathis-alexander commented 4 months ago

@Morriar - Yes, that's of course an option, although... it requires a heavy lift and would need to be maintained in parallel with the default one.

Instead, this change allows monkey patching existing behavior. I've got something worked out already for the nil-able behavior for AR columns, re-using some helper method from the now deprecated sorbet-rails:

module Tapioca
  module Dsl
    module Helpers
      class ActiveRecordColumnTypeHelper
        include SorbetRailsBorrowedMethods

        # https://github.com/Shopify/tapioca/blob/main/lib/tapioca/dsl/helpers/active_record_column_type_helper.rb#L37
        def column_type_for(column_name)
          return ['T.untyped', 'T.untyped'] if do_not_generate_strong_types?(@constant)

          column = @constant.columns_hash[column_name]
          column_type = @constant.attribute_types[column_name]
          getter_type = type_for_activerecord_value(column_type)
          setter_type =
            case column_type
            when ActiveRecord::Enum::EnumType
              enum_setter_type(column_type)
            else
              getter_type
            end

          if column&.null && !attribute_has_unconditional_presence_validation?(column_name)
            getter_type = as_nilable_type(getter_type) unless not_nilable_serialized_column?(column_type)
            return [getter_type, as_nilable_type(setter_type)]
          end

          [getter_type, setter_type]
        end
      end
    end
  end
end

It's on my list of things to do to host a custom compilers repo that others can share (money-rails being an example).

aericson commented 2 months ago

Can be closed by https://github.com/Shopify/tapioca/pull/1888 perhaps?

cc: @paracycle

iMacTia commented 2 months ago

1888 is a very much welcomed change and we've already started using it in our application!

But to be fair, the idea here is slightly different: while #1888 now allows you to choose how your AR class is typed, this suggestion wants to actually have 2 different classes (statically) to properly deal with both cases.

The solution provided by #1888 asks you to choose between the two cases and apply that everywhere, even where it doesn't really holds true (e.g. if you go :persisted, you won't catch things like Model.new.non_nullable_column)