ErwinM / acts_as_tenant

Easy multi-tenancy for Rails in a shared database setup.
MIT License
1.56k stars 264 forks source link

avoiding Postgres bitmap heap scans #303

Closed ianfloats closed 1 year ago

ianfloats commented 1 year ago

First, thanks so much for the gem, it helped us switch off Apartment and has been working great for years.

The only issue we've seen with the gem is that it causes Postgres to make bad query planner decisions.

Given two tables scoped to tenant: rooms (integer: tenant_id) reservations (integer: tenant_id, integer: room_id) (each column indexed)

A query like Room.first.reservations will of course search the reservations table where(:room_id => X, :tenant_id => Y) and this is causing Postgres to search each index, BitmapAnd the two, and then a bitmap heap scan on reservations table. This is significantly slower than just using the index on room_id.

The information that Postgres lacks is that all records with a given room_id have the same tenant_id. So actually removing tenant_id from the WHERE clause will make for a faster search and identical results.

Possible solutions:

  1. sprinkle without_tenant blocks around the code (seems dangerous and messy)
  2. use multi-column indexes [:tenant_id, :room_id] instead, which works and is what we're doing, but seems potentially wasteful of index space
  3. ActsAsTenant could see this situation and (when an option is enabled) automatically remove tenant_id scoping when already scoping by the id of a model that is itself tenant scoped
  4. something else?

Mainly I wanted to know if anyone has a better solution to this, and/or if it deserves mention in the documentation, as it would have saved me countless hours to just have made indexes with a tenant_id prefix to begin with.

Thanks for the consideration!

excid3 commented 1 year ago

I think the general advice is to use multi-column indexes. Always querying with the tenant ID on any table is the safest approach and what this gem is intended for.

The 3rd option is interesting, but in general you should always have the tenant ID on every model so that if they are ever queried directly, you can enforce multitenancy on it. This seems like it could lead dangerous situations.

ianfloats commented 1 year ago

For the third option, I was thinking the table would still have the tenant_id column, but ActsAsTenant could be told "for the reservations table, when a room_id is present in the WHERE clause, avoid adding tenant_id to the WHERE clause".

acts_as_tenant(:tenant, :skip_tenant_filter_when_filtering_by => [:room_id])

A step further, it could actually be inferred by interrogating the models, ie seeing the relation and seeing that Room is itself tenant scoped. The reason I'd think we'd not want to do it automatically is it depends how the indexes are set up. I'm not sure if ActsAsModel can also interrogate the indexes (it does seem that Rails has code for it, since it can generate db/schema.rb).

All that does sound pretty complicated, though, so short of that, I'm wondering if it's worth a mention in the docs:

Indexing when using ActsAsTenant The general advice is to use multi-column indexes on [:tenant_id, :other_id]. (Maybe something about what can happen if you index just the two columns individually.)

In our case, we started with just an index on the room_id, and this worked fine, but then we wanted to delete all data for a tenant, and so added an index on tenant_id to many tables, and that's when Postgres started using the poor query strategy for some big tables. The heap scan can take upwards of 50 seconds, where a straight room_id query might be 5 milliseconds.