Investigate fan-out join queries for loaders (cache warming)

expo / entity

Entity is a privacy-aware data layer for defining, caching, and authorizing access to application data models.

MIT License

142 stars 4 forks source link

Entity works on full rows of data (the projection of all queries is *). This is because the privacy logic of a privacy policy is allowed to use the entire entity rather than just a subset of fields, which makes for a very expressive potential of authorization logic.

Joins are extremely powerful in the sense that they allow combining two or more tables in order to create a third intermediate/temporary "table". The issue is that in the general case, this third table isn't possible to express using an entity (since it's dynamic) and therefore creates an issue with how to authorize access to a row of that "table".

So, the tradeoff that entity makes is to not allow joins and to require doing them in the application in exchange for authorization correctness. While this does incur potentially higher memory and an extra round trip to the DB, the tradeoff is most often worth it since we can now guarantee that all data access is authorized and the authorization is sound.

Now, there is a very narrow join case that could work with entities, and that's the case of doing a fan-out load with the projection limited to the * of the resulting object. For example:

select b.*
from apps ap
join builds b on b.app_id = ap.id
where ap.account_id = 'aaa-bbb-ccc-123'

This seems like an interesting case to investigate since it is theoretically possible to authorize the returned objects since they have entity type "build". The association loader could make use of it as well. Association loader is currently just a set of convenience methods that load the chain of entities specified and currently only supports 1:1 foreign keys but it's definitely possible to extend it to 1:n for fan-outs. We could also potentially change it to construct these join queries, but we'd have to be fairly careful to ensure that it's built in a way that is general enough that both RDBMS and nosql could theoretically implement the "join" logic since entity is built to be database-independent.

Coming back to this and doing some investigation in code resulted in some interesting cases. Essentially, we have quite a lot of surface area and cases to think through if we want to implement this.

The way I'm thinking about this interface is something like await EntityLoader.loadById(id, loadDirectives), which loads and returns the entity by ID, and also pre-caches entities directed by loadDirectives in the dataloader so that subsequent calls to get those entities is faster.

When a load is issued with load directives, what do we do if we're in a transaction (which bypasses any dataloader and cache adapters and goes straight to the database adapter for data)?
- I'm thinking we should ignore the directives. Otherwise we'd have to change the return type of the loader function to return loaded directives as well.
When a load is issued with load directives and the requested root entity is already in the dataloader or cache adapter (entity with ID in example) such that no database call is issued, what do we do about the load directives?
- I'm thinking we should ignore the directives. In my opinion, we don't want to slow down the initial call in order to pre-cache future calls (which will succeed whether or not they're pre-cached).
When a load is issued with load directives and the database adapter is used (request not in dataloader or cache adapter), what should we do?
- Issue a query of the form SELECT [column fields list aliased with prefix as table_1], [column fields list aliased with prefix as table_2], ..., [column fields list aliased with prefix as table_n], FROM table_1 LEFT JOIN table_2 ON table_1.<load_directive_column> = table_2.<load_directive_load_by_column> LEFT JOIN ... ON ... LEFT JOIN table_n ON table_1.<load_directive_column> = table_n.<load_directive_load_by_column>. Then, in the database adapter, parse out the different entity fields into separate objects and return them to the data manager via an agreed-upon format. Then, in the data manager, somehow insert the results into the other entities' data managers for future use.

Overall this seems like it should work, but we do need to decide whether the tradeoff of the added complexity and implicit edge cases are worth it.

expo / entity

Investigate fan-out join queries for loaders (cache warming) #45