expo / entity

Entity is a privacy-aware data layer for defining, caching, and authorizing access to application data models.
https://expo.github.io/entity/
MIT License
142 stars 4 forks source link

Investigate fan-out join queries for loaders (cache warming) #45

Open wschurman opened 4 years ago

wschurman commented 4 years ago

Entity works on full rows of data (the projection of all queries is *). This is because the privacy logic of a privacy policy is allowed to use the entire entity rather than just a subset of fields, which makes for a very expressive potential of authorization logic.

Joins are extremely powerful in the sense that they allow combining two or more tables in order to create a third intermediate/temporary "table". The issue is that in the general case, this third table isn't possible to express using an entity (since it's dynamic) and therefore creates an issue with how to authorize access to a row of that "table".

So, the tradeoff that entity makes is to not allow joins and to require doing them in the application in exchange for authorization correctness. While this does incur potentially higher memory and an extra round trip to the DB, the tradeoff is most often worth it since we can now guarantee that all data access is authorized and the authorization is sound.

Now, there is a very narrow join case that could work with entities, and that's the case of doing a fan-out load with the projection limited to the * of the resulting object. For example:

select b.*
from apps ap
join builds b on b.app_id = ap.id
where ap.account_id = 'aaa-bbb-ccc-123'

This seems like an interesting case to investigate since it is theoretically possible to authorize the returned objects since they have entity type "build". The association loader could make use of it as well. Association loader is currently just a set of convenience methods that load the chain of entities specified and currently only supports 1:1 foreign keys but it's definitely possible to extend it to 1:n for fan-outs. We could also potentially change it to construct these join queries, but we'd have to be fairly careful to ensure that it's built in a way that is general enough that both RDBMS and nosql could theoretically implement the "join" logic since entity is built to be database-independent.

wschurman commented 4 years ago

Coming back to this and doing some investigation in code resulted in some interesting cases. Essentially, we have quite a lot of surface area and cases to think through if we want to implement this.

The way I'm thinking about this interface is something like await EntityLoader.loadById(id, loadDirectives), which loads and returns the entity by ID, and also pre-caches entities directed by loadDirectives in the dataloader so that subsequent calls to get those entities is faster.

Overall this seems like it should work, but we do need to decide whether the tradeoff of the added complexity and implicit edge cases are worth it.

ide commented 4 years ago

The main advantage I see is the reduction in transactions, but we could perhaps get that with explicit BEGIN/COMMIT or START TRANSACTION in our DataLoader.