authzed / spicedb

Open Source, Google Zanzibar-inspired database for scalably storing and querying fine-grained authorization data
https://authzed.com/docs
Apache License 2.0

Proposal: Lookup Watch API and Tiger Cache for fast ACL-aware filtering #207

josephschorr opened this issue 3 years ago

josephschorr commented 3 years ago


Lookup Watch API and Caching Service ("Tiger cache")

Background

The Lookup Resources API provides a graph-driven permission-aware reverse lookup of all resources, of a particular type, that are accessible within a SpiceDB permissions system for a specified subject.

This API was implemented to solve the ACL-aware indexing problem, and does so in a solid manner for the common use case of listing all resources accessible to a subject.

However, the Lookup Resources API has two downsides:

1) it must compute the results from the SpiceDB graph in real time - while caching helps, this can be somewhat unperformant[^1] in systems with millions of relationships and deeply nested schemas
2) it provides no ability to perform any additional filtering beyond the type of resource for the target subject - often, applications will want to filter based on both permissions AND other criteria (entered queries, last accessed date/time, etc.)

[^1]: Various tests show approximately 100ms for a LookupResources call on a SpiceDB instance with 100-250K relationships and graph nesting 3-5 levels deep.

To address both of these downsides, this issue proposes three new pieces of technology for SpiceDB:

1) a reachability resolution API, to be used in concert with the existing Watch API
2) a Lookup Resources Watch API Service
3) a Lookup Resources Caching Service (nicknamed the "Tiger" cache, to follow the naming scheme of the "Leopard" cache for nesting from the Zanzibar paper, and because of its proposed use of a Roaring Bitmap)

Proposal

NOTE: In order to effectively use Roaring Bitmaps in the ways specified below, ALL resource IDs in the system must be 64-bit integers. We will likely need to make this a requirement for using these new services, unless we can come up with an alternative way of storing and transmitting the data in a compact and efficient form.

Reachability Resolution APIs

The first new piece to be added to SpiceDB will be the "Reachability Resolution APIs".

In simplest terms, the reachability resolution APIs will provide a means of asking SpiceDB for the list of dependencies that could have been affected by the addition or removal of a particular relationship in the graph.

The reachability resolution APIs will provide two methods for determining the possible set of changes:

rpc ReachableResources(ReachableResourcesRequest) returns (stream ReachableResourcesResponse) {}

rpc LookupSubjects(LookupSubjectsRequest) returns (stream LookupSubjectsResponse) {}

message ReachableResourcesRequest {
    Consistency consistency = 1;

    ObjectReference starting_resource = 2;
    string target_object_type = 3;
}

message ReachableResourcesResponse {
    ObjectReference found_resource = 1;
    bool direct_path = 2;
    ZedToken found_at = 3;
}

message LookupSubjectsRequest {
    Consistency consistency = 1;

    ObjectReference resource = 2;
    string optional_resource_relation = 3;

    string target_subject_type = 4;
    string optional_subject_relation = 5;
}

message LookupSubjectsResponse {
    Relationship found_relationship = 1;
    ZedToken found_at = 2;
}

These APIs would provide the ability to walk outward from a starting object, returning any resources or subjects of the target kind encountered along the walk of the graph.

NOTE: the inclusion of direct_path in ReachableResourcesResponse will indicate that the path walked to reach the possibly reachable resource only traversed direct (non-intersection or exclusion) operations. This will be used as an optimization later in the proposal.

Also NOTE that the responses to these APIs are streaming: This is important for scalability, but does mean that duplicates will be returned.

Example

Consider the following schema and existing relationships:

definition user {}
definition team {
    relation member: user | team#member
}

definition document {
    relation viewer: user | team#member
    relation writer: user | team#member

    permission write = writer
    permission view = viewer + writer
}
document:somedoc#viewer@user:someuser
document:somedoc#writer@team:toplevelteam#member
document:anotherdoc#viewer@team:middleteam#member

team:toplevelteam#member@team:middleteam#member
team:middleteam#member@team:anotherteam#member
team:anotherteam#member@user:oneteammember
team:anotherteam#member@user:twoteammember

If the watch API reported that the following relationship was removed:

team:middleteam#member@team:anotherteam#member

Then the following API calls could be made to the reachability API:

> ReachableResourcesRequest{ team:middleteam, document }

ReachableResourcesResponse{
    found: document:somedoc
    direct_path: true
}

ReachableResourcesResponse{
    found: document:anotherdoc
    direct_path: true
}

> LookupSubjectsRequest { 'team:anotherteam#member', user }

LookupSubjectsResponse { 'team:anotherteam#member@user:oneteammember' }
LookupSubjectsResponse { 'team:anotherteam#member@user:twoteammember' }

NOTE: These APIs, if traversing deeply nested object types, would also benefit from a Leopard cache, when we implement that.
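For illustration, here is a minimal sketch of how a caller might consume the streaming ReachableResources RPC and deduplicate its results (duplicates are expected, as noted above). ReachabilityServiceClient, ReachableResourcesRequest and ObjectReference are assumptions about what the generated Go stubs would look like, not an existing API:

import (
    "context"
    "io"
)

// collectReachable gathers the distinct resources of targetType reachable from
// a starting object, deduplicating the streamed results.
func collectReachable(ctx context.Context, client ReachabilityServiceClient,
    start *ObjectReference, targetType string) (map[string]bool, error) {

    stream, err := client.ReachableResources(ctx, &ReachableResourcesRequest{
        StartingResource: start,
        TargetObjectType: targetType,
    })
    if err != nil {
        return nil, err
    }

    // object ID -> true only if every observed path to it was a direct path
    seen := make(map[string]bool)
    for {
        resp, err := stream.Recv()
        if err == io.EOF {
            return seen, nil
        }
        if err != nil {
            return nil, err
        }
        id := resp.FoundResource.ObjectId
        if prev, ok := seen[id]; ok {
            seen[id] = prev && resp.DirectPath // duplicates are expected; merge them
        } else {
            seen[id] = resp.DirectPath
        }
    }
}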

Lookup Watch API Service

The Lookup Watch API Service will make use of the Watch and the Reachability Resolution APIs to provide a stream of updates indicating changes in the accessibility of resources of a particular kind, for a specific kind of subject.

Lookup Watch Service API

The Lookup Watch Service will provide a single API, WatchAccessibleResources, for receiving a stream of updates for resources of the requested kind, for the requested permission, for a specific subject.

rpc WatchAccessibleResources(WatchAccessibleResourcesRequest) returns (stream WatchAccessibleResourcesResponse) {}

message WatchAccessibleResourcesRequest {
    string resource_object_type = 1;
    string permission = 2;

    string subject_object_type = 3;
    string optional_subject_relation = 4;

    ZedToken optional_start_cursor = 5;
}

message PermissionUpdate {
    SubjectReference subject = 1;
    Permissionship updated_permission = 2;
}

message WatchAccessibleResourcesResponse {
    repeated PermissionUpdate updates = 1;
    ZedToken changes_through = 2;
}
Possible Implementation (Reconciliation Loop)

For each caller:

1) The Lookup Watch Service will start a goroutine, invoking Watch on all object types registered in the system (this will likely require a change to the V1 API to enable), starting from the optionally provided optional_start_cursor ZedToken.
2) Whenever a relationship is added or removed, the reconciliation process begins (sketched in Go after the note below):
   1) ReachableResources is invoked for the resource of the changed relationship, with a target_object_type of the monitored resource_type
   2) LookupSubjects is invoked for the subject of the changed relationship, with a target_subject_type of the monitored subject_type
   3) For each combinatorial pair of {resource, subject}, a "check job" is fired off to the cluster
   4) Each check job performs a Check of resource#permission@subject at the changed revision, and returns a result to be fed to the Watch API stream

NOTE: The above proposed design does not keep track of the existing status of a permission for a resource and subject; this means the Watch API will likely report subjects gaining or losing permission even if the relationship change didn't actually change that permission. If we want difference tracking, we'd likely have to integrate with the Tiger Cache (see below).
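A minimal sketch of one iteration of this reconciliation loop in Go. The types and the reconciler interface are hypothetical stand-ins for the proposed APIs; a real implementation would deduplicate the streamed reachability results, shard the Check calls across child nodes (see the next section), and batch the emitted updates:

import "context"

// Hypothetical stand-ins for the proposal's proto messages.
type ObjectRef struct{ Type, ID string }
type SubjectRef struct{ Type, ID, OptionalRelation string }

type PermissionUpdate struct {
    Resource  ObjectRef
    Subject   SubjectRef
    HasAccess bool
}

// reconciler abstracts the SpiceDB calls the loop depends on: the proposed
// ReachableResources and LookupSubjects APIs plus Check at a revision.
type reconciler interface {
    ReachableResources(ctx context.Context, start ObjectRef, targetType string) ([]ObjectRef, error)
    LookupSubjects(ctx context.Context, resource ObjectRef, subjectType string) ([]SubjectRef, error)
    CheckAtRevision(ctx context.Context, resource ObjectRef, permission string, subject SubjectRef, revision string) (bool, error)
}

// onRelationshipChange runs one iteration of the reconciliation loop for a
// single relationship update received from Watch, emitting permission updates.
func onRelationshipChange(ctx context.Context, rec reconciler,
    changedResource ObjectRef, changedSubject SubjectRef, revision string,
    watchedResourceType, watchedPermission, watchedSubjectType string,
    out chan<- PermissionUpdate) error {

    // Resources of the monitored type that could have been affected.
    resources, err := rec.ReachableResources(ctx, changedResource, watchedResourceType)
    if err != nil {
        return err
    }
    // Subjects of the monitored type that could have been affected.
    subjects, err := rec.LookupSubjects(ctx, ObjectRef{Type: changedSubject.Type, ID: changedSubject.ID}, watchedSubjectType)
    if err != nil {
        return err
    }
    // Every combinatorial {resource, subject} pair becomes a check job at the
    // changed revision; results feed the WatchAccessibleResources stream.
    for _, r := range resources {
        for _, s := range subjects {
            has, err := rec.CheckAtRevision(ctx, r, watchedPermission, s, revision)
            if err != nil {
                return err
            }
            out <- PermissionUpdate{Resource: r, Subject: s, HasAccess: has}
        }
    }
    return nil
}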

Scaling the Lookup Watch API Service

To ensure that the Lookup Watch API Service can operate at scale, it is recommended that Check operations be sharded out amongst a cluster of watch service nodes, with each reporting its status back to the leader.

Some options:

1) One option would be to implement the above process as a Map Reduce, using a library such as https://github.com/chrislusf/gleam. Each relationship update received from the Watch API will be considered a single job, with the combinatorial pairs sharded off to all the child nodes. The mapping will occur by issuing Check over each pair, and the reduction will combine the results into a single set to be returned to the leader, who will then emit the updates into the watched stream as a single response.

2) Alternatively, the updates could be emitted as they are computed, with the final "we've reached this revision" update being emitted as a special entry in the stream (sketched below). This would enable the Tiger Cache to partially update, which might be a desirable effect.
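To make option 2 concrete, a rough sketch of the stream shape follows. The "we've reached this revision" marker is modeled here as a response carrying only changes_through, which is an illustration rather than a settled wire contract, and the stream/message type names are assumed generated stubs:

// emitForRevision streams each computed update immediately, then emits a final
// marker once every check job for this revision has completed.
func emitForRevision(results <-chan *PermissionUpdate, revision *ZedToken,
    stream WatchService_WatchAccessibleResourcesServer) error {

    for update := range results {
        if err := stream.Send(&WatchAccessibleResourcesResponse{
            Updates: []*PermissionUpdate{update},
        }); err != nil {
            return err
        }
    }
    // Terminal entry: no updates, only the revision the consumer (e.g. the
    // Tiger Cache) has now fully caught up to.
    return stream.Send(&WatchAccessibleResourcesResponse{
        ChangesThrough: revision,
    })
}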

Lookup Caching Service ("Tiger")

The Lookup Caching Service ("Tiger") will make use of the Lookup Watch API to provide a static cache of the resources (of a particular kind) accessible to the specified kind of subject.

Tiger Cache API

The Tiger Cache will expose one API, RoaringLookupResources, for returning the full set of accessible resources of the requested kind, for the requested permission, for a specific subject, in Roaring Bitmap form.

rpc RoaringLookupResources(RoaringLookupResourcesRequest) returns (RoaringLookupResourcesResponse) {}

message RoaringLookupResourcesRequest {
    Consistency consistency = 1;
    string resource_object_type = 2;
    string permission = 3;
    SubjectReference subject = 4;
}

message RoaringLookupResourcesResponse {
  bytes bitmap = 1;
  ZedToken cached_at = 2;
}
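On the client side, the returned bytes would be deserialized and intersected with whatever other filtering the application performs. A rough sketch, assuming the github.com/RoaringBitmap/roaring/roaring64 package mirrors the standard 32-bit roaring API; RoaringLookupResourcesResponse is the assumed generated stub:

import (
    "bytes"

    "github.com/RoaringBitmap/roaring/roaring64"
)

// accessibleSubset filters candidate resource IDs (e.g. the results of an
// application-side SQL query) down to those present in the Tiger Cache's
// bitmap for the subject.
func accessibleSubset(resp *RoaringLookupResourcesResponse, candidates []uint64) ([]uint64, error) {
    // Assumes roaring64 mirrors the 32-bit roaring API; as noted later in this
    // thread, the 64-bit serialization format is not guaranteed to be portable.
    bm := roaring64.New()
    if _, err := bm.ReadFrom(bytes.NewReader(resp.Bitmap)); err != nil {
        return nil, err
    }

    allowed := make([]uint64, 0, len(candidates))
    for _, id := range candidates {
        if bm.Contains(id) {
            allowed = append(allowed, id)
        }
    }
    return allowed, nil
}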
RoaringLookupResources and Consistency

Tiger Cache Implementation

The goal of the Tiger Cache is to quickly compute changes to its stored set of accessible resources and reapply those changes to the overall tracked set.

To do so, the Tiger Cache will keep two database tables (per resource type/subject type pair):

TABLE {pairid}_tiger_map {
  subjectId    INTEGER PRIMARY KEY,
  bitmap       BYTEA,
  at_revision  STRING
}

TABLE {pairid}_tiger_metadata {
  at_overall_revision  STRING,
  last_updated         DATETIME
}
Bootstrapping

If the Tiger Cache is empty for a particular {resource_type, subject_type} pair, or the Lookup Watch API returns that the requested at_overall_revision has expired (i.e. that snapshot is no longer available), the Tiger Cache will perform a bootstrap operation:

1) Empty the map and metadata tables for the specific pair
2) Retrieve the latest ZedToken from SpiceDB
3) Run a ReadRelationships call at_exact_snapshot with that ZedToken, to retrieve all the possible resource IDs
4) For each resource found, run LookupSubjects (at the same consistency as above) on the resources found, to collect the full set of subjects.[^2]
5) For each subject found, run a LookupResources call (at the same consistency as above) to find the set of currently accessible resources, writing the bitmap into the map table.
6) Update the at_overall_revision to the ZedToken used for the bootstrap operation.
7) Start the reconciliation loop (a rough sketch of this flow follows the footnote below).

[^2]: Alternatively, change ReadRelationships to support searching by subject type only
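A rough Go sketch of this bootstrap flow for a single {resource_type, subject_type} pair; the bootstrapAPI and tigerStore interfaces are hypothetical abstractions over the SpiceDB calls and the map/metadata tables, and revisions are shown as opaque strings:

import "context"

// Hypothetical abstractions over the SpiceDB APIs and the Tiger Cache tables.
type bootstrapAPI interface {
    HeadRevision(ctx context.Context) (string, error)
    AllSubjects(ctx context.Context, resourceType, subjectType, atRevision string) ([]string, error)
    LookupResources(ctx context.Context, resourceType, permission, subject, atRevision string) ([]uint64, error)
}

type tigerStore interface {
    Truncate(ctx context.Context, resourceType, subjectType string) error
    WriteBitmap(ctx context.Context, subject string, resourceIDs []uint64, atRevision string) error
    SetOverallRevision(ctx context.Context, resourceType, subjectType, atRevision string) error
}

// bootstrapPair rebuilds the Tiger Cache for one {resourceType, subjectType}
// pair. Step numbers refer to the list above.
func bootstrapPair(ctx context.Context, spicedb bootstrapAPI, store tigerStore,
    resourceType, permission, subjectType string) error {

    // 1) Empty the map and metadata tables for this pair.
    if err := store.Truncate(ctx, resourceType, subjectType); err != nil {
        return err
    }
    // 2) Retrieve the latest ZedToken from SpiceDB.
    token, err := spicedb.HeadRevision(ctx)
    if err != nil {
        return err
    }
    // 3) + 4) Read relationships at that exact snapshot and collect the full
    // set of subjects of the requested type.
    subjects, err := spicedb.AllSubjects(ctx, resourceType, subjectType, token)
    if err != nil {
        return err
    }
    // 5) For each subject, look up the accessible resources and store the bitmap.
    for _, subject := range subjects {
        resourceIDs, err := spicedb.LookupResources(ctx, resourceType, permission, subject, token)
        if err != nil {
            return err
        }
        if err := store.WriteBitmap(ctx, subject, resourceIDs, token); err != nil {
            return err
        }
    }
    // 6) Record the ZedToken used for the bootstrap; 7) the caller then starts
    // the reconciliation loop.
    return store.SetOverallRevision(ctx, resourceType, subjectType, token)
}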

Reconciliation

On startup of the reconciliation loop, the Tiger Cache will call the Lookup Watch API, with the stored at_overall_revision, as well as the {resource type, subject type}.

The Lookup Watch API will then provide a stream of updates, indicating the addition and removal of permissions for specific resources of that type, for specific subjects.

For each resource in which a particular subject gains or loses access, the Tiger Cache will update the Roaring Bitmap, adding or removing the resource ID from the bitmap, changing the at_revision for that subject as well.

at_overall_revision and last_updated will be updated whenever the Tiger Cache has completed modifying the entire entry associated with an update from the Lookup Watch API.
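Applying a single update from the Lookup Watch API to a subject's row might look roughly like the following sketch; subjectBitmapStore is a hypothetical wrapper over the map table, and the roaring64 usage again assumes the package mirrors the 32-bit API:

import (
    "context"

    "github.com/RoaringBitmap/roaring/roaring64"
)

// Hypothetical wrapper over the {pairid}_tiger_map table.
type subjectBitmapStore interface {
    // LoadBitmap returns the subject's bitmap, or an empty bitmap if no row exists.
    LoadBitmap(ctx context.Context, subjectID string) (*roaring64.Bitmap, error)
    // SaveBitmap persists the bitmap and advances the row's at_revision.
    SaveBitmap(ctx context.Context, subjectID string, bm *roaring64.Bitmap, atRevision string) error
}

// applyUpdate adds or removes a resource ID in the subject's bitmap, then
// stores it back with the revision of the change.
func applyUpdate(ctx context.Context, store subjectBitmapStore, subjectID string,
    resourceID uint64, hasPermission bool, revision string) error {

    bm, err := store.LoadBitmap(ctx, subjectID)
    if err != nil {
        return err
    }
    if hasPermission {
        bm.Add(resourceID)
    } else {
        bm.Remove(resourceID)
    }
    return store.SaveBitmap(ctx, subjectID, bm, revision)
}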

Resolving RoaringLookup requests

To resolve lookup requests, the Tiger Cache will perform the following:

1) Check that the cache exists. If it does not, return an error indicating that it needs to be set up/bootstrapped
2) Find the row corresponding to the incoming subject in the correct map table:
   a) If the row does not exist, then return an empty bitmap
   b) If the row exists, define comparison_revision as the greater of at_overall_revision and the at_revision for the particular requested subject:

New Enemy and the Tiger Cache

To prevent New Enemy when making use of the Tiger Cache, two approaches seem viable:

1) Applications always request the data from the Tiger Cache with consistency at_least_as_fresh as the ZedToken for the overall parent namespace, and, if the cache returns an error, either wait for it to catch up or display the error to the user.
2) Request the data from the Tiger Cache with consistency minimize_latency, then compare the returned ZedToken[^3] to those stored for the resources returned. Only if the stored ZedToken for a resource is later than that in the cache, issue a Check to ensure the user still has access to that specific resource; all other resources can be treated as valid. (A sketch of this approach follows below.)

[^3]: This will require adding client library support for comparing the ordering of ZedTokens

Which approach is used is likely based on the application's requirements; we likely will want to document both.
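A sketch of approach 2 from the caller's perspective; compareZedTokens is a hypothetical helper (per the footnote, client libraries would need to grow support for ordering ZedTokens), and the cachedAllowed/check callbacks stand in for the Tiger Cache result and a normal Check call:

import "context"

// filterWithPostCheck trusts the cached answer for any resource whose stored
// ZedToken is not newer than the cache's ZedToken, and issues an explicit
// Check for the rest, preventing New Enemy.
func filterWithPostCheck(
    ctx context.Context,
    cacheToken string,
    storedTokens map[string]string, // resource ID -> ZedToken stored alongside the resource
    cachedAllowed func(resourceID string) bool, // membership in the Tiger Cache bitmap
    check func(ctx context.Context, resourceID string) (bool, error), // live Check
    compareZedTokens func(a, b string) int, // hypothetical: < 0 means a is older than b
) ([]string, error) {

    var allowed []string
    for id, storedAt := range storedTokens {
        if compareZedTokens(storedAt, cacheToken) <= 0 {
            // The cache is at least as fresh as this resource: its answer is safe.
            if cachedAllowed(id) {
                allowed = append(allowed, id)
            }
            continue
        }
        // The resource changed after the cache snapshot: Check to be sure.
        ok, err := check(ctx, id)
        if err != nil {
            return nil, err
        }
        if ok {
            allowed = append(allowed, id)
        }
    }
    return allowed, nil
}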

Open Questions

1) Should the Lookup API and Tiger Cache be integrated?
2) Should the LookupResources API in SpiceDB be wireable to the Tiger Cache, such that if it is available and at the correct revision, SpiceDB can just use it directly?
3) Should updates be batched all together following the map-reduce, or should they be streamed out to the watch API as they are computed by each child node?

Alternatives Considered

jonwhitty commented 3 years ago

The Lookup Watch API Service will make use of the Watch and the Reachability Resolution APIs to provide a stream of updates indicating changes in the accessibility of resources of a particular kind, for a specific kind of subject.

At Adobe we want to build access-aware secondary indexes for all resources for one or more permissions. The most common case is that we want a stream of updates indicating changes in accessibility for 'view' level access to any resource. For example, we have resources such as Projects, Tasks, and Documents, and if someone's 'view' access is revoked either directly or indirectly through revocation of 'edit' or 'manage' access, then we want to get an update indicating the revocation of 'view' access for all of these resources.

Putting the burden on the client to invoke a WatchAccessibleResources request per resource type would be non-ideal for the majority of the use cases that we want this API to support.

Additionally, I may recommend that the Watch API be repurposed to serve the functionality explained herein. When a client integrates with the Watch API, their intention is to get streaming updates of the relationships that they are presumably indexing. To have a Watch API and a LookupWatch API that serve two different but conceptually similar purposes is just confusing. I might instead recommend implementing this functionality as a single RPC within a Watch API and allowing the client to choose if they want a "flattened" or "expanded" update stream or if they just want a "direct" update stream without any of the updates to dependent relationships.

To ensure that the Lookup Watch API Service can operate at scale, it is recommended that Check operations be sharded out amongst a cluster of watch service nodes, with each reporting its status back to the leader.

The Watch API is served from spicedb alongside the Check API. And doesn’t the Check API implementation already shard the request among spicedb nodes? I’m not sure I’m following here.

Did you mean to say that the Lookup Watch API service will shard the Watch requests across spicedb nodes to account for the high degree of concurrent checks that are occurring with the reconciliation process? That would make more sense, and that is support that would need to be added to spicedb that does not already exist.

RoaringLookupResources and Consistency … fully_consistent will be disallowed.

This is a fair compromise due to the complexity of supporting this, but it does undermine the new-enemy problem that spicedb is striving so hard to account for. If we're willing to trade-off higher latency in these cases, could we not layer on the Lookup Watch API to stream real-time lookups at a more recent revision if the cache contents are stale? In Zanzibar they do this with the component of Leopard that does online live updates served by the Watch API - same idea here.

Should the Lookup API and Tiger Cache be integrated?

My opinion, no. I believe that this should be a component that can be enabled. If it is enabled the other components should leverage it to enhance performance where applicable. For example, in the Zanzibar paper the aclservers use Leopard to short-circuit lookups for flattened group relationships if the cached keys are at least as recent as the revision snapshot encoded in the zookie. If this component is enabled in spicedb I would expect spicedb to also leverage the index provided by the Tiger cache to optimize those lookups.

Should the LookupResources API in SpiceDB be wireable to the Tiger Cache, such that if it is available and at the correct revision, SpiceDB can just use it directly?

Yes. I believe this was the inspiration behind the Leopard indexing system in addition to what I mentioned above ^.

Should updates be batched all together following the map-reduce, or should they be streamed out to the watch API as they are computed by each child node?

To me it seems that as soon as the accessibility update computations are available they should be streamed back to clients interested in them. This approach will also reduce the burden of the accumulation step in the map-reduce approach, which could become quite cumbersome for very deeply or widely nested sets. Accumulating those aggregations and then streaming them back might have a large burden on the server for some datasets.

josephschorr commented 3 years ago

Putting the burden on the client to invoke a WatchAccessibleResources request per resource type would be non-ideal for the majority of the use cases that we want this API to support.

The problem is one of scale: The API calls needed to compute the updates will differ based on resource type and permission. We could provide a batch version of WatchAccessibleResources that operates over a provided set of {resource type, permission} pairs, but from a design perspective keeping the actual internals locked to specific resource#permission is very important, lest we run computation for resources or permissions not needed; most callers will likely need it for one permission per resource type, maybe.

Mixing different resource types in the same call also means that every response will need to contain additional context indicating to which resource#permission the update applies; a minor issue, but it does break some of our existing API conventions a bit.

Additionally, I may recommend that the Watch API be repurposed to serve the functionality explained herein. When a client integrates with the Watch API, their intention is to get streaming updates of the relationships that they are presumably indexing. To have a Watch API and a LookupWatch API that serve two different but conceptually similar purposes is just confusing. I might instead recommend implementing this functionality as a single RPC within a Watch API and allowing the client to choose if they want a "flattened" or "expanded" update stream or if they just want a "direct" update stream without any of the updates to dependent relationships.

Except that the amount of computation necessary to serve the LookupWatch API means that it cannot be considered the same API. The LookupWatch is not a filtered view of Watch: it is a derivative of the Watch API, combined with a number of other APIs (reachability, Check, etc). Furthermore, by requiring a significant amount of (distributed) computation, the LookupWatch API will scale at a drastically different rate than normal Watch, which is why this proposal has the LookupWatch running as its own distinct service.

From an API usability perspective, we could provide the LookupWatch in the "standard" API (and just error if it isn't enabled), but I feel strongly it should be its own API call and must still be driven by an "external" service.

The Watch API is served from spicedb alongside the Check API. And doesn’t the Check API implementation already shard the request among spicedb nodes? I’m not sure I’m following here.

Did you mean to say that the Lookup Watch API service will shard the Watch requests across spicedb nodes to account for the high degree of concurrent checks that are occurring with the reconciliation process? That would make more sense, and that is support that would need to be added to spicedb that does not already exist.

The LookupWatch API service will itself be sharded. When a relationship update comes in, the "leader" for the particular LookupWatch call will issue the reachability API requests, then send each pair of {resource,subject} amongst its "child" nodes, to have the Checks run in parallel, before the results are either collected and reported (option 1) or streamed as computed and then the final "all updates are complete" message is sent (option 2)

This is a fair compromise due to the complexity of supporting this, but it does undermine the new-enemy problem that spicedb is striving so hard to account for. If we're willing to trade-off higher latency in these cases, could we not layer on the Lookup Watch API to stream real-time lookups at a more recent revision if the cache contents are stale?

It actually doesn't break New Enemy at all, so long as either a ZedToken is sent (and the cache is compared to said token) OR the caller compares the ZedTokens themselves following a call with minimize_latency.

Consider the following scenarios:

1) The cache has revision 1 and the caller requests revision 1: no problem, fully up to date
2) The cache has revision 2 and the caller requests revision 1: no problem, already ahead
3) The cache has revision 1 and the caller requests revision 2: fail and the caller can decide how to proceed, whether to issue a LookupResources (which is real-time computed) or fall back to "give me what you have and I'll post-Check"
4) The cache has revision 1, HEAD is 2, the caller requests minimize_latency: they receive back the data for revision 1 and the ZedToken containing revision 1. The caller then compares the ZedTokens for the results they are filtering to the ZedToken containing revision 1. If the ZedToken stored for an item is less, then we're up to date for that resource. If the ZedToken stored for an item is greater, then the caller issues their own Check to be sure, and thus prevents New Enemy.

In either case, the caller makes a decision on how to proceed: whether they want fast, but potentially requiring a small post filter, or they want to guarantee consistency to the point of a ZedToken.

If the caller doesn't care, and just wants it to "work", they can invoke the LookupResources API we already have, and it is guaranteed to work for all cases.

In Zanzibar they do this with the component of Leopard that does online live updates served by the Watch API - same idea here.

Yeah, but that's done AFAIK against their internal stored representation to update in place (as the Tiger Cache would as well); it isn't provided to the external caller (in fact, I believe the external caller can't access the cache at all). Here, the external caller is invoking the Tiger Cache's API.

We could match that behavior by having LookupResources use the Tiger Cache when available, and do a realtime lookup otherwise.

Should the Lookup API and Tiger Cache be integrated?

My opinion, no. I believe that this should be a component that can be enabled. If it is enabled the other components should leverage it to enhance performance where applicable. For example, in the Zanzibar paper the aclservers use Leopard to short-circuit lookups for flattened group relationships if the cached keys are at least as recent as the revision snapshot encoded in the zookie. If this component is enabled in spicedb I would expect spicedb to also leverage the index provided by the Tiger cache to optimize those lookups.

👍

Should the LookupResources API in SpiceDB be wireable to the Tiger Cache, such that if it is available and at the correct revision, SpiceDB can just use it directly?

Yes. I believe this was the inspiration behind the Leopard indexing system in addition to what I mentioned above ^.

Agreed. I like it being used if available.

Should updates be batched all together following the map-reduce, or should they be streamed out to the watch API as they are computed by each child node?

To me it seems that as soon as the accessibility update computations are available they should be streamed back to clients interested in them. This approach will also reduce the burden of the accumulation step in the map-reduce approach, which could become quite cumbersome for very deeply or widely nested sets. Accumulating those aggregations and then streaming them back might have a large burden on the server for some datasets.

Yeah, with the caveat that it would require firing a final message in the stream saying "we're fully done for this ZedToken".

jonwhitty commented 3 years ago

The problem is one of scale: The API calls needed to compute the updates will differ based on resource type and permission. We could provide a batch version of WatchAccessibleResources that operates over a provided set of {resource type, permission} pairs, but from a design perspective keeping the actual internals locked to specific resource#permission is very important, lest we run computation for resources or permissions not needed; most callers will likely need it for one permission per resource type, maybe.

Mixing different resource types in the same call also means that every response will need to contain additional context indicating to which resource#permission the update applies; a minor issue, but it does break some of our existing API conventions a bit.

A client such as myself will ultimately be invoking this API multiple times (concurrently) to watch a variety of resource types (e.g. Projects, Tasks, Documents, etc.). That is my use case. Handling the scale of that, how is that any different from providing me an API with semantics that streamline that API experience, where the server gives me back a stream of those updates per {resource type, permission} pair? Essentially the "load" on the system will still be the same, but now the system will be handling more network requests in total because of it. At the end of the day the processes are scaled out across multiple workers in the cluster of spicedb and LookupWatch servers, and that's where the scalability component comes from. I'm not sure I understand why it is critical to only support an API with semantics that only allow a single {resource type, permission} pair to be watched within a single request?

Except that the amount of computation necessary to serve the LookupWatch API means that it cannot be considered the same API. The LookupWatch is not a filtered view of Watch: it is a derivative of the Watch API, combined with a number of other APIs (reachability, Check, etc). Furthermore, by requiring a significant amount of (distributed) computation, the LookupWatch API will scale at a drastically different rate than normal Watch, which is why this proposal has the LookupWatch running as its own distinct service.

From an API usability perspective, we could provide the LookupWatch in the "standard" API (and just error if it isn't enabled), but I feel strongly it should be its own API call and must still be driven by an "external" service.

The Watch API just streams direct relationtuple changes (essentially a change-data stream off of the database table(s)). The LookupWatch API is a more complex process that involves a lot more moving parts and scalability properties by distributing requests across more LookupWatch servers. So I see where you're coming from.

But what would be nice is for there to exist a Watch server (call it spicewatcher or something) that hosts a Watch RPC, and if the client wants to watch an "expanded" set of relationship changes then the implementation will execute the distributed process you've described in this proposal. If the client just wishes to get a stream of the direct relationship changes then the RPC handler implementation will simply delegate to the functionality that already exists in the Watch RPC implementation.

Something like this (pseudocode)

func (c *Controller) Watch(req *pb.WatchRequest, stream pb.WatchService_WatchServer) error {
    if req.GetType() == pb.Expanded {
        ... // fire off the distributed watch process that implements the proposal described herein
    } else {
        ... // delegate to code like it is today in the Watch handler
    }
}

Essentially I'm proposing that Watch moves out of the core spicedb API and into a separate service (spicewatcher) that will serve either the Watch RPC and LookupWatch RPC, OR a singular Watch RPC that can serve both direct and expanded forms like I described above. I feel like this API and service topology would make more sense from a client integration perspective.

The LookupWatch API service will itself be sharded. When a relationship update comes in, the "leader" for the particular LookupWatch call will issue the reachability API requests, then send each pair of {resource,subject} amongst its "child" nodes, to have the Checks run in parallel, before the results are either collected and reported (option 1) or streamed as computed and then the final "all updates are complete" message is sent (option 2)

Got it, that makes sense. I like that approach a lot. It distributes the reachability and reconciliation process across multiple workers to handle the scale and continues to leverage the caching properties of the overall architecture. 👍 from me

Yeah, but that's done AFAIK against their internal stored representation to update in place (as the Tiger Cache would as well); it isn't provided to the external caller (in fact, I believe the external caller can't access the cache at all). Here, the external caller is invoking the Tiger Cache's API.

We could match that behavior by having LookupResources use the Tiger Cache when available, and do a realtime lookup otherwise.

I got the same impression from the paper. They specifically say that they only use Leopard to short-circuit group membership lookups for namespaces that exhibit large levels of indirection, which kind of implies that it's an internal use-case used by aclservers and not a public API for clients within Google to use.

I do like the idea of using the Tiger cache for LookupResources if the cache has it and otherwise falling back to the Lookup API. I think that should be a planned feature to minimize tail latency at scale.

It actually doesn't break New Enemy at all, so long as either a ZedToken is sent (and the cache is compared to said token) OR the caller compares the ZedTokens themselves following a call with minimize_latency.

Consider the following scenarios:

  • The cache has revision 1 and the caller requests revision 1: no problem, fully up to date
  • The cache has revision 2 and the caller requests revision 1: no problem, already ahead
  • The cache has revision 1 and the caller requests revision 2: fail and the caller can decide how to proceed, whether to issue a LookupResources (which is real-time computed) or fall back to "give me what you have and I'll post-Check"
  • The cache has revision 1, HEAD is 2, the caller requests minimize_latency: they receive back the data for revision 1 and the ZedToken containing revision 1. The caller then compares the ZedTokens for the results they are filtering to the ZedToken containing revision 1. If the ZedToken stored for an item is less, then we're up to date for that resource. If the ZedToken stored for an item is greater, then the caller issues their own Check to be sure, and thus prevents New Enemy.

In either case, the caller makes a decision on how to proceed: whether they want fast, but potentially requiring a small post filter, or they want to guarantee consistency to the point of a ZedToken.

If the caller doesn't care, and just wants it to "work", they can invoke the LookupResources API we already have, and it is guaranteed to work for all cases.

👍 ah I see, that's a great description and that makes sense. I like it!

josephschorr commented 3 years ago

A client such as myself will ultimately be invoking this API multiple times (concurrently) to watch a variety of resource types (e.g. Projects, Tasks, Documents, etc.). That is my use case. Handling the scale of that, how is that any different from providing me an API with semantics that streamline that API experience, where the server gives me back a stream of those updates per {resource type, permission} pair? Essentially the "load" on the system will still be the same, but now the system will be handling more network requests in total because of it. At the end of the day the processes are scaled out across multiple workers in the cluster of spicedb and LookupWatch servers, and that's where the scalability component comes from. I'm not sure I understand why it is critical to only support an API with semantics that only allow a single {resource type, permission} pair to be watched within a single request?

You'd still have to be explicit about for which types of resources, and the permissions for them, you require updates. This way, we're only computing the minimal set of information. As I said above, no problem making it into a single call, but it should (and would) still be handled on a per resource_type#permission basis.

The Watch API just streams direct relationtuple changes (essentially a change-data stream off of the database table(s)). The LookupWatch API is a more complex process that involves a lot more moving parts and scalability properties by distributing requests across more LookupWatch servers. So I see where you're coming from.

But what would be nice is for there to exist a Watch server (call it spicewatcher or something) that hosts a Watch RPC, and if the client wants to watch an "expanded" set of relationship changes then the implementation will execute the distributed process you've described in this proposal. If the client just wishes to get a stream of the direct relationship changes then the RPC handler implementation will simply delegate to the functionality that already exists in the Watch RPC implementation.

We could design a library and implement the Lookup Watch API based on that library. It might be useful to others who need to do distributed computation based on Watch, and maintain the same semantics.

Essentially I'm proposing that Watch moves out of the core spicedb API and into a separate service (spicewatcher) that will serve either the Watch RPC and LookupWatch RPC, OR a singular Watch RPC that can serve both direct and expanded forms like I described above. I feel like this API and service topology would make more sense from a client integration perspective.

I think Watch forms the foundation of any such derived service; I still think SpiceDB itself will likely need to have the core Watch API, as it is the only service with direct access to the datastore, which is required to implement the Watch API.

jonwhitty commented 3 years ago

You'd still have to be explicit about for which types of resources, and the permissions for them, you require updates. This way, we're only computing the minimal set of information. As I said above, no problem making it into a single call, but it should (and would) still be handled on a per resource_type#permission basis.

Gotcha. Yeah, the client can still provide a list of {resource type, permission} pairs, but the server will handle that list and give back a single stream. I like that approach 👍

I think Watch forms the foundation of any such derived service; I still think SpiceDB itself will likely need to have the core Watch API, as it is the only service with direct access to the datastore, which is required to implement the Watch API.

I don't see why an external Watch service couldn't also create a read-only connection to the changelog of the persistent store and then serve the Watch API outside of spicedb. That's what I was thinking when I proposed that alternative. And, when combined with the library approach you mentioned, that'd be a great approach!

I like that idea a lot. The Watch functionality, just like the Tiger cache, might not even be needed by clients. This is part of my reasoning behind the suggestion.

josephschorr commented 3 years ago

I don't see why an external Watch service couldn't also create a read-only connection to the changelog of the persistent store and then serve the Watch API outside of spicedb. That's what I was thinking when I proposed that alternative. And, when combined with the library approach you mentioned, that'd be a great approach!

Isolation and separation of concerns, mostly. It's better for only SpiceDB to have the config and secrets necessary to talk to the datastore, and for the other systems to be built on top.

jon-whit commented 3 years ago

@josephschorr I'm realizing we'll also need an ObjectReference in the PermissionUpdate message. This way the client can index the permission change for the specific subject for the given resource. The objectType of the resource will obviously match the object type from the RPC request, but the objectId will provide the client the ability to index the new permission based on the change.

message PermissionUpdate {
    SubjectReference subject = 1;
    ObjectReference resource = 2;
    Permissionship updated_permission = 3;
}

What do you think?

josephschorr commented 3 years ago

@jon-whit Yeah, that's fine. You'll also need the relation then too. I originally left it out because the caller was specifying the object information, but now that the API will return it for all object types requested, we need the info here.

gonzalad commented 2 years ago

ALL resource IDs in the system must be 64-bit integers

Sorry, this means we won't be able to use UUIDs (128 bits) for those resources?

josephschorr commented 2 years ago

@gonzalad If you intend to use the tiger cache with roaring bitmaps, correct.

gonzalad commented 2 years ago

Thanks for the fast answer @josephschorr ! (I'm a bit sad though... lol)

josephschorr commented 2 years ago

@gonzalad If you can find a stable and supported 128-bit roaring bitmap implementation in Go and other languages, we can remove that restriction

chancesm commented 1 year ago

Our ids in our system are currently uuids. An implementation that would allow us to keep that (alternative to a roaring bitmap) would be preferred. 😄

mohsen3 commented 1 year ago

According to the roaring bitmap docs, even the 64-bit version is not guaranteed to be compatible across different languages:

Only the 32-bit roaring format is standard and cross-operable between Java, C++, C and Go. There is no guarantee that the 64-bit versions are compatible.

I wonder whether revealing such a thing through the API is going to be very helpful.

josephschorr commented 1 year ago

@mohsen3 Given the restrictions of roaring bitmaps, we're unsure if they will be part of the API

henkosch commented 9 months ago

Any news on this proposal?

henkosch commented 9 months ago

@josephschorr

For our use case (https://github.com/authzed/spicedb/issues/280) it would be enough to have the LookupWatch API implemented so we could stream the permission changes to our consumer services and they could use whatever format they want to cache them.

I think using roaring bitmaps is an implementation detail and should not be part of the offered solution, or should be optional. As many others mentioned before in this thread, there are various ID formats being used, like strings and UUIDs, that would not be compatible with roaring bitmaps, so that solution would be too constraining for many.

Being able to get an event stream of the affected permissions due to a relation change would be essential, though. We would probably use a messaging system like pulsar to stream the changes to different topics, and all consuming services could subscribe to those streams. Locally, the services could build their own permission cache in the database so they could use it along with their local queries for filtering based on visibility permissions.

We would only use this for listing, filtering and searching resources, so eventual consistency is fine. When opening the document we would ask spicedb directly which would still protect us from the new enemy problem.

Is there already an API or tooling that we could use to generate the affected permissions from the list of changed relations coming from the Watch API?

jzelinskie commented 6 months ago

I wanted to give an update on this proposal.

We've spent a lot of time investigating this approach and found that it doesn't reach the scale needed to truly solve this problem long term. As a result, we took a different approach entirely when building our Materialize product, which is currently proprietary and sold as a managed service.

In full transparency, building Materialize is/was a non-trivial amount of R&D and something we'd like to use as an incentive to support our business. We've yet to see any examples of other solutions (proposed or implemented) with an approach that reaches the order of magnitude of scale we're currently supporting. Materialize is still in active development; as we apply it to new use cases, we're able to learn more about operating it and make dramatic simplifications to its architecture.

That being said, our company philosophy is that these types of critical systems need to be open source. We've developed the project with the ultimate goal of open sourcing when the time is right.

I think the question now is how would the community like to see Materialize's scalable APIs integrated into the SpiceDB ecosystem?

There will be users of SpiceDB that will not want to operate Materialize because of the additional complexity. In that case should SpiceDB serve these APIs, but in a significantly slower form? That has the benefits of avoiding rewriting applications, but also confusion when these APIs do not perform as expected.

n3ziniuka5 commented 5 months ago

There will be users of SpiceDB that will not want to operate Materialize because of the additional complexity. In that case should SpiceDB serve these APIs, but in a significantly slower form? That has the benefits of avoiding rewriting applications, but also confusion when these APIs do not perform as expected.

We've just recently deployed SpiceDB, but so far everything is going smoothly. If deploying/operating Materialize is going to be as simple as with current SpiceDB Operator, then I would happily take the small additional complexity and have the peace of mind that the solution will scale.

Figedi commented 5 months ago

In full transparency, building Materialize is/was a non-trivial amount of R&D and something we'd like to use as an incentive to support our business. We've yet to see any examples of other solutions (proposed or implemented) with an approach that reaches the order of magnitude of scale we're currently supporting. Materialize is still in active development; as we apply it to new use cases, we're able to learn more about operating it and make dramatic simplifications to its architecture.

Just wondering, as even the names might be overlapping: OpenFGA has a similar solution in active R&D that uses materialize.com to solve this problem. You can see it in action here: https://www.youtube.com/watch?v=WE7-WeiRvwI&t=2439s

There will be users of SpiceDB that will not want to operate Materialize because of the additional complexity. In that case should SpiceDB serve these APIs, but in a significantly slower form? That has the benefits of avoiding rewriting applications, but also confusion when these APIs do not perform as expected.

Right now, not having Materialize is a blocker for us using spicedb. Currently, when using the spicedb-operator, you already have to manage quite a lot (spicedb cluster + db cluster), so if the materialize architecture is not vastly more complex, I suspect the overhead is not too different from what is already there (this of course depends on your actual implementation, which I can't speak to). In the case of the openfga version, you'd have to buy a managed materialize.com cluster (as they don't offer self-hosting) and have some CDC service running (i.e. debezium), so the overhead is relatively small and worth it in my opinion.

alechenninger commented 5 months ago

In the case of the openfga version, you'd have to buy a managed materialize.com cluster (as they don't offer self-hosting) and have some CDC service running (i.e. debezium), so the overhead is relatively small and worth it in my opinion.

I think the comparison to materialize.com/OpenFGA is not quite apples-to-apples though. If it's okay to require buying a managed service, then for that matter you could buy AuthZed and have a similarly easier time, as I understand it at least.

I think the question now is how would the community like to see Materialize's scalable APIs integrated into the SpiceDB ecosystem?

There will be users of SpiceDB that will not want to operate Materialize because of the additional complexity. In that case should SpiceDB serve these APIs, but in a significantly slower form? That has the benefits of avoiding rewriting applications, but also confusion when these APIs do not perform as expected.

Overall, big +1 to open source infrastructure. That said, for our part, whether we would use an open version of materialize would depend on the details here in terms of how much additional complexity and how much additional performance gain.

Is the performance / complexity specifically about optimized lookup resources/subjects endpoints (what I think of as the tiger cache)? Or also the ability to denormalize permissions to other data stores (e.g. postgres for local joins)? I think of these as separate but I'm not sure if that's right.

For what it's worth, we are trying to reduce complexity, generally, and have currently opted for less granular permissions in the event that the scale is high enough that conventional pre/post filtering patterns don't work. For example, if you imagine a Google Drive/Docs model, for those very high cardinality resources we might only allow access controls at the Folder level instead of the individual resource level. We intend to revisit this in the future though. I imagine if the complexity of running an open version of materialize were high enough, we might continue to do the same, depending.

jzelinskie commented 5 months ago

Clarifying here before we give the product docs for Materialize an update:

Materialize is a new service designed from the ground up to scalably compute real-time changes to fully denormalized permissions modeled in SpiceDB. Materialize has no dependencies on other products or services (except for SpiceDB, of course). We took this approach because it's the most faithful to Google's original Leopard design and we determined it was the only way to provide the best experience.

Existing SQL technologies for streaming materialized views are great for demoing the fundamental idea, but have major downsides that led us to not pursue their use:

Our implementation for Materialize uses the same foundation to do two things:

The new permission sets API is used to efficiently maintain secondary indexes in other systems (e.g. Postgres for local joins, native filtering for elastic). This API is what we'd like to discuss how to best integrate with the community.

There will always be some portion of folks uninterested in running anything other than SpiceDB. There's a difficult decision to be made: should SpiceDB implement these APIs even if they can never be particularly efficient? Doing so would offer a guaranteed set of APIs, regardless of setup, with the downside of additional complexity in SpiceDB for a subpar experience.

We went out on a limb early on when we invented the LookupResources/LookupSubjects APIs, but we feel confident that over time we can keep improving their efficiency to the point of being useful for many use cases. However, I'm not personally sure that the permission sets APIs would provide value at the level of performance we'd be able to implement in SpiceDB.

A first good step we could take would be to share these APIs in the APIs repository, but under a different protobuf package.

F21 commented 4 months ago

For the curious, the API is here: https://github.com/authzed/api/tree/main/authzed/api/materialize/v0