Open henkosch opened 2 years ago
Hi @henkosch, your two listed options are the recommendations today: pre and post filtering.
As you mention, however, they can become quite burdensome at much larger scales or where filtering is tied to external metadata. To solve that problem, we're currently working on this proposal: https://github.com/authzed/spicedb/issues/207.
In summary, it will provide an API called LookupWatch
which can be used to actively monitor the changes in resources to which a subject has access, and an API called RoaringLookupResources
, which will return a cached roaring bitmap representing that access. Once available, the expectation is that a caller can then retrieve the roaring bitmap from the cache, and then pass it as an additional filter to something like ElasticSearch, thereby providing the ability to do an ACL-aware filtered search.
Hi! I've stumbled upon this discussion which gauged my interest pretty quickly - ACL-aware filtered search is the biggest pain point I've seen in most permissions systems so far.
I've particularly liked the idea of the roaring bitmap - Reading up on #207 and roaring bitmaps (which made some fun weekend reads), I've got some questions to know on how you think this might work / not work.
I've also spent some time reading up on roaring bitmaps (again a fun weekend read!) and came up with the following potential obstacles - I'm curious if you've put these into consideration for the roaring bitmap support?
Thanks ahead! (And thanks as well for the fabulous work already put into SpiceDb, I've only discovered it a few days ago and it's already been a great joy working with it!)
- Do you know of any search engines/databases with native roaring bitmap querying support? I've started to research roaring bitmap support for ElasticSearch - While it's easy to find that Lucene is using roaring bitmaps internally, it's been less obvious to find roaring bitmap support for query parameters. It mostly boils down to a custom plugin for ElasticSearch, implementing a skip-list like behaviour - like this one https://luis-sena.medium.com/improve-elasticsearch-filtering-performance-10x-using-this-plugin-8c6485516c1a
Native has been hard to find, unfortunately. There are, however, numerous plugins available, such as this one for Postgres: https://pgxn.org/dist/pg_roaringbitmap/.
I've also spent some time reading up on roaring bitmaps (again a fun weekend read!) and came up with the following potential obstacles - I'm curious if you've put these into consideration for the roaring bitmap support?
Proposal: Lookup Watch API and Tiger Cache for fast ACL-aware filtering #207 suggests 64-bit integers for the resource IDs.
- Most implementations of roaring bitmaps are on 32-bit integers
- Only some implementations have 64-bit integer support
- There's no standardized serialization format for 64-bit roaring bitmaps Define serialisation format for 64 bit bitmaps RoaringBitmap/RoaringBitmap#351
- Only the Java implementation has support on the faster implementation with ART data structure; A novel 64bit bitmap implementation RoaringBitmap/RoaringBitmap#403
Yeah, its been a consideration as well. The idea of using the roaring bitmap at all is to provide the most efficient means of sending the full list of accessible resource IDs to the client; we've also had discussions of providing other kinds of similar caches (all built on top of the same lookup watch API), with different data types if necessary. For example, a bloom filter might be another idea, although it has its own concerns.
I decided to start with roaring bitmap on the theory that, at worst, its fairly easy to convert into other forms if necessary
@jzelinskie Regarding your answer in the LookupWatch API draf implementation PR, here is our use case described at the beginning of the thread. We are still looking for a way to filter our entities based on permissions and other search criteria at the same time. It would require us to stream permission changes to our consumer services so they can do the filtering in their own databases.
https://authzed.com/blog/materialize-early-access is now available for Early Access to solve this problem for very large scales
https://authzed.com/blog/materialize-early-access is now available for Early Access to solve this problem for very large scales
Will this be available on prem? When will more details be made available? Not having this kind of solution available is a blocker for us even considering SpiceDB. Does this solution require us to use PostgreSQL as the datastore for the read models of our domain?
@Krustie101 early access is only in the dedicated offering, we will explore on-prem as part of the SpiceDB Enterprise on-prem offering after that. For now, it supports CockroachDB, and PostgreSQL is the next in line. What datastore are you using?
@vroldanbet Thanks for the reply. The datastore for our business domain read and write models is currently SQLServer but PostgreSQL will be our preferred datastore for our new bounded contexts. I assumed that the choice of data store for our business domain entities and read projections (we use CQRS) did not matter, but that we need to choose CockroachDB or PostgreSQL for the SpiceDB datastore.
Does this change if we want to use Authzed Materialize? If we choose PostgreSQL for our SpiceDB datastore, do we also need to choose PostgreSQL to store our business domain read projections. This would not be a blocker, but I am curious to know if and why this is the case. Has it something to do with "roaring bitmap" support in PostgreSQL?
I assumed that the choice of data store for our business domain entities and read projections (we use CQRS) did not matter, but that we need to choose CockroachDB or PostgreSQL for the SpiceDB datastore.
That is correct, specially with CQRS, were you will treat SpiceDB as another projection of your business data.
Does this change if we want to use Authzed Materialize? If we choose PostgreSQL for our SpiceDB datastore, do we also need to choose PostgreSQL to store our business domain read projections. This would not be a blocker, but I am curious to know if and why this is the case. Has it something to do with "roaring bitmap" support in PostgreSQL?
No. Materialize provides streaming APIs you listen to and then write back to any database you choose. We are exploring deeper integrations with various data storage engines in the future, but for now the customer is responsible to ingest the output from Materialize.
We have started implementing a permission system using spicedb, but we are tackling with the following theoretical question. Suppose we have documents and users. The documents are organized in folders and some users have permission to view documents in some folders. The use case in question is to search for documents using a full text search or some other criteria, but taking into account that you would only need to search the ones that the user has view permissions on.
Something like:
I can imagine 2 strategies:
Both of these seem suboptimal and could result in a very expensive operation of potentially moving the whole database between systems. Are there any guidelines, docs or cases studies that we could read? Thank you!