spena closed this issue 4 years ago.
Awesome!
Can we also remove the config we introduced to guard this, ksql.query.pull.skip.access.validator, as well?
What happens if this cache reaches max.entries? Will we start checking authorization on every pull query?
Just updated the description @vinothchandar to answer your questions.
Thanks! Hopefully we can have a large default, say 10000, so users don't end up in a state where every pull query triggers an authorization check (assuming each cache entry corresponds to an underlying Kafka topic).
Feature is completed.
Is your feature request related to a problem? Please describe.
The new ksqlDB pull queries feature does not work if Kafka ACLs or some other external KSQL authorization is enabled. This is a functional limitation for ksqlDB users who secure their Kafka environments.
For more context: before executing a query, or any other statement that requires access to Kafka, ksqlDB performs some permission checks in Kafka. That way ksqlDB knows in advance whether the statement will be denied and, if so, can return a clear access-denied error message.
Pull queries do not access Kafka, because the data is stored in RocksDB, but they are based on source topics that are stored in Kafka. So ksqlDB still needs to verify that the query is authorized to access the data. However, running this authorization check every time a pull query executes hurts pull query performance, which is why pull queries were disabled when authorization is enabled.
Describe the solution you'd like
Keeping Kafka authorization responses in an in-memory cache will alleviate the performance problem. On every pull query request, ksqlDB will check this cache to determine whether the pull query is authorized to access the data. After a configured amount of time, ksqlDB will check against the real external authorization service again to refresh the cache, so that new authorization rules take effect.
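A minimal sketch of this lookup flow, under stated assumptions: ExpiringAuthorizationCache is a hypothetical class, a Predicate<String> stands in for the external authorization service, keys use a made-up "user|topic|permission" format, and the clock is injected so that expiry can be shown deterministically. This is not ksqlDB's actual implementation.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;
import java.util.function.Predicate;

// Hypothetical sketch: a time-bounded cache of authorization responses that sits
// in front of a slower external authorization check.
class ExpiringAuthorizationCache {
  // A cached response (allowed or denied) and when it was fetched, in millis.
  private record Entry(boolean allowed, long fetchedAtMillis) {}

  private final Map<String, Entry> entries = new HashMap<>();
  private final Predicate<String> externalAuthorizer; // stand-in for the real check
  private final long expireMillis;
  private final LongSupplier clockMillis; // injectable clock, e.g. System::currentTimeMillis
  private int externalCalls = 0;

  ExpiringAuthorizationCache(
      Predicate<String> externalAuthorizer, Duration expireTime, LongSupplier clockMillis) {
    this.externalAuthorizer = externalAuthorizer;
    this.expireMillis = expireTime.toMillis();
    this.clockMillis = clockMillis;
  }

  // Answers from the cache while the entry is fresh; otherwise asks the external
  // authorizer and stores its answer, whether allowed or denied.
  synchronized boolean isAllowed(String key) {
    long now = clockMillis.getAsLong();
    Entry cached = entries.get(key);
    if (cached != null && now - cached.fetchedAtMillis() < expireMillis) {
      return cached.allowed();
    }
    boolean allowed = externalAuthorizer.test(key);
    externalCalls++;
    entries.put(key, new Entry(allowed, now));
    return allowed;
  }

  synchronized int externalCalls() {
    return externalCalls;
  }

  public static void main(String[] args) {
    long[] now = {0L}; // fake clock so the example is deterministic
    ExpiringAuthorizationCache cache = new ExpiringAuthorizationCache(
        key -> key.startsWith("alice|"), Duration.ofSeconds(30), () -> now[0]);

    cache.isAllowed("alice|pageviews|READ"); // miss: calls the external authorizer
    cache.isAllowed("alice|pageviews|READ"); // fresh hit: served from memory
    now[0] += 31_000;                        // 31 seconds later
    cache.isAllowed("alice|pageviews|READ"); // expired: refreshed from the authorizer
    System.out.println(cache.externalCalls()); // prints 2
  }
}
```

Denied responses are cached the same way as allowed ones, so a user without access is also answered from memory until the entry expires.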
The cache key is a string composed of the username, the topic name, and the requested permission. The cache value is a boolean holding the authorization response: access allowed (true) or denied (false).
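The key/value layout could look roughly like the following. The "|" separator, the cacheKey helper, and the permission strings are all hypothetical; the description only says that the key combines the username, topic name, and requested permission.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the cache key/value layout: String key -> Boolean response.
class AuthorizationCacheKeys {
  // Hypothetical separator; the description does not say how the parts are joined.
  private static final String SEPARATOR = "|";

  // Combines username, topic name, and requested permission into one cache key.
  static String cacheKey(String username, String topicName, String permission) {
    return String.join(SEPARATOR, username, topicName, permission);
  }

  public static void main(String[] args) {
    Map<String, Boolean> cache = new HashMap<>();
    cache.put(cacheKey("alice", "pageviews", "READ"), true);  // access allowed
    cache.put(cacheKey("bob", "pageviews", "READ"), false);   // access denied

    System.out.println(cacheKey("alice", "pageviews", "READ"));        // prints alice|pageviews|READ
    System.out.println(cache.get(cacheKey("bob", "pageviews", "READ"))); // prints false
  }
}
```

One caveat with a joined string: if any part (for example, a username) can contain the separator, two different requests could map to the same key. A record or other composite key type avoids that ambiguity.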
This cache will be configurable with two properties:
ksql.authorization.cache.expire.time
The amount of time to keep an authorization response in memory. After this time passes, ksqlDB will refresh the cache (default 30 seconds).

ksql.authorization.cache.max.entries
The maximum number of entries to keep in the cache. This bounds the cache size so it does not consume all of the system's memory (default 1000 entries). If the cache reaches max.entries, ksqlDB will call the real authorization service for requests whose responses are not cached, which will impact performance for those queries. Users can tune this property if they experience performance problems.

Also, this improvement should remove the current property ksql.query.pull.skip.access.validator, which skips the authorization checks when pull queries are used.
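To illustrate what reaching max.entries could mean in practice, here is a sketch that assumes least-recently-used eviction (the description does not specify an eviction policy). BoundedAuthorizationCache is hypothetical and uses java.util.LinkedHashMap's access-order mode; expiry is left out so that only the size behavior is visible.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch: an authorization cache capped at maxEntries, evicting
// the least recently used response once the cap is exceeded.
class BoundedAuthorizationCache {
  private final Map<String, Boolean> entries;
  private final Predicate<String> externalAuthorizer; // stand-in for the real check
  private int externalCalls = 0;

  BoundedAuthorizationCache(int maxEntries, Predicate<String> externalAuthorizer) {
    this.externalAuthorizer = externalAuthorizer;
    // accessOrder = true keeps iteration order from least to most recently accessed,
    // so removeEldestEntry drops the LRU entry after an insert pushes size past the cap.
    this.entries = new LinkedHashMap<>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
        return size() > maxEntries;
      }
    };
  }

  // Cached responses are served from memory; misses (including evicted keys)
  // go to the external authorizer.
  synchronized boolean isAllowed(String key) {
    Boolean cached = entries.get(key);
    if (cached != null) {
      return cached;
    }
    boolean allowed = externalAuthorizer.test(key);
    externalCalls++;
    entries.put(key, allowed);
    return allowed;
  }

  synchronized int externalCalls() {
    return externalCalls;
  }

  synchronized int cachedEntries() {
    return entries.size();
  }

  public static void main(String[] args) {
    BoundedAuthorizationCache cache = new BoundedAuthorizationCache(2, key -> true);
    cache.isAllowed("alice|orders|READ");   // miss; cached: orders
    cache.isAllowed("alice|payments|READ"); // miss; cached: orders, payments
    cache.isAllowed("alice|users|READ");    // miss; cap exceeded, orders evicted
    cache.isAllowed("alice|orders|READ");   // miss again; payments evicted
    System.out.println(cache.externalCalls() + " " + cache.cachedEntries()); // prints "4 2"
  }
}
```

Under this assumption, reaching max.entries does not send every pull query to the authorization service; only requests whose entries were evicted (or never cached) pay for a real check. A real implementation could instead rely on an existing cache library, such as Guava's CacheBuilder with maximumSize and expireAfterWrite, which combines both limits.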
Describe alternatives you've considered
No alternatives have been considered yet.