confluentinc / ksql

The database purpose-built for stream processing applications.
https://ksqldb.io

Pull Queries : Implement a cache for Kafka authorization checks #4158

Closed spena closed 4 years ago

spena commented 4 years ago

Is your feature request related to a problem? Please describe. The new ksqlDB pull queries feature does not work if Kafka ACLs or some form of external KSQL authorization is enabled. This is a functional limitation in ksqlDB for users who secure their Kafka environments.

For more context: before executing a query, or any other statement that requires access to Kafka, ksqlDB performs permission checks against Kafka. This lets ksqlDB know in advance whether the statement will be denied and, if so, return a clear access-denied error message.

Pull queries do not access Kafka because the data is stored in RocksDB, but they are based on source topics that are stored in Kafka. So ksqlDB needs to verify that the user is authorized to access that data. However, running this authorization check every time a pull query is executed hurts pull query performance, hence the decision to disable pull queries when authorization is enabled.

Describe the solution you'd like Implementing a cache that keeps Kafka authorization responses in memory for some amount of time will alleviate the performance problem. On every pull query request, ksqlDB will check this cache to see whether the pull query is authorized to access the data. After some time, ksqlDB will check against the real external authorization again to refresh the cache and pick up new authorization rules.

The cache key is a string composed of the username, topic name, and requested permission. The cache value is a boolean holding the authorization response, either access allowed (true) or denied (false). For example:

Does User:ksql have Read permissions on the pageviews topic?
key = ksql-pageviews-read
value = true
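
The lookup-then-refresh behavior described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not ksqlDB's actual implementation: the class name `AuthorizationCache`, the `maxEntries` and `ttlMs` constructor parameters, and the `externalCheck` callback are all hypothetical names chosen for this example, and the least-recently-used eviction policy is an assumption, since the issue does not say what happens when the cache is full.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

// Hypothetical sketch of an authorization cache; all names are assumptions.
final class AuthorizationCache {

  // A cached allow/deny answer plus the time at which it stops being valid.
  private static final class CachedDecision {
    final boolean allowed;
    final long expiresAtMs;

    CachedDecision(final boolean allowed, final long expiresAtMs) {
      this.allowed = allowed;
      this.expiresAtMs = expiresAtMs;
    }
  }

  private final long ttlMs;
  private final LongSupplier clock;
  private final Map<String, CachedDecision> entries;

  AuthorizationCache(final int maxEntries, final long ttlMs, final LongSupplier clock) {
    this.ttlMs = ttlMs;
    this.clock = clock;
    // Access-ordered map: once maxEntries is exceeded, the least recently
    // used entry is dropped (eviction policy is an assumption of this sketch).
    this.entries = new LinkedHashMap<String, CachedDecision>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(final Map.Entry<String, CachedDecision> eldest) {
        return size() > maxEntries;
      }
    };
  }

  // Builds a key such as "ksql-pageviews-read" from user, topic, and permission.
  static String key(final String user, final String topic, final String permission) {
    return user + "-" + topic + "-" + permission.toLowerCase(Locale.ROOT);
  }

  // Returns the cached answer if it is still fresh; otherwise runs the real
  // (expensive) check, stores its result, and returns it.
  synchronized boolean isAllowed(
      final String user,
      final String topic,
      final String permission,
      final BooleanSupplier externalCheck) {
    final String key = key(user, topic, permission);
    final long now = clock.getAsLong();
    final CachedDecision cached = entries.get(key);
    if (cached != null && now < cached.expiresAtMs) {
      return cached.allowed;
    }
    final boolean allowed = externalCheck.getAsBoolean();
    entries.put(key, new CachedDecision(allowed, now + ttlMs));
    return allowed;
  }
}
```

A caller would construct it with something like `new AuthorizationCache(10000, 30_000L, System::currentTimeMillis)` and pass the real Kafka authorization check as `externalCheck`. Both allowed and denied answers are cached, so a newly granted or revoked permission only takes effect once the entry expires.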

This cache will be configurable by 2 properties:

Also, this improvement should remove the current property ksql.query.pull.skip.access.validator which skips the authorization checks when pull queries are used.

PRs

Describe alternatives you've considered There are no alternatives considered yet.


vinothchandar commented 4 years ago

Awesome!

Can we also remove the config we introduced to guard this, ksql.query.pull.skip.access.validator?

What happens if this cache reaches max.entries? Will we start checking on every pull query?

spena commented 4 years ago

Just updated the description @vinothchandar to answer your questions.

vinothchandar commented 4 years ago

Thanks! Hopefully we can have a large default, say 10000, that will prevent the user from getting into the state of checking auth on each pull query (assuming each entry in the cache is related to an underlying Kafka topic).

spena commented 4 years ago

Feature is completed.