StarRocks / starrocks

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.
https://starrocks.io
Apache License 2.0
8.94k stars 1.79k forks

Please Offer SQL DDL to Clean Up the Cache On-Demand #51030

Open xiaobingxia-at opened 1 month ago

xiaobingxia-at commented 1 month ago

Feature request

Is your feature request related to a problem? Please describe.

It's great to see that the latest version of StarRocks can run a SQL statement to pre-warm the cache. For security reasons, however, we don't want to leave any footprint of our data anywhere unless it is encrypted. Data in the cache is not encrypted, so we want to be able to clear the cache from memory on demand.

Describe the solution you'd like

Please offer a SQL statement such as "CLEAR CACHE FOR {TABLE XXX}" that users can execute to clear the cache.

Describe alternatives you've considered

Additional context

wangsimo0 commented 1 month ago

Hi xiaobingxia, thanks for your advice! I just want to fully understand your requirement: how is your data encrypted? Are you using Hive or Iceberg tables with encrypted ORC files, or something else? I'm just curious about how the whole process works.

xiaobingxia-at commented 1 month ago

We want all data persisted on disk / S3 to be encrypted, but we also want to leverage caching. So the process is:

  1. We pre-warm the cache.
  2. We run queries against the cache (unencrypted) and the tables on S3 (encrypted).
  3. We clear the cache. It is fine for us to use cached data that is not encrypted; we just need to clear it in time.
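
The three steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: `execute` is a hypothetical callable that sends one SQL string to StarRocks (for example, a cursor's `execute` method from a MySQL-protocol client), `CACHE SELECT` is assumed to be the warm-up statement mentioned in this issue (check the documentation for your version for the exact syntax), and `CLEAR CACHE FOR TABLE` is the statement being requested here, which does not exist yet.

```python
from typing import Callable, Iterable, List


def run_with_ephemeral_cache(
    execute: Callable[[str], object],
    table: str,
    queries: Iterable[str],
) -> List[object]:
    """Warm the cache, run the queries, and always attempt to clear the cache."""
    try:
        # Step 1: pre-warm the cache (assumed warm-up syntax, see lead-in).
        execute(f"CACHE SELECT * FROM {table}")
        # Step 2: run the queries while the unencrypted cache is populated.
        return [execute(query) for query in queries]
    finally:
        # Step 3: HYPOTHETICAL statement proposed in this issue; it is not an
        # existing StarRocks command. The finally block issues it even when
        # the warm-up or a query raises, so cached data is not left behind.
        execute(f"CLEAR CACHE FOR TABLE {table}")
```

Placing the clear step in `finally` reflects the requirement that the unencrypted cache must not outlive the job, whether or not the queries succeed.
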

wangsimo0 commented 1 month ago

I don't quite understand. If you are using SR's data cache, it should only cache the original data, which in your case is encrypted.

xiaobingxia-at commented 1 month ago

To clarify, we have SR persist data in an S3 bucket, and we apply S3 server-side encryption: https://docs.aws.amazon.com/AmazonS3/latest/userguide/serv-side-encryption.html , which means the data on S3 is encrypted by S3 itself. When a client such as SR writes data to S3, S3 encrypts the data before storing it; when SR fetches data from S3, S3 decrypts the data first and then returns it. So SR receives decrypted data.

wangsimo0 commented 1 month ago

Thanks for your explanation! In this case StarRocks needs to provide an evict-cache SQL API. This is a good suggestion.