StarRocks / starrocks

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.
https://starrocks.io
Apache License 2.0
8.93k stars 1.79k forks

Cache related questions in storage computing separation mode #48277

Open zhbdesign opened 4 months ago

zhbdesign commented 4 months ago

Enhancement

Cache-related questions in shared-data (storage-compute separation) mode:

  1. Do all data loading methods write the full set of loaded data into the cache at load time?
  2. If the table is partitioned and datacache.partition_duration is set, is only data within the validity period cached during loading, or is all of it cached?
  3. At query time, for non-partitioned tables, is only the data matching the query conditions cached, or the entire table?
  4. At query time, for partitioned tables, is the cached data limited to what falls within datacache.partition_duration, or is caching based on insertion time?

kevincai commented 4 months ago

  1. It depends on the table properties, specifically datacache.enable.
  2. Yes, the full data is cached during ingestion.
  3. It depends on the scan phase: data that has been scanned enters the cache.
  4. It is calculated based on the partition key column.
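
To illustrate answer 4, here is a minimal Python sketch of a validity check that looks only at the partition key range and never at the load time. It is a toy model, not StarRocks code: the function name `partition_is_cacheable` is hypothetical, and comparing the partition's upper bound against `now - duration` is an assumption made for illustration.

```python
from datetime import datetime, timedelta


def partition_is_cacheable(partition_upper_bound: datetime,
                           partition_duration: timedelta,
                           now: datetime) -> bool:
    """Toy model: treat a partition as hot when its key range reaches
    into the window [now - partition_duration, now].

    Only the partition key range is consulted; the time at which rows
    were loaded plays no role. Using the upper bound of the range is an
    assumption of this sketch.
    """
    return partition_upper_bound > now - partition_duration


now = datetime(2024, 7, 15)
duration = timedelta(days=30)  # stand-in for a 30-day datacache.partition_duration

# Partition covering 2024-07-01 .. 2024-08-01: reaches into the window.
print(partition_is_cacheable(datetime(2024, 8, 1), duration, now))  # True

# Partition covering 2024-01-01 .. 2024-02-01: outside the window,
# even if its rows were loaded a minute ago.
print(partition_is_cacheable(datetime(2024, 2, 1), duration, now))  # False
```

In this model, backfilling old partitions today would not make them eligible for caching at query time, because their key range is still older than the configured duration.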