Alluxio / alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud
https://www.alluxio.io
Apache License 2.0

Table Level Cache TTL (Metadata) support #13819

Open BlueStalker opened 3 years ago

BlueStalker commented 3 years ago

Is your feature request related to a problem? Please describe. As mentioned in https://github.com/Alluxio/alluxio/issues/13818, we want to use the cache efficiently. For each table, only part of the data needs to be cached (e.g., the latest X partitions, or files updated in the last Y days). For data already in the cache, there also needs to be a mechanism that automatically invalidates (evicts) entries according to the same per-table caching scheme.

Describe the solution you'd like The local cache library should accept a metadata configuration, and a background cache evictor should run to actively evict entries according to it. Corresponding stats also need to be added for this. Whether in or out of scope for this request, there should also be an API that reports, per caching node (library instance), a per-table breakdown of cache stats. For example, a class like: CacheStats { Map<Table, Stat> stats; Stat { Total Used, Total Provisioned, Total Hit, Total Missed, ... } }
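To make the request concrete, here is a minimal sketch of what a table-aware index with TTL eviction and per-table stats might look like. It is not Alluxio's local cache API. `TableStat`, `TableTtlCache`, `setTableTtl`, and `evictExpired` are hypothetical names chosen for illustration. It assumes each cached file is tagged with a table name and a last-modified timestamp, and it only covers the "files updated in the last Y days" policy.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical per-table counters, mirroring the Stat fields named in this issue. */
final class TableStat {
    long usedBytes;         // bytes currently cached for the table
    long provisionedBytes;  // quota for the table; left unset in this sketch
    long hits;
    long misses;
}

/** Hypothetical table-aware cache index with TTL eviction (illustration only). */
final class TableTtlCache {
    private static final class Entry {
        final String table;
        final long sizeBytes;
        final long lastModifiedMs;

        Entry(String table, long sizeBytes, long lastModifiedMs) {
            this.table = table;
            this.sizeBytes = sizeBytes;
            this.lastModifiedMs = lastModifiedMs;
        }
    }

    private final Map<String, Long> ttlMsByTable = new HashMap<>();
    private final Map<String, Entry> entries = new HashMap<>();
    private final Map<String, TableStat> stats = new HashMap<>();

    /** Metadata configuration: entries of this table expire ttlMs after last modification. */
    synchronized void setTableTtl(String table, long ttlMs) {
        ttlMsByTable.put(table, ttlMs);
    }

    /** Records a cached file and charges its size to the owning table. */
    synchronized void put(String fileId, String table, long sizeBytes, long lastModifiedMs) {
        Entry old = entries.put(fileId, new Entry(table, sizeBytes, lastModifiedMs));
        if (old != null) {
            stat(old.table).usedBytes -= old.sizeBytes;
        }
        stat(table).usedBytes += sizeBytes;
    }

    /** Looks up a file and records a hit or miss against the given table. */
    synchronized boolean get(String fileId, String table) {
        boolean hit = entries.containsKey(fileId);
        if (hit) {
            stat(table).hits++;
        } else {
            stat(table).misses++;
        }
        return hit;
    }

    /** One eviction pass: drops entries older than their table's TTL, returns their ids. */
    synchronized List<String> evictExpired(long nowMs) {
        List<String> evicted = new ArrayList<>();
        Iterator<Map.Entry<String, Entry>> it = entries.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Entry> e = it.next();
            Entry entry = e.getValue();
            Long ttlMs = ttlMsByTable.get(entry.table);
            // Tables without a configured TTL are never evicted by this pass.
            if (ttlMs != null && nowMs - entry.lastModifiedMs > ttlMs) {
                it.remove();
                stat(entry.table).usedBytes -= entry.sizeBytes;
                evicted.add(e.getKey());
            }
        }
        return evicted;
    }

    /** Per-table stats breakdown; creates an empty record on first access. */
    synchronized TableStat stat(String table) {
        return stats.computeIfAbsent(table, t -> new TableStat());
    }
}
```

The background evictor described above could then be a `ScheduledExecutorService` task that calls `evictExpired(System.currentTimeMillis())` at a fixed interval. A "latest X partitions" policy would need partition metadata on each entry, which this sketch does not model.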

Describe alternatives you've considered There is potentially a solution to make sure the partition data into a particular query engine (given Presto as an example),

Urgency N/A

Additional context https://github.com/Alluxio/alluxio/issues/13818

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.