Open reisepass opened 7 months ago
I've been talking with the team about using caching for this kind of thing, but we haven't had a user doing this kind of workload. I think there are (or could be) a lot of options here, depending on what you're trying to do, at varying levels of complexity/work. Would love to hear more about what you're building! Feel free to drop some time on my calendar if you like.
Description
On the topic of caching: has GlareDB already implemented caching of frequently queried Parquet blocks, either in memory or on fast disk near the compute? In the Java world, external blob storage caching systems exist, such as https://github.com/Alluxio/alluxio, which can provide in-memory access directly to the Spark or Trino processes.
Starburst also has good caching built into their cloud version of Trino. It is almost fast enough to use as a back end for REST APIs pulling data from Parquet, but the Java overhead is still hard to swallow when you can accomplish this so simply with pyarrow.
Context: We are looking for a solution that enables efficient small queries from large numbers of concurrent read-only users, without having to copy the data yet again into Postgres/ClickHouse.
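To make the kind of caching I am asking about concrete, here is a minimal sketch of an in-memory LRU cache keyed by file path and row-group index. It is only an illustration of the idea, not how GlareDB, Alluxio, or Starburst implement it. The `BlockCache` class, the `loader` callable, and the `max_blocks` parameter are all hypothetical names made up for this example.

```python
import threading
from collections import OrderedDict
from typing import Any, Callable, Tuple


class BlockCache:
    """Thread-safe LRU cache for immutable blocks, e.g. Parquet row groups.

    `loader(path, row_group)` is a hypothetical callable that fetches one
    block from blob storage; it is only invoked on a cache miss.
    """

    def __init__(self, loader: Callable[[str, int], Any], max_blocks: int = 128):
        self._loader = loader
        self._max_blocks = max_blocks
        self._blocks: "OrderedDict[Tuple[str, int], Any]" = OrderedDict()
        self._lock = threading.Lock()
        self.hits = 0
        self.misses = 0

    def get(self, path: str, row_group: int) -> Any:
        key = (path, row_group)
        # One coarse lock keeps the sketch simple; it serializes loads too.
        with self._lock:
            if key in self._blocks:
                # Mark as most recently used and serve from memory.
                self._blocks.move_to_end(key)
                self.hits += 1
                return self._blocks[key]
            self.misses += 1
            block = self._loader(path, row_group)
            self._blocks[key] = block
            if len(self._blocks) > self._max_blocks:
                # Evict the least recently used block.
                self._blocks.popitem(last=False)
            return block
```

With pyarrow, the loader could be something like `lambda path, i: pyarrow.parquet.ParquetFile(path).read_row_group(i)`, so repeated small queries against the same hot row groups would skip blob storage entirely. A real implementation would probably bound the cache by bytes rather than block count and avoid holding the lock during the load.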