aryn-ai / sycamore

🍁 Sycamore is an LLM-powered search and analytics platform for unstructured data.
https://sycamore.readthedocs.io
Apache License 2.0

Add support for caching intermediate results of Luna queries. #850

Closed: mdwelsh closed 1 month ago

mdwelsh commented 1 month ago

This PR adds support for caching the intermediate results of Luna queries using .materialize().

The idea is that each node in a query plan is materialized to a separate directory named after the hash of the logical node's contents. By "contents", we mean the node type, its parameters, and (critically) its dependencies, but NOT the node's description or node ID, which can change from query to query without affecting semantic equivalence.
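To make the hashing scheme concrete, here is a minimal sketch of how such a cache key could be computed. It is not the code in this PR: `PlanNode`, `cache_key`, and `cache_dir` are hypothetical names made up for illustration, and SHA-256 over a JSON encoding is an assumed choice of hash and serialization. The key covers the node type, the parameters, and the keys of the dependencies (recursively), and deliberately skips the description and node ID.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class PlanNode:
    """Hypothetical stand-in for a logical plan node (not Sycamore's class)."""

    node_id: int
    node_type: str
    params: dict[str, Any] = field(default_factory=dict)
    description: str = ""
    dependencies: list[PlanNode] = field(default_factory=list)


def cache_key(node: PlanNode) -> str:
    """Hash the node type, parameters, and dependency keys.

    node_id and description are intentionally left out, so two nodes that
    differ only in those fields map to the same key.
    """
    payload = {
        "type": node.node_type,
        "params": node.params,
        # Recursing into dependencies means a change anywhere upstream
        # changes the key of every downstream node.
        "deps": [cache_key(dep) for dep in node.dependencies],
    }
    # sort_keys makes the encoding independent of dict insertion order;
    # default=str is a fallback for parameter values JSON cannot encode.
    encoded = json.dumps(payload, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def cache_dir(root: str, node: PlanNode) -> str:
    """Directory under `root` where this node's results would be stored."""
    return f"{root}/{cache_key(node)}"
```

Under this sketch, two scans with identical parameters but different IDs or descriptions share a directory, while changing a parameter on an upstream node gives every node that depends on it a new directory, so stale results are never reused.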

This definitely speeds up queries. I still need to do more benchmarking, but I am already seeing substantial savings on queries that involve a lot of cacheable computation.