georgia-tech-db / evadb

Database system for AI-powered apps
https://evadb.ai/docs
Apache License 2.0

Cache does not work for functions when there are third party data sources #1255

Open xzdandy opened 12 months ago

xzdandy commented 12 months ago

Bug

Query:

SELECT name, followers, LLMBatch("S", lang) FROM github_data.stargazers
JOIN LATERAL WebPageTextExtractor(name) AS web(text)
JOIN LATERAL LLMExtractor("P", text) AS golden(lang)
WHERE followers > 100
LIMIT 10;

Cache enabled:

Error message:

Traceback (most recent call last):
  File "/home/zxu330/eva/playground3.py", line 160, in <module>
    res = cursor.query(query).df()
  File "/home/zxu330/eva/evadb/interfaces/relational/relation.py", line 110, in df
    batch = self.execute()
  File "/home/zxu330/eva/evadb/interfaces/relational/relation.py", line 120, in execute
    result = execute_statement(self._evadb, self._query_node.copy())
  File "/home/zxu330/eva/evadb/server/command_handler.py", line 46, in execute_statement
    physical_plan = plan_generator.build(logical_plan)
  File "/home/zxu330/eva/evadb/optimizer/plan_generator.py", line 110, in build
    plan = self.optimize(logical_plan)
  File "/home/zxu330/eva/evadb/optimizer/plan_generator.py", line 101, in optimize
    self.execute_task_stack(optimizer_context.task_stack)
  File "/home/zxu330/eva/evadb/optimizer/plan_generator.py", line 48, in execute_task_stack
    task.execute()
  File "/home/zxu330/eva/evadb/optimizer/optimizer_tasks.py", line 240, in execute
    for plan in after:
  File "/home/zxu330/eva/evadb/optimizer/rules/rules.py", line 279, in apply
    new_func_expr = enable_cache(context, before.func_expr)
  File "/home/zxu330/eva/evadb/optimizer/optimizer_utils.py", line 293, in enable_cache
    cache = enable_cache_init(context, func_expr)
  File "/home/zxu330/eva/evadb/optimizer/optimizer_utils.py", line 262, in enable_cache_init
    optimized_key = optimize_cache_key(context, func_expr)
  File "/home/zxu330/eva/evadb/optimizer/optimizer_utils.py", line 254, in optimize_cache_key
    optimized_keys += optimize_key_mapping_f[type(key)](context, key)
  File "/home/zxu330/eva/evadb/optimizer/optimizer_utils.py", line 205, in optimize_cache_key_for_tuple_value_expression
    for col in get_table_primary_columns(table_obj):
  File "/home/zxu330/eva/evadb/catalog/catalog_utils.py", line 172, in get_table_primary_columns
    if table_catalog_obj.table_type == TableType.VIDEO_DATA:
AttributeError: 'NoneType' object has no attribute 'table_type'
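
Reading the traceback, the cache-key optimization asks for the primary columns of the table behind a column, and the table catalog object it receives is None, presumably because the column comes from a third-party data source that has no native catalog entry. The sketch below only illustrates that failure mode and one possible defensive fallback. It is not EvaDB's code or its actual fix; `FakeTableEntry`, `primary_columns_or_none`, and `cache_key_columns` are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FakeTableEntry:
    """Hypothetical stand-in for a table catalog entry."""
    name: str
    primary_columns: List[str] = field(default_factory=list)


def primary_columns_or_none(table_obj: Optional[FakeTableEntry]) -> Optional[List[str]]:
    """Return the primary columns, or None when there is no catalog entry."""
    if table_obj is None:
        # Accessing an attribute on None here is what raises the
        # AttributeError shown in the traceback.
        return None
    return table_obj.primary_columns


def cache_key_columns(table_obj: Optional[FakeTableEntry], referenced_column: str) -> List[str]:
    """Pick cache-key columns, falling back to the referenced column itself."""
    primary = primary_columns_or_none(table_obj)
    if primary:
        # Native table: the primary columns can stand in for the value.
        return primary
    # Assumed fallback for external sources: key on the column value directly.
    return [referenced_column]


native_key = cache_key_columns(FakeTableEntry("videos", ["id"]), "data")
external_key = cache_key_columns(None, "name")
```

With a fallback of this kind, a column from an external source would be keyed on its own value instead of on primary columns that do not exist.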

bygo7 commented 11 months ago

@xzdandy Can you provide the full code snippet for this including your db setup?

xzdandy commented 11 months ago

Hi @bygo7, please check https://github.com/georgia-tech-db/evadb/blob/xzdandy/playground3.py. You need to input your own GitHub token.

You also need to enable caching for WebPageTextExtractor by adding it to https://github.com/georgia-tech-db/evadb/blob/staging/evadb/constants.py#L20

bygo7 commented 10 months ago

Hi @xzdandy, can you please provide some detail on what this cache is supposed to do?

xzdandy commented 10 months ago

Hi @bygo7, this is exact-match caching: when a function is called with an input it has already seen, the function evaluation is skipped and the stored result is returned directly. In this case, when we run the query a second time, it should skip the evaluation of WebPageTextExtractor.
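
To make the expected behavior concrete, here is a minimal sketch of exact-match caching in plain Python. It is an illustration only, not how EvaDB implements its function cache; `ExactMatchCache` and `fake_web_page_text_extractor` are hypothetical names, and the extractor is a stand-in that does no real web request.

```python
from typing import Any, Callable, Dict, Hashable, Tuple


class ExactMatchCache:
    """Wrap a function and reuse results for inputs seen before."""

    def __init__(self, func: Callable[..., Any]) -> None:
        self._func = func
        self._store: Dict[Tuple[Hashable, ...], Any] = {}
        self.hits = 0
        self.misses = 0

    def __call__(self, *args: Hashable) -> Any:
        key = args  # the exact argument tuple is the cache key
        if key in self._store:
            self.hits += 1
            return self._store[key]  # skip the function evaluation
        self.misses += 1
        result = self._func(*args)
        self._store[key] = result
        return result


def fake_web_page_text_extractor(name: str) -> str:
    # Hypothetical stand-in for an expensive call such as WebPageTextExtractor.
    return f"text for {name}"


cached_extractor = ExactMatchCache(fake_web_page_text_extractor)

# First "query run": every input is new, so the function is evaluated.
first_run = [cached_extractor(n) for n in ["alice", "bob"]]
# Second "query run": same inputs, so results come from the cache.
second_run = [cached_extractor(n) for n in ["alice", "bob"]]
```

After both runs the wrapper has recorded two misses (first run) and two hits (second run), which is the behavior the second execution of the query is expected to show.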