DAGWorks-Inc / hamilton

Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.
https://hamilton.dagworks.io/en/latest/
BSD 3-Clause Clear License

Can only fetch first run id from cache using `SQLiteMetadataStore` object #1204

Status: closed (elutins closed this 1 month ago)

elutins commented 1 month ago

Calling `get_run_ids()` on a `hamilton.caching.stores.sqlite.SQLiteMetadataStore` object returns only the first `run_id` that populated the cache, not all of them.

Current behavior

Stack Traces

I believe this is caused by a line in `hamilton/caching/stores/sqlite.py` that indexes the query result to its first item only.
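
If that diagnosis is right, the pattern can be reproduced with plain `sqlite3`. This is a minimal sketch, not Hamilton's actual `SQLiteMetadataStore` code: the `history` table, its columns, and the helper functions are hypothetical and only mimic a store where two runs have written rows.

```python
import sqlite3
from typing import List, Tuple


def make_history_db() -> sqlite3.Connection:
    """Build an in-memory table with rows written by two separate runs.

    The `history` table and its columns are hypothetical; they only mimic
    the shape of a cache metadata store, not Hamilton's actual schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE history (id INTEGER PRIMARY KEY, run_id TEXT, node_name TEXT)"
    )
    conn.executemany(
        "INSERT INTO history (run_id, node_name) VALUES (?, ?)",
        [
            ("run-a", "some_var"),        # first driver.execute(...)
            ("run-a", "upstream_node"),
            ("run-b", "some_other_var"),  # second driver.execute(...)
        ],
    )
    conn.commit()
    return conn


# One row per distinct run_id, ordered by when that run first wrote to the table.
QUERY = "SELECT run_id FROM history GROUP BY run_id ORDER BY MIN(id)"


def get_run_ids_buggy(conn: sqlite3.Connection) -> Tuple[str, ...]:
    rows = conn.execute(QUERY).fetchall()  # e.g. [("run-a",), ("run-b",)]
    # Bug pattern: indexing the result keeps only the first row, a 1-tuple.
    return rows[0]


def get_run_ids_fixed(conn: sqlite3.Connection) -> List[str]:
    rows = conn.execute(QUERY).fetchall()
    # Fix pattern: take the single column from every row.
    return [row[0] for row in rows]


conn = make_history_db()
print(get_run_ids_buggy(conn))  # ('run-a',)
print(get_run_ids_fixed(conn))  # ['run-a', 'run-b']
```

Since `fetchall()` returns an empty list (not `None`) when there are no rows, the indexed version would also raise `IndexError` on an empty table, while the list comprehension simply returns `[]`.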

Steps to replicate behavior

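  1. Instantiate a driver object using `with_cache()`:

    ```python
    import hamilton.driver

    import some_module  # placeholder: the module that defines your dataflow

    driver = (
        hamilton.driver.Builder()
        .with_config({})  # with_config expects a dict; empty here
        .with_modules(some_module)
        .with_cache()
        .build()
    )
    ```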
  2. Execute the driver: `driver.execute(final_vars=[some_var], inputs={some_input})`
  3. Execute the driver again, requesting a different final variable: `driver.execute(final_vars=[some_other_var], inputs={some_input})`
  4. Run `hamilton.caching.stores.sqlite.SQLiteMetadataStore(".hamilton_cache").get_run_ids()` to get the run IDs that populated the cache.
    • Expected behavior: this call returns all `run_id`s, not just the first.

skrawcz commented 1 month ago

@elutins thanks for flagging. We'll get a fix out today.

skrawcz commented 1 month ago

@elutins this has been fixed in sf-hamilton==1.81.2. Thanks for raising it! Please re-open if that's not the case.
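
To check whether an environment already has the fix, one option is to compare the installed `sf-hamilton` version with 1.81.2. This is a minimal sketch using the standard library's `importlib.metadata`; `parse_release` and `has_run_ids_fix` are hypothetical helpers, and the parser assumes plain numeric release strings.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

# Release that the maintainer reports as containing the fix.
FIXED_IN: Tuple[int, ...] = (1, 81, 2)


def parse_release(v: str) -> Tuple[int, ...]:
    """Turn a plain release string like "1.81.2" into (1, 81, 2).

    Simplification: assumes numeric dot-separated components and stops at the
    first component that is not purely digits (e.g. "2rc1").
    """
    parts = []
    for piece in v.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def has_run_ids_fix(installed: Optional[str] = None) -> bool:
    """Return True if the given (or installed) sf-hamilton version is >= 1.81.2."""
    if installed is None:
        try:
            installed = version("sf-hamilton")  # PyPI distribution name
        except PackageNotFoundError:
            return False
    return parse_release(installed) >= FIXED_IN


print(has_run_ids_fix("1.81.1"))  # False
print(has_run_ids_fix("1.81.2"))  # True
```

If the check returns `False`, upgrading with `pip install --upgrade "sf-hamilton>=1.81.2"` should pick up the fix. For full PEP 440 version handling, a dedicated parser would be more robust than this hand-rolled tuple comparison.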