DAGWorks-Inc / hamilton

Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. It runs and scales everywhere Python does.
https://hamilton.dagworks.io/en/latest/
BSD 3-Clause Clear License

'database is locked' error when using CachingGraphAdapter on Azure Linux ML Compute #1058

Closed: giladrubin1 closed this issue 1 month ago

giladrubin1 commented 1 month ago

Current behavior

When using hamilton's disk-based caching adapter (hamilton.experimental.h_cache.CachingGraphAdapter; the stack trace below is from hamilton.plugins.h_diskcache.DiskCacheAdapter) on an Azure Linux ML Compute, a 'database is locked' error occurs after running for about a minute.

Stack Traces

---------------------------------------------------------------------------
OperationalError                          Traceback (most recent call last)
Cell In[9], line 4
      1 from hamilton.plugins import h_diskcache
      2 from hamilton.driver import Builder
----> 4 cache_adapter = h_diskcache.DiskCacheAdapter()
      6 builder = (Builder()
      7            .with_adapters(cache_adapter)
      8            )

File /anaconda/envs/pdf-env/lib/python3.10/site-packages/hamilton/plugins/h_diskcache.py:84, in DiskCacheAdapter.__init__(self, cache_vars, cache_path, **cache_settings)
     82 self.cache_vars = cache_vars if cache_vars else []
     83 self.cache_path = cache_path
---> 84 self.cache = diskcache.Cache(directory=cache_path, **cache_settings)
     85 self.nodes_history: Dict[str, List[str]] = self.cache.get(
     86     key=DiskCacheAdapter.nodes_history_key, default=dict()
     87 )  # type: ignore
     88 self.used_nodes_hash: Dict[str, str] = dict()

File /anaconda/envs/pdf-env/lib/python3.10/site-packages/diskcache/core.py:478, in Cache.__init__(self, directory, timeout, disk, **settings)
    476 for key, value in sorted(sets.items()):
    477     if key.startswith('sqlite_'):
--> 478         self.reset(key, value, update=False)
    480 sql(
    481     'CREATE TABLE IF NOT EXISTS Settings ('
    482     ' key TEXT NOT NULL UNIQUE,'
    483     ' value)'
    484 )
    486 # Setup Disk object (must happen after settings initialized).

File /anaconda/envs/pdf-env/lib/python3.10/site-packages/diskcache/core.py:2438, in Cache.reset(self, key, value, update)
   2436         update = True
   2437     if update:
-> 2438         sql('PRAGMA %s = %s' % (pragma, value)).fetchall()
   2439     break
   2440 except sqlite3.OperationalError as exc:

OperationalError: database is locked

Steps to replicate behavior

  1. Set up an Azure Linux ML Compute environment
  2. Open a VSCode notebook
  3. Run the following code:

    from hamilton.plugins import h_diskcache
    from hamilton.driver import Builder

    cache_adapter = h_diskcache.DiskCacheAdapter()

    builder = (
        Builder()
        .with_adapters(cache_adapter)
    )
  4. Wait for about a minute
  5. Observe the 'database is locked' error

Library & System Information

Expected behavior

The CachingGraphAdapter should initialize without a 'database is locked' error.

Additional context

After the error occurs, the cache.db file cannot be deleted until the kernel is restarted. This suggests that the process keeps an open handle on the file, so it remains locked even after the error is raised.
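For context, the underlying SQLite failure mode can be reproduced with the standard library alone: a connection that cannot obtain the write lock raises OperationalError('database is locked'). This is a minimal local sketch, not the Azure-specific case; on network-mounted filesystems (such as the Azure ML workspace mount) the lock can appear held even without a competing writer, because POSIX file locking is unreliable there.

```python
import os
import sqlite3
import tempfile

# Create a throwaway database file on local disk.
db = os.path.join(tempfile.mkdtemp(), "cache.db")

# First connection takes and holds the exclusive write lock.
# isolation_level=None puts the connection in autocommit mode so the
# explicit BEGIN EXCLUSIVE is passed straight through to SQLite.
holder = sqlite3.connect(db, isolation_level=None)
holder.execute("CREATE TABLE t (k TEXT)")
holder.execute("BEGIN EXCLUSIVE")

# Second connection fails fast (timeout=0) instead of waiting for the lock.
blocked = sqlite3.connect(db, timeout=0)
locked_msg = None
try:
    blocked.execute("INSERT INTO t VALUES ('x')")
except sqlite3.OperationalError as exc:
    locked_msg = str(exc)

print(locked_msg)  # prints: database is locked
```

diskcache wraps exactly this kind of SQLite access, which is why the adapter surfaces the same OperationalError.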

skrawcz commented 1 month ago

@giladrubin1 I think this might be an Azure issue - https://github.com/microsoft/autogen/issues/79

Can you try specifying a different location for the cache? Or try https://github.com/microsoft/autogen/issues/79#issuecomment-1744086814 if that makes sense?
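A minimal sketch of the suggested workaround: point the cache at a directory on local disk (e.g. under /tmp) rather than the network-mounted workspace filesystem. The directory name here is illustrative; cache_path is the constructor parameter visible in the stack trace above.

```python
import tempfile
from pathlib import Path

# Pick a cache directory on the local filesystem instead of the
# network-mounted Azure ML workspace directory, where SQLite file
# locking is unreliable. The subdirectory name is arbitrary.
cache_dir = Path(tempfile.gettempdir()) / "hamilton_diskcache"
cache_dir.mkdir(parents=True, exist_ok=True)

# Assuming the environment from the issue (hamilton installed):
# from hamilton.plugins import h_diskcache
# cache_adapter = h_diskcache.DiskCacheAdapter(cache_path=str(cache_dir))
print(cache_dir)
```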

giladrubin1 commented 1 month ago

Yes, it seems like it's an SQLite issue on AzureML Compute. I'll change the cache directory to a location outside of the compute's mounted filesystem; hopefully that will work. Thanks!