This was discovered when using the Llama 3 Instruct model with the workaround in #4180 on multiple nodes using a shared file system for the cache directory.
Requests to vLLM with `tool_calls` using the OpenAI-compatible tool call API utilize the outlines library. When the filesystem is shared between nodes, e.g. when using AWS Elastic File System, the outlines library opens a SQLite database in the shared cache directory.
This causes I/O errors on at least one node, because the nodes conflict when writing to the same SQLite database.
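The conflict can be sketched in miniature with two local connections standing in for two nodes contending for the same cache file (on a local disk the contention surfaces as "database is locked"; on NFS/EFS, where POSIX file locking is unreliable, it can instead surface as disk I/O errors):

```python
import os
import sqlite3
import tempfile

# Two writers (standing in for two nodes) open the same SQLite file,
# as happens when ~/.cache sits on a shared filesystem.
path = os.path.join(tempfile.mkdtemp(), "cache.db")
writer_a = sqlite3.connect(path, timeout=0, isolation_level=None)
writer_b = sqlite3.connect(path, timeout=0, isolation_level=None)
writer_a.execute("CREATE TABLE cache (key TEXT, value TEXT)")

writer_a.execute("BEGIN IMMEDIATE")  # "node A" acquires the write lock
try:
    writer_b.execute("BEGIN IMMEDIATE")  # "node B" cannot acquire it
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)
print(error)  # database is locked
writer_a.execute("COMMIT")
```

With `timeout=0` the second writer fails immediately instead of retrying; the real cache behaves worse on EFS because the lock itself is not dependable there.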
To mitigate this, Outlines should likely not default to `~/.cache` - which is unfortunately often shared between nodes to share model weights - but should instead use `/tmp`. This would also ensure that caches cannot be poisoned by invalid values and can be cleared by restarting a container.
While the location can be configured via an environment variable, I was surprised to see a non-multi-user-safe file being opened in `~/.cache`.
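As a per-node workaround until the default changes, the cache can be pointed at node-local storage before outlines is imported. A minimal sketch, assuming the variable is named `OUTLINES_CACHE_DIR` as in the linked `caching.py` (verify against the outlines version you run):

```python
import os
import tempfile

# Point the outlines cache at node-local storage instead of the
# (potentially NFS/EFS-shared) ~/.cache default. OUTLINES_CACHE_DIR is
# assumed from outlines/caching.py; check it against your version.
# This must run before outlines (or vLLM) is imported, since the cache
# directory is read at import time.
node_local_cache = os.path.join(tempfile.gettempdir(), "outlines-cache")
os.makedirs(node_local_cache, exist_ok=True)
os.environ["OUTLINES_CACHE_DIR"] = node_local_cache
```

Using `tempfile.gettempdir()` (usually `/tmp`) keeps each node's cache private and lets a container restart clear it.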
https://github.com/outlines-dev/outlines/blob/main/outlines/caching.py#L14-L29