dottxt-ai / outlines

Structured Text Generation
https://dottxt-ai.github.io/outlines/
Apache License 2.0

[Bug]: Disk I/O Error when using tools due to shared outlines cache database #827

Open AaronFriel opened 5 months ago

AaronFriel commented 5 months ago

This was discovered when using the Llama 3 Instruct model with the workaround in #4180 on multiple nodes that use a shared file system for the cache directory.

Requests to vLLM that include tool_calls through the OpenAI-compatible tool call API use the outlines library. When the filesystem is shared between nodes, e.g. with AWS Elastic File System, the outlines library opens a SQLite database in the shared cache dir.

This causes I/O errors on at least one node, because the nodes conflict when writing to the same SQLite database.

To mitigate this, Outlines should likely not default to ~/.cache, which is unfortunately often expected to be shared between nodes to share model weights, and should more likely use /tmp. This should also ensure that caches cannot be poisoned by invalid values and can be cleared on a restart of a container.

While this can be configured by an environment variable, I was surprised to see a file that is not multi-user safe being opened in ~/.cache.

https://github.com/outlines-dev/outlines/blob/main/outlines/caching.py#L14-L29
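
For anyone hitting the same error, here is a minimal sketch of the environment-variable workaround. It assumes the variable is named OUTLINES_CACHE_DIR, which is not stated in this thread and should be checked against the caching.py lines linked above. The helper name use_node_local_cache and the outlines-cache-<hostname> directory layout are hypothetical, chosen only for illustration.

```python
import os
import socket
import tempfile
from pathlib import Path

# Assumption: Outlines reads its cache location from this variable.
# Verify the name against outlines/caching.py before relying on it.
CACHE_ENV_VAR = "OUTLINES_CACHE_DIR"


def use_node_local_cache(base_dir=None):
    """Point the cache at a directory that is local to this node.

    Hypothetical helper: defaults to the system temp dir (e.g. /tmp),
    which is normally not shared between nodes.
    """
    base = Path(base_dir) if base_dir else Path(tempfile.gettempdir())
    # Include the hostname so that nodes do not collide even if `base`
    # turns out to be on a shared filesystem.
    cache_dir = base / f"outlines-cache-{socket.gethostname()}"
    cache_dir.mkdir(parents=True, exist_ok=True)
    os.environ[CACHE_ENV_VAR] = str(cache_dir)
    return cache_dir


# Run this before importing outlines (or vLLM), so the setting is in place
# by the time the cache database is first opened.
cache_dir = use_node_local_cache()
print(f"{CACHE_ENV_VAR}={cache_dir}")
```

Setting the same variable in the container or job environment should have the same effect without any code change. A directory under /tmp also means the cache is dropped when the container restarts.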

islam-nassar commented 3 months ago

Hi @AaronFriel, which environment variable do I need to set to change this behaviour? I am getting the same error.

Edit: found the solution in your original issue