elixir-nx / bumblebee

Pre-trained Neural Network models in Axon (+ 🤗 Models integration)
Apache License 2.0

Group cache files by repository #332

Closed jonatanklosko closed 4 months ago

jonatanklosko commented 4 months ago

Currently our cache directory has a flat structure: for each cached file we have one .json file with metadata and another file with the contents. The file names are URL hashes or etags, so they are not meaningful to the user.

The cache should be opaque to the user. However, it's easy to accumulate a lot of models, and clearing the whole cache is not ideal, since the user may want to remove only the models they no longer plan to use. To make that possible, this PR groups all cache files related to a specific HF repository in a separate subdirectory with a meaningful name.
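To illustrate the difference between the two layouts, here is a rough, language-agnostic sketch in Python. It is not the code from this PR: the helper names (`url_hash`, `repo_dir_name`, `flat_paths`, `grouped_paths`, `remove_repo`), the `--` separator in the directory name, and the exact file naming are all assumptions made for the example.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def url_hash(url: str) -> str:
    # Hypothetical opaque file name derived from the URL.
    return hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]


def repo_dir_name(repo_id: str) -> str:
    # "/" cannot appear in a directory name, so replace it.
    # The "--" separator is an assumption for this sketch.
    return repo_id.replace("/", "--")


def flat_paths(cache_dir: Path, url: str) -> tuple[Path, Path]:
    # Flat layout: metadata and contents sit directly in the cache root.
    name = url_hash(url)
    return cache_dir / f"{name}.json", cache_dir / name


def grouped_paths(cache_dir: Path, repo_id: str, url: str) -> tuple[Path, Path]:
    # Grouped layout: same opaque file names, but inside a per-repository
    # subdirectory whose name the user can recognize.
    name = url_hash(url)
    repo_dir = cache_dir / repo_dir_name(repo_id)
    return repo_dir / f"{name}.json", repo_dir / name


def remove_repo(cache_dir: Path, repo_id: str) -> None:
    # Drop everything cached for one repository, leaving the rest intact.
    shutil.rmtree(cache_dir / repo_dir_name(repo_id), ignore_errors=True)


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    for repo in ("org-a/model-x", "org-b/model-y"):  # made-up repository ids
        meta, data = grouped_paths(root, repo, f"https://example.com/{repo}/config.json")
        meta.parent.mkdir(parents=True, exist_ok=True)
        meta.write_text("{}")
        data.write_bytes(b"contents")
    remove_repo(root, "org-a/model-x")
    print(sorted(p.name for p in root.iterdir()))
```

With the grouped layout, removing a single model amounts to deleting one recognizable directory, whereas in the flat layout the user would have to work out which hashed files belong to which repository.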

As a result, existing caches will be invalidated, but I think it's worth it, and we can just mention in the changelog that users should clear the cache.