allegroai / clearml

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
https://clear.ml/docs
Apache License 2.0
5.43k stars 643 forks

How to clean-up local storage on agent hosts #1228

Open thaikoh opened 3 months ago

thaikoh commented 3 months ago

We have multiple agents running tasks from a server. On each agent host, the directory /home/user/.clearml/cache/storage_manager/datasets contains the following files and directories:

➜  datasets ll
total 63M
-rw-r--r-- 1 root root 1.7M Jan 23 16:17 0657a6f81e67d7dec63928aeec239f91.state.json
-rw-r--r-- 1 root root  22M Jan 31 23:40 14ce4f985c9da0b144bc3c4ecf439dd9.state.json
-rw-r--r-- 1 root root 1.1M Jan 31 23:47 342f25d588034d5dfb0abb6d9d9f470a.state.json
-rw-r--r-- 1 root root 3.3M Jan 31 23:46 49ff4e76b0413dc8866ac0b8d652df09.state.json
-rw-r--r-- 1 root root 5.3M Nov 21 17:47 536335d05cd8190952c1c70739708be7.state.json
-rw-r--r-- 1 root root 1.4M Jan 31 23:46 b918955acbc5f2433a1b46a8af36d7cd.state.json
-rw-r--r-- 1 root root 5.2M Nov 22 13:57 c105ce6fef8921bcd84fbb4d04d7a65f.state.json
-rw-r--r-- 1 root root 1.6M Jan 31 23:46 d4673bcd031c059c3ae8412bb7b60bee.state.json
drwxr-xr-x 3 root root 4.0K Nov 22 13:57 ds_0f2a68f7cb3c498789764f4bedf310f8
drwxr-xr-x 3 root root 4.0K Jan 23 16:15 ds_39435110f9bd432ba0dc3e468598382c
drwxr-xr-x 3 root root 4.0K Jan 12 12:30 ds_4b8939e8dc654c7f8fdd9a1341a3d57a
drwxr-xr-x 3 root root 4.0K Jan 23 16:15 ds_68ebd46cbb2f4b73bf55704d23352c51
drwxr-xr-x 3 root root 4.0K Jan 31 23:46 ds_7e76a7fdb92544759970ac4d179cd2e9
drwxr-xr-x 3 root root 4.0K Jan 31 23:41 ds_82833df6861f4ab592ec9da0920c4ddc
drwxr-xr-x 3 root root 4.0K Nov 21 17:47 ds_9e8651212dbf4be088e540d1db3df092
drwxr-xr-x 3 root root 4.0K Jan 31 23:46 ds_a9517a01182b41818789363c02971c2d
drwxr-xr-x 3 root root 4.0K Jan 31 23:47 ds_ccf94b267bb24755b01e27fcb64e3889
drwxr-xr-x 3 root root 4.0K Jan 31 23:46 ds_eb29b16f746b4f5eba984a315976fc64
-rw-r--r-- 1 root root 1.2M Jan 23 16:17 e396d98f7fda58d0771851fa487708fb.state.json
-rw-r--r-- 1 root root  22M Jan 12 12:30 e5c7192b0eb6a9c4422a1e6c1d35686e.state.json

The total size is tens of gigabytes on each agent host, so the question is: how can we safely clean up these files?

jkhenning commented 3 months ago

@thaikoh ClearML has a built-in setting that caps the total number of files kept in the cache. If you'd like to clean up the files there yourself, you can probably do that by access time, for example by removing the entries that were accessed least recently.
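
As a rough illustration of the access-time approach, here is a minimal Python sketch. It is not part of ClearML: the function names `find_stale_entries` and `remove_entries` and the 30-day threshold are hypothetical. It assumes that no task is using the cache while it runs and that the filesystem actually records access times (mounts using `noatime` do not). It also looks only at the access time of each top-level entry, so a `ds_*` directory's timestamp may not reflect reads of the files inside it.

```python
import shutil
import time
from pathlib import Path


def find_stale_entries(cache_dir, max_age_days, now=None):
    """Return top-level entries in cache_dir last accessed more than max_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for entry in sorted(Path(cache_dir).iterdir()):
        # st_atime is the last access time of the entry itself (not of its contents)
        if entry.lstat().st_atime < cutoff:
            stale.append(entry)
    return stale


def remove_entries(entries, dry_run=True):
    """Delete the given files and directories; with dry_run=True nothing is deleted."""
    handled = []
    for entry in entries:
        if not dry_run:
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)  # remove a dataset directory and its contents
            else:
                entry.unlink()  # remove a single file such as *.state.json
        handled.append(entry)
    return handled
```

A cautious way to use it is to run `remove_entries(find_stale_entries("/home/user/.clearml/cache/storage_manager/datasets", 30))` first, inspect the returned paths, and only then repeat the call with `dry_run=False`, ideally while the agent is idle.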