aimhubio / aim

Aim 💫 — An easy-to-use & supercharged open-source experiment tracker.
https://aimstack.io
Apache License 2.0

IO Error: 'too many open files' when removing many corrupted runs #3224

Open Engrammae opened 2 months ago

Engrammae commented 2 months ago

🐛 Bug: Removal of many corrupted runs in one go

I ran a large experiment that tracked many runs, and apparently quite a few of them ended up corrupted (539 in my case).

I tried to remove them with aim runs rm --corrupted, but got an IO error: "too many open files". Single corrupted runs can still be removed with aim runs rm ${hash}. Raising the limit with ulimit -n (up to 2048) had no effect.
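For triage, here is a minimal sketch, assuming Bash on Linux, that prints both the soft and the hard open-file limits and raises the soft limit to the hard one. A limit set with ulimit applies only to the current shell and the processes it starts, so aim has to be run from that same shell afterwards. This is a diagnostic aid, not a confirmed fix: raising the limit to 2048 did not help in this report.

```shell
#!/bin/bash
# The soft limit is what processes actually get; the hard limit is the
# ceiling an unprivileged user may raise the soft limit to.
echo "open files, soft limit: $(ulimit -Sn)"
echo "open files, hard limit: $(ulimit -Hn)"

# Raise the soft limit to the hard limit for this shell only.
# (Skipped if the hard limit is reported as "unlimited", which some
# systems refuse to accept as a soft limit for open files.)
hard="$(ulimit -Hn)"
if [ "${hard}" != "unlimited" ]; then
    ulimit -Sn "${hard}"
fi

# Then, from this same shell:
# aim runs rm --corrupted
```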

To reproduce

Accumulate a large number of corrupted runs (several hundred), then try to remove them all at once with aim runs rm --corrupted.

Expected behavior

Runs are removed in a way that respects the open-file limit, so that aim runs rm --corrupted also works when there are many corrupted runs.

Environment

Additional context

As a workaround, I wrote a short Bash script that removes the corrupted runs one by one, but this is still quite cumbersome:

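#!/bin/bash
# Workaround: remove the corrupted runs one at a time instead of all at once.

# Take the first line of the listing (the tab-separated run hashes) and put
# one hash per line; tr behaves the same with GNU and BSD tools, unlike \t/\n in sed.
aim runs ls --corrupted | head -n 1 | tr '\t' '\n' > corrupted_runs

while read -r run; do
    [ -z "${run}" ] && continue   # skip empty lines
    echo "Removing corrupted run: ${run}"
    aim runs rm "${run}" -y
done < corrupted_runs

rm -f corrupted_runs   # clean up the temporary list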
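A middle ground between removing everything in one call and removing runs strictly one by one is to remove them in fixed-size batches, so each aim process only has to deal with a bounded number of runs. The sketch below is a hypothetical helper, not part of aim. It assumes that aim runs rm accepts several hashes in one invocation, and the default batch size of 50 is an arbitrary choice to tune against your open-file limit.

```shell
#!/bin/bash
# Hypothetical helper (not part of aim): reads run hashes from stdin, one per
# line, and removes them in groups of batch_size per `aim runs rm` call.
# Assumption: `aim runs rm` accepts several hashes in a single invocation.
remove_in_batches() {
    local batch_size="${1:-50}"
    local -a batch=()
    local run

    # `|| [ -n ... ]` also handles a last line without a trailing newline.
    while read -r run || [ -n "${run}" ]; do
        [ -z "${run}" ] && continue          # skip blank lines
        batch+=("${run}")
        if [ "${#batch[@]}" -ge "${batch_size}" ]; then
            echo "Removing ${#batch[@]} corrupted runs"
            aim runs rm -y "${batch[@]}"
            batch=()
        fi
    done

    # Remove whatever is left in the final, partially filled batch.
    if [ "${#batch[@]}" -gt 0 ]; then
        echo "Removing ${#batch[@]} corrupted runs"
        aim runs rm -y "${batch[@]}"
    fi
}

# Usage, reusing the listing step from the workaround script:
# aim runs ls --corrupted | head -n 1 | tr '\t' '\n' | remove_in_batches 50
```

A batch size of 1 reproduces the one-by-one workaround. Larger batches start fewer aim processes, which is likely to be faster for hundreds of runs.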