Make cache management more dynamic

dbutenhof commented 1 year ago

PBENCH-1301

This could be refined, but I want to get it up for review. I'd love to get it on the staging server for testing with more limited disk space, although I can ask for 95% or 880Gib on my laptop and see it free cache. (I'm out all next week, but I may spend a bit of time trying to clean this up: e.g., trying to add some unit tests.)

First, this adds code to track the unpacked size of tarballs to help with managing cache goals. This is based on a du -s -B1 of the unpacked directory tree, and stored as metadata.

When asked to unpack, it will check whether there's sufficient space on the cache device, and if not it will reclaim cache with a goal sufficient to accommodate the target dataset.

The pbench-tree-manage command can now target either % free or bytes free, and the background timer job will attempt to free 20% of the drive every 4 hours instead of targeting "old" tarballs. (The last_ref timestamp is now used only to sort the list of datasets with live cache on input to reclaim so that we free the oldest first.)

dbutenhof commented 11 months ago

podman image ls --filter 'reference=pbench-agent' --filter 'reference=pbench-server' --format '{{.Id}}' --filter containers=false

sort -u

xargs podman image rm -f Error: image name or ID must be specified

true

webbnh commented 11 months ago

podman image ls --filter 'reference=pbench-agent' --filter 'reference=pbench-server' --format '{{.Id}}' --filter containers=false

sort -u

xargs podman image rm -f Error: image name or ID must be specified

true

That's old news, now. I resubmitted your build and it failed unit tests.

dbutenhof commented 11 months ago

Actually, it did both. I'm not sure why the cleanup is generating a blank argument to podman image rm -f, but it apparently is. But, yeah: the CI appears to be hitting some weird unit test problem I'm not seeing locally, and I really hate when that happens.

dbutenhof commented 11 months ago

Ah. It's more of that "asynchronous pytest run with implicit dependencies". Dang. So it is a real problem. 😦

webbnh commented 11 months ago

I'm not sure why the cleanup is generating a blank argument to podman image rm -f, but it apparently is.

Apparently the filtered output of podman image ls is empty, and so there is nothing for the image rm to remove. I'm not sure what might have caused that, but the build is supposed to ignore this result, so I think the fact that we're seeing failure notifications is unrelated to this (i.e., I don't think that this is a problem).

the CI appears to be hitting some weird unit test problem I'm not seeing locally, and I really hate when that happens.

😞

webbnh commented 11 months ago

While you're thinking about this stuff...did you see my comment from the other week?

dbutenhof commented 11 months ago

While you're thinking about this stuff...did you see my comment from the other week?

Yeah, I saw that. I was tempted to just pull it into this PR although it deserves its own and I'm not particularly inclined to deal with that now. Then again, maybe I should just throw it into the pot ...

webbnh commented 11 months ago

I saw that.

OK, good.

I was tempted to just pull it into this PR although it deserves its own and I'm not particularly inclined to deal with that now. Then again, maybe I should just throw it into the pot ...

No, no...you are right to resist that temptation. (I just didn't know how closely related or how big a change it would require.)

dbutenhof commented 11 months ago

I saw that.

OK, good.

I was tempted to just pull it into this PR although it deserves its own and I'm not particularly inclined to deal with that now. Then again, maybe I should just throw it into the pot ...

No, no...you are right to resist that temptation. (I just didn't know how closely related or how big a change it would require.)

It's a trivial one-character change, I think, since I believe that the plural form they want us to use takes *args. That is, just changing add_column to add_columns should do it.

distributed-system-analysis / pbench

Make cache management more dynamic #3578