JuliaCI / julia-buildkite-plugin

Buildkite plugin to install Julia for use in a pipeline.

Artifacts aren't always cleared #53

Open maleadt opened 1 week ago

maleadt commented 1 week ago

This was observed on amdci7, Yggdrasil's build environment, which runs with these pretty vanilla settings: https://github.com/JuliaPackaging/Yggdrasil/blob/6e50b87e3c0e0d289dba12898ee4ceb5b369c446/.buildkite/utils.jl#L33-L37. The depots there get really large, over 500GiB (prompting https://github.com/JuliaCI/julia-buildkite-plugin/pull/52), all of it in artifacts. However, many of these artifacts are pretty old, and I'd be surprised if they are all still in use:

amdci7% du -hs e2fd9734-29d8-45cd-b0eb-59f7104f3131
421G    e2fd9734-29d8-45cd-b0eb-59f7104f3131

There is an artifact_usage.toml in there:

amdci7% cat logs/artifact_usage.toml
[["/cache/julia-buildkite-plugin/depots/e2fd9734-29d8-45cd-b0eb-59f7104f3131/packages/LLVM_full_jll/H2cfL/Artifacts.toml"]]
time = 2024-11-07T16:08:48.996Z

[["/cache/julia-buildkite-plugin/depots/e2fd9734-29d8-45cd-b0eb-59f7104f3131/packages/LibCURL_jll/lvgm5/Artifacts.toml"]]
time = 2024-11-07T16:08:48.998Z

[["/cache/julia-buildkite-plugin/depots/e2fd9734-29d8-45cd-b0eb-59f7104f3131/packages/LibSSH2_jll/05BCE/Artifacts.toml"]]
time = 2024-11-07T16:08:18.144Z

[["/cache/julia-buildkite-plugin/depots/e2fd9734-29d8-45cd-b0eb-59f7104f3131/packages/MozillaCACerts_jll/dt3sW/Artifacts.toml"]]
time = 2024-11-07T16:08:18.145Z

[["/cache/julia-buildkite-plugin/depots/e2fd9734-29d8-45cd-b0eb-59f7104f3131/packages/OpenSSL_jll/IOE5P/Artifacts.toml"]]
time = 2024-11-07T16:08:18.093Z

[["/cache/julia-buildkite-plugin/depots/e2fd9734-29d8-45cd-b0eb-59f7104f3131/packages/Zlib_jll/0qQy0/Artifacts.toml"]]
time = 2024-11-07T16:08:48.999Z

[["/cache/julia-buildkite-plugin/depots/e2fd9734-29d8-45cd-b0eb-59f7104f3131/packages/nghttp2_jll/NGQzG/Artifacts.toml"]]
time = 2024-11-07T16:08:48.997Z

[["/cache/julia-buildkite-plugin/depots/e2fd9734-29d8-45cd-b0eb-59f7104f3131/packages/p7zip_jll/wZ0qA/Artifacts.toml"]]
time = 2024-11-07T16:08:18.145Z

... yet Yggdrasil's CI reveals: https://buildkite.com/julialang/yggdrasil/builds/14525#019305f3-fd9e-4b5e-aa12-11c8ce13e03e

┌ Info: Running Pkg.gc()
└   collect_delay = 604800 seconds
      Active manifest files: 0 found
      Active artifact files: 0 found
      Active scratchspaces: 0 found
     Deleted no artifacts, repos, packages or scratchspaces

Could this be because manifest_usage.toml points to nonexistent resources, since those manifests are ephemeral?

amdci7% cat /julia/agent-cache/yggy-amdci7.0/julia-buildkite-plugin/depots/e2fd9734-29d8-45cd-b0eb-59f7104f3131/logs/manifest_usage.toml
[["/cache/build/yggy-amdci7-0/julialang/yggdrasil/.ci/Manifest.toml"]]
time = 2024-11-07T16:08:59.738Z

amdci7% ls /julia/agent-cache/yggy-amdci7.0/build/julialang/yggdrasil/.ci/Manifest.toml
No such file or directory
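To check this for a whole usage log rather than one entry at a time, something like the following might help. It is only a sketch: `missing_usage_entries` is a made-up helper name, it assumes every entry header has the `[["path"]]` form shown above, and it has to run somewhere the recorded paths actually resolve (the log stores `/cache/...` paths as the job saw them, not the host-side `/julia/agent-cache/...` ones).

```shell
# Hypothetical helper: print every path recorded in a *_usage.toml log
# that no longer exists on disk.
# Assumes entry headers of the form [["/path/to/Manifest.toml"]].
missing_usage_entries() {
    usage_log="$1"
    # Pull the quoted path out of each [["..."]] header, de-duplicate,
    # and keep only the paths that are gone.
    sed -n 's/^\[\["\(.*\)"]]$/\1/p' "$usage_log" | sort -u |
    while IFS= read -r path; do
        [ -e "$path" ] || printf '%s\n' "$path"
    done
}
```

Calling `missing_usage_entries logs/manifest_usage.toml` from inside a depot would list the manifests that the log still references but that have since disappeared.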

cc @KristofferC

DilumAluthge commented 1 week ago

Are you able to pop into the machine and run commands interactively? If so, could you run https://github.com/giordano/PkgCleanup.jl to figure out which manifests and artifacts files PkgCleanup thinks are still "active" or "live"?

maleadt commented 1 week ago

to figure out which manifests and artifacts files PkgCleanup thinks are still "active" or "live"

It doesn't explain much, because as I mentioned above there's no real Manifest out there. The only one is an ephemeral one that got cleaned during or after the run. I guess that defeats the Pkg.gc mechanism?

amdci7% JULIA_DEPOT_PATH=$(pwd) julia +1.7

shell> du -hs
421G    .

julia> using Pkg

julia> Pkg.gc(;verbose=true)
      Active manifest files: 0 found
      Active artifact files: 0 found
      Active scratchspaces: 0 found
     Deleted no artifacts, repos, packages or scratchspaces

julia> using PkgCleanup

julia> PkgCleanup.manifests()
Select the Manifest.toml to keep in /julia/agent-cache/test-depot/logs/manifest_usage.toml
[press: d=done, a=all, n=none]
 > [X] /julia/agent-cache/test-depot/environments/v1.7/Manifest.toml

I guess we'll have to roll our own mechanism that only considers the artifact folder's mtime, just like we do for the compilecache. I thought that the "orphanage" feature of Pkg.gc would handle this, but apparently it doesn't. I'm surprised, in fact, that many of the depots don't even have an orphaned.toml...

amdci7% ls yggy-amdci7.*/julia-buildkite-plugin/depots/*/logs
yggy-amdci7.0/julia-buildkite-plugin/depots/e2fd9734-29d8-45cd-b0eb-59f7104f3131/logs:
artifact_usage.toml  manifest_usage.toml  scratch_usage.toml

yggy-amdci7.10/julia-buildkite-plugin/depots/e2fd9734-29d8-45cd-b0eb-59f7104f3131/logs:
artifact_usage.toml  manifest_usage.toml  scratch_usage.toml

yggy-amdci7.11/julia-buildkite-plugin/depots/e2fd9734-29d8-45cd-b0eb-59f7104f3131/logs:
artifact_usage.toml  manifest_usage.toml  scratch_usage.toml

yggy-amdci7.12/julia-buildkite-plugin/depots/e2fd9734-29d8-45cd-b0eb-59f7104f3131/logs:
artifact_usage.toml  manifest_usage.toml  scratch_usage.toml

yggy-amdci7.13/julia-buildkite-plugin/depots/e2fd9734-29d8-45cd-b0eb-59f7104f3131/logs:
artifact_usage.toml  manifest_usage.toml  scratch_usage.toml

yggy-amdci7.14/julia-buildkite-plugin/depots/e2fd9734-29d8-45cd-b0eb-59f7104f3131/logs:
artifact_usage.toml  manifest_usage.toml  scratch_usage.toml

yggy-amdci7.15/julia-buildkite-plugin/depots/e2fd9734-29d8-45cd-b0eb-59f7104f3131/logs:
artifact_usage.toml  manifest_usage.toml  scratch_usage.toml

yggy-amdci7.1/julia-buildkite-plugin/depots/e2fd9734-29d8-45cd-b0eb-59f7104f3131/logs:
artifact_usage.toml  manifest_usage.toml  orphaned.toml  scratch_usage.toml

yggy-amdci7.2/julia-buildkite-plugin/depots/e2fd9734-29d8-45cd-b0eb-59f7104f3131/logs:
artifact_usage.toml  manifest_usage.toml  orphaned.toml  scratch_usage.toml

yggy-amdci7.3/julia-buildkite-plugin/depots/e2fd9734-29d8-45cd-b0eb-59f7104f3131/logs:
artifact_usage.toml  manifest_usage.toml  scratch_usage.toml

yggy-amdci7.4/julia-buildkite-plugin/depots/e2fd9734-29d8-45cd-b0eb-59f7104f3131/logs:
artifact_usage.toml  manifest_usage.toml  scratch_usage.toml

yggy-amdci7.5/julia-buildkite-plugin/depots/e2fd9734-29d8-45cd-b0eb-59f7104f3131/logs:
artifact_usage.toml  manifest_usage.toml  scratch_usage.toml

yggy-amdci7.6/julia-buildkite-plugin/depots/e2fd9734-29d8-45cd-b0eb-59f7104f3131/logs:
artifact_usage.toml  manifest_usage.toml  orphaned.toml  scratch_usage.toml

yggy-amdci7.7/julia-buildkite-plugin/depots/e2fd9734-29d8-45cd-b0eb-59f7104f3131/logs:
artifact_usage.toml  manifest_usage.toml  scratch_usage.toml

yggy-amdci7.8/julia-buildkite-plugin/depots/e2fd9734-29d8-45cd-b0eb-59f7104f3131/logs:
artifact_usage.toml  manifest_usage.toml  scratch_usage.toml

yggy-amdci7.9/julia-buildkite-plugin/depots/e2fd9734-29d8-45cd-b0eb-59f7104f3131/logs:
artifact_usage.toml  manifest_usage.toml  orphaned.toml  scratch_usage.toml

We are supposed to run Pkg.gc on every clean-up, which AFAIU is when that file gets created? The ones that do exist have realistic contents.
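For reference, the mtime-based fallback mentioned above could look roughly like this. This is a sketch, not what the plugin does today: `prune_old_artifacts` and its arguments are hypothetical, it assumes artifacts live in top-level directories under `<depot>/artifacts`, and it relies on the `-mindepth`/`-maxdepth` options of GNU and BSD `find`. One caveat: a directory's mtime changes when entries are added or removed, not when files are read, so this approximates "installed long ago" rather than "unused for long".

```shell
# Hypothetical helper: delete top-level artifact directories in a depot
# whose mtime is older than a given number of days (default: 7).
prune_old_artifacts() {
    depot="$1"
    max_age_days="${2:-7}"
    # Nothing to do if the depot has no artifacts directory.
    [ -d "$depot/artifacts" ] || return 0
    # -mtime +N matches directories last modified more than N whole days ago.
    # The contents are made writable first so that read-only files or
    # directories inside an artifact cannot block the removal.
    find "$depot/artifacts" -mindepth 1 -maxdepth 1 -type d \
        -mtime "+$max_age_days" \
        -exec sh -c 'chmod -R u+w "$@" && rm -rf -- "$@"' sh {} +
}
```

Something like `prune_old_artifacts /path/to/depot 7` would then mirror the one-week collect_delay used for Pkg.gc.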

DilumAluthge commented 1 week ago

Hmmm, maybe I'm misunderstanding, but if the manifest file is deleted at the end of the CI job, then Pkg.gc() should have no problem cleaning up the artifacts after the desired period of time.

maleadt commented 1 week ago

if the manifest file is deleted at the end of the CI job, then Pkg.gc() should have no problem cleaning up the artifacts after the desired period of time

Hmm, that seems correct:

amdci7% du -hs artifacts
421G    artifacts

amdci7% JULIA_DEPOT_PATH=$(pwd) julia +1.7

julia> using Pkg, Dates

# 1 week
julia> Pkg.gc(collect_delay=Second(604800), verbose=true)
      Active manifest files: 1 found
        `/julia/agent-cache/test-depot/environments/v1.7/Manifest.toml`
      Active artifact files: 0 found
      Active scratchspaces: 0 found
     Deleted no artifacts, repos, packages or scratchspaces

julia> Pkg.gc(collect_delay=Second(1), verbose=true)
      Active manifest files: 1 found
        `/julia/agent-cache/test-depot/environments/v1.7/Manifest.toml`
      Active artifact files: 0 found
      Active scratchspaces: 0 found
     Deleted `/julia/agent-cache/test-depot/packages/ASL_jll/FCoFD` (16.311 KiB)
     Deleted `/julia/agent-cache/test-depot/packages/ArgParse/mpp98` (228.894 KiB)
     Deleted `/julia/agent-cache/test-depot/packages/Arpack_jll/D9OBi` (51.984 KiB)
     Deleted `/julia/agent-cache/test-depot/packages/AssetRegistry/4TyKv` (9.519 KiB)
     Deleted `/julia/agent-cache/test-depot/packages/Attr_jll/ZSRCU` (12.395 KiB)
     Deleted `/julia/agent-cache/test-depot/packages/AutoHashEquals/tDuUH` (8.021 KiB)
     Deleted `/julia/agent-cache/test-depot/packages/BinaryBuilder/HTdhp` (494.148 KiB)
...

However, I take it that works based off of the orphaned.toml contents? If so, it's weird that many depots don't have that, even though we (should) unconditionally run Pkg.gc at the end of every plugin run.
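To keep an eye on which depots are missing the file, a small loop along these lines could be run on the host. It is a sketch under assumptions: `depots_without_orphan_log` is a hypothetical name, and the `<agent>/julia-buildkite-plugin/depots/<uuid>/logs` layout is taken from the listing above.

```shell
# Hypothetical helper: print every depot under an agent-cache root that has
# a logs directory but no orphaned.toml in it.
depots_without_orphan_log() {
    cache_root="$1"
    for logs in "$cache_root"/*/julia-buildkite-plugin/depots/*/logs; do
        # Skip the unexpanded pattern when nothing matches.
        [ -d "$logs" ] || continue
        [ -e "$logs/orphaned.toml" ] || printf '%s\n' "${logs%/logs}"
    done
}
```

On amdci7 this would presumably be called as `depots_without_orphan_log /julia/agent-cache`.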