cirruslabs / cirrus-ci-agent

Agent to execute Cirrus CI tasks
Mozilla Public License 2.0

Preserve mtime of files in caches #277

Closed: awelzel closed this issue 1 year ago

awelzel commented 1 year ago

Hey,

I'm not sure how easy this is, since I haven't looked at the implementation, but here goes: we use the cache instruction to cache a ccache directory, and it would be great if file mtimes were preserved across tasks as faithfully as possible, so that ccache's pruning works as expected.

ccache prunes its cache using LRU based on mtime:

> The LRU cleanup makes use of the file modification time (mtime) of cache entries; ccache updates mtime of the cache entries read on a cache hit to mark them as "recently used".

https://github.com/ccache/ccache/blob/master/doc/MANUAL.adoc#cache-size-management
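To make the dependence on mtime concrete, here is a minimal, simplified sketch of mtime-based LRU eviction. It is not ccache's code (ccache's actual cleanup logic differs in its details); `Entry` and `prune_lru` are hypothetical names, and the sketch assumes each cache entry is described by a name, an mtime in epoch seconds, and a size in bytes.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    name: str
    mtime: int  # last modification time, seconds since the epoch
    size: int   # bytes


def prune_lru(entries: list[Entry], max_size: int) -> list[Entry]:
    """Evict entries with the oldest mtime until the total size fits in max_size.

    Hypothetical simplification of LRU cleanup; returns the kept entries, oldest first.
    """
    total = sum(e.size for e in entries)
    # Oldest mtime first. sorted() is stable, so ties keep their input order.
    ordered = sorted(entries, key=lambda e: e.mtime)
    evicted = 0
    while total > max_size and evicted < len(ordered):
        total -= ordered[evicted].size
        evicted += 1
    return ordered[evicted:]
```

Because the sort key is the mtime alone, entries that all carry the same timestamp (for example, the moment they were unpacked) are evicted purely in listing order rather than according to actual use.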

After a build, most files in our /tmp/ccache directory have an mtime close to the start of the job (or one second later), which must be when they were extracted from the cache rather than when they were last created or referenced. Only a few files, those referenced, reused, or created during the build, are newer:

```
cirrus-ci-task-5831192507842560:/tmp/ccache# stat -c %Y 0/*/* | sort -n | uniq -c
   4702 1677153062
   7248 1677153063
     30 1677153120
     25 1677153121
     25 1677153122
     39 1677153123
     48 1677153124
      1 1677153127
      2 1677153129
      5 1677153130
     12 1677153131
     16 1677153132
```
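For running the same check from a script, here is a small Python sketch equivalent to the `stat -c %Y 0/*/* | sort -n | uniq -c` pipeline above. `mtime_histogram` is a hypothetical helper name; it returns `(mtime, count)` pairs, i.e. the two columns in the reverse order from `uniq -c`.

```python
import glob
import os
from collections import Counter


def mtime_histogram(pattern: str) -> list[tuple[int, int]]:
    """Count matching files per whole-second mtime, oldest first.

    Like the shell glob, glob.glob() skips dotfiles by default.
    """
    counts = Counter(int(os.stat(p).st_mtime) for p in glob.glob(pattern))
    return sorted(counts.items())  # [(mtime, number_of_files), ...]


if __name__ == "__main__":
    # Print in `uniq -c` style: count, then mtime.
    for mtime, n in mtime_histogram("/tmp/ccache/0/*/*"):
        print(f"{n:7d} {mtime}")
```

A large spike at one or two seconds near the start of the job, as in the output above, indicates that most entries carry their extraction time rather than their last-use time.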

The main problem is that pruning becomes essentially arbitrary, because it is based on the timestamps at which files were extracted from the cache archive rather than on when they were last used.

I don't think preserving mtime would cause any harm. It should probably be capped at the current time on extraction, though, in case a clock was skewed somewhere, so that no file appears to have been modified in the future.
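The proposal (restore each file's recorded mtime, but never later than "now") can be sketched as follows. This is not the agent's extraction code. It assumes the cache is a plain tar archive and uses Python's standard `tarfile` module, whose extraction restores each member's recorded mtime. `extract_preserving_mtime` and its `now` parameter are hypothetical names.

```python
import tarfile
import time
from typing import Optional


def extract_preserving_mtime(archive_path: str, dest: str,
                             now: Optional[float] = None) -> None:
    """Extract a tar archive, keeping recorded mtimes but capping them at `now`.

    Hypothetical sketch: assumes a plain tar archive, not Cirrus CI's actual format.
    """
    now = time.time() if now is None else now
    with tarfile.open(archive_path) as tar:
        members = tar.getmembers()
        for member in members:
            # Cap timestamps from a skewed clock so nothing looks modified in the future.
            member.mtime = min(member.mtime, now)
        # Use the safer "data" extraction filter where this Python version supports it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        # extractall() applies each member's (capped) mtime to the extracted file.
        tar.extractall(dest, members=members, **kwargs)
```

Capping before extraction, rather than fixing timestamps afterwards, lets the same mtime restoration also cover directory entries, which `tarfile` updates only after their contents have been written.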