NixOS / nix

Nix, the purely functional package manager
https://nixos.org/
GNU Lesser General Public License v2.1

idea: nix-collect-garbage option to also respect creation date of a derivation #2793

Open takeda opened 5 years ago

takeda commented 5 years ago

Currently nix-collect-garbage has an option to not delete entries older than a given amount of time. It looks like that setting applies only to generations. If someone uses Nix for development and rarely uses nix-env, calling garbage collection will still wipe most derivations. If there were an option to also check when a given store path was created and not delete anything recent, it would reduce the number of packages that need to be fetched again, and it would also reduce the amount of data fetched from the cache server.

matthewbauer commented 5 years ago

I don't think Nix keeps track of creation date for derivations. ctime should always be 0. Perhaps with something like FUSE we could keep track of that or even things like last access time: https://groups.google.com/forum/#!searchin/nix-devel/nix-collect-garbage%7Csort:date/nix-devel/ej-21kuvCyw/0huYqAosBQAJ

takeda commented 5 years ago

There is an SQLite database, maybe that keeps track of this?

edolstra commented 5 years ago

Yes, the SQLite database keeps track of the "registration time", i.e. the time the path was added to the DB:

# sqlite3 /nix/var/nix/db/db.sqlite 'select path, registrationTime from ValidPaths order by registrationTime limit 10'
/nix/store/vn6fkjnfps37wa82ri4mwszwvnnan6sk-glibc-2.25|1490003668
/nix/store/gij6mgj1vixf7qcyb13h5aa5y15r2xxd-attr-2.4.47|1490003668
/nix/store/v0wcqsb6vpljx13vw8q60dvldf5pffma-acl-2.2.52|1490003669
...

So the garbage collector could use this. However, it's not clear whether this is a good idea, because an old path may still have been used recently.

(Once upon a time the garbage collector used atime, but most systems don't maintain atime anymore.)
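As a rough illustration of how the registration time could feed into such a check, here is a minimal Python sketch. It relies only on the ValidPaths table and its path and registrationTime columns shown in the query above. The function names, the max_age_days parameter, and the in-memory demo database are hypothetical and not part of Nix.

```python
import sqlite3
import time

SECONDS_PER_DAY = 86400

def recently_registered(conn, max_age_days, now=None):
    """Return paths registered within the last max_age_days, oldest first."""
    now = int(time.time()) if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    rows = conn.execute(
        "SELECT path FROM ValidPaths "
        "WHERE registrationTime >= ? ORDER BY registrationTime",
        (cutoff,),
    )
    return [path for (path,) in rows]

def demo_db():
    """Build a tiny in-memory stand-in for the Nix database (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ValidPaths (path TEXT, registrationTime INTEGER)")
    conn.executemany(
        "INSERT INTO ValidPaths VALUES (?, ?)",
        [
            ("/nix/store/aaa-old", 1_000_000),
            ("/nix/store/bbb-new", 1_900_000),
        ],
    )
    return conn
```

A collector using this would treat the returned paths as "too recent to delete"; whether that is the right policy is exactly the open question in this thread.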

takeda commented 5 years ago

Well, it would still be an improvement over the current behavior, which purges these paths anyway. Note that I'm not asking to replace the current generation check, just to add an extra requirement before removing (perhaps with an option to enable or disable it if this ruins someone's workflow).

The atime would also be nice to use if available. (I don't have a way to check right now, but what does the system report as atime when it's not available: 0, the current time, or some other value?) With 0 it would just work without extra code; otherwise we would ignore it.
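The extra requirement described here can be sketched as a small predicate. This is only an illustration under assumptions: the function name, its parameters, and the idea of taking the newer of registration time and atime are hypothetical, and nothing here reflects how Nix actually implements garbage collection.

```python
def may_delete(is_rooted, registration_time, atime, now, min_age_seconds):
    """Decide whether a store path may be garbage-collected.

    All times are Unix timestamps in seconds.
    """
    # Paths reachable from a GC root are never candidates for deletion.
    if is_rooted:
        return False
    # Use whichever timestamp is newer. If atime is reported as 0,
    # max() simply falls back to the registration time.
    last_seen = max(registration_time, atime)
    return now - last_seen >= min_age_seconds
```

With this shape, an atime of 0 needs no special case, which matches the "it would just work" intuition; any other placeholder value would have to be detected and ignored separately.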

stale[bot] commented 3 years ago

I marked this as stale due to inactivity.

Atemu commented 3 years ago

Still important to me. Even if slightly flawed, discarding only what's been added >n days ago would still be very helpful and a nice medium between --gc and --delete.

Given that the world is rebuilt every month or so anyways, the chance of discarding recently used output paths is rather slim even with this flawed method.

Maybe gc could also differentiate between fixed-output (, CA) and regular drvs.

Another useful property to filter on when gc'ing would be drvs that are in the build-time closure of drvs that are supposed to be kept (gcroot'd or added too recently). This could be incredibly useful for robotnix users, who have to manage gcroots for build envs if they want the Android sources (dozens of GiB) to survive a gc, but also for some use cases in NixOS.

stale[bot] commented 2 years ago

I marked this as stale due to inactivity.

risicle commented 1 year ago

This may risk going offtopic, but I'd quite like an access-time-basis for prioritizing deletions. Doing quite a lot of nix development, many things I build may not currently be in a gcroot, but it's likely that a new build will want to reference a recently accessed package again.
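To make the idea of an access-time basis concrete, here is a minimal Python sketch that orders deletion candidates so the least recently accessed come first. The function name is hypothetical, and the sketch assumes the filesystem actually maintains atime, which depends on mount options.

```python
import os

def oldest_access_first(paths):
    """Sort paths by their atime, least recently accessed first."""
    # os.stat() reads metadata only, so it does not itself bump atime.
    return sorted(paths, key=lambda p: os.stat(p).st_atime)
```

A collector could then delete from the front of this list until enough space has been freed, leaving recently accessed paths for last.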

Atemu commented 1 year ago

@risicle atimes aren't recorded since the Nix store you access as a user is readonly to you.

risicle commented 1 year ago

Mmmmm atime is certainly updated for me.

$ stat /nix/store/zxmp0hm86g25inbllb8610c9mwxglik8-libelf-0.8.13
  File: /nix/store/zxmp0hm86g25inbllb8610c9mwxglik8-libelf-0.8.13
  Size: 4096            Blocks: 8          IO Block: 4096   directory
Device: fe05h/65029d    Inode: 1584906     Links: 5
Access: (0555/dr-xr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2022-10-09 12:09:44.741765352 +0100
Modify: 1970-01-01 01:00:01.000000000 +0100
Change: 2022-10-09 12:09:44.741765352 +0100
 Birth: 2022-10-09 12:09:44.729765022 +0100
$ ls /nix/store/zxmp0hm86g25inbllb8610c9mwxglik8-libelf-0.8.13
include  lib  share
$ stat /nix/store/zxmp0hm86g25inbllb8610c9mwxglik8-libelf-0.8.13
  File: /nix/store/zxmp0hm86g25inbllb8610c9mwxglik8-libelf-0.8.13
  Size: 4096            Blocks: 8          IO Block: 4096   directory
Device: fe05h/65029d    Inode: 1584906     Links: 5
Access: (0555/dr-xr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2022-11-06 14:09:51.755988282 +0000
Modify: 1970-01-01 01:00:01.000000000 +0100
Change: 2022-10-09 12:09:44.741765352 +0100
 Birth: 2022-10-09 12:09:44.729765022 +0100

Atemu commented 1 year ago

Ah, might be because of relatime. It only updates atime if mtime has been modified since the last access which it obviously wouldn't have. Forgot it had that property too in addition to the once-per-day update limit.

risicle commented 1 year ago

An advantage I see of atime is that it'll get updated even if your machine is sharing packages over e.g. nix-serve and a package is requested.

risicle commented 1 year ago

I've been playing with some ideas around this over at https://github.com/risicle/nix-heuristic-gc

klarkc commented 10 months ago

+1 for nix-serve and nix-serve-ng. Doing remote copies through nix copy is also a problem: with Nix's min-free and max-free settings, it automatically gc's derivations that have just been copied to the store, which makes no sense.

hacklschorsch commented 6 months ago

> Ah, might be because of relatime. It only updates atime if mtime has been modified since the last access which it obviously wouldn't have. Forgot it had that property too in addition to the once-per-day update limit.

This is boolean "or", not boolean "and":

> relatime maintains atime data, but not for each time that a file is accessed. With this option enabled, atime data is written to the disk only if the file has been modified since the atime data was last updated (mtime), or if the file was last accessed more than a certain amount of time ago (by default, one day).

So relatime (which is the default on many systems IIRC?) should be well-suited for gc removal?
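The "or" reading of the quoted rule can be written down as a small predicate. This is a simplified Python sketch of the description quoted above, not the kernel's actual code; the function name and the max_age_seconds parameter are hypothetical, and the comparison details are an assumption.

```python
def relatime_would_update(atime, mtime, now, max_age_seconds=86400):
    """Return True if an access at time `now` would refresh atime.

    Simplified from the quoted description: atime is refreshed if the
    file was modified since atime was last written, OR if the recorded
    atime is older than max_age_seconds (one day by default).
    """
    modified_since_last_access = mtime >= atime
    last_access_too_old = now - atime >= max_age_seconds
    return modified_since_last_access or last_access_too_old
```

For a store path whose mtime is fixed near the epoch, the first condition never holds, so atime advances at most once per day of use. That granularity is coarse, but it may be enough to tell "used this month" from "not used in a year".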

fzakaria commented 5 months ago

I was wondering why things were being deleted!