NixOS / GSoC

Creative Commons Zero v1.0 Universal

ideas/2024: propose an analytics project (time budgeted builds) #16

Closed SomeoneSerge closed 4 months ago

SomeoneSerge commented 4 months ago

I'd like to CC @Mic92, @RaitoBezarius, @GuillaumeDesforges, @GaetanLepage, @ConnorBaker, and @samuela for comments and as potential mentors (e.g. I've never looked into the implementations of nix-index, nix-eval-jobs, nix-fast-build, etc., so I may lack some of the expertise required for the project to succeed in time).

I didn't write this up, but I think one prerequisite of a clean solution is solving the problem of matching derivations across different nixpkgs instantiations (different revisions, different config arguments, etc.), since derivations by design "lack identity". What we can easily match is e.g. nixpkgs' attribute paths. However, derivations that are overridden or defined in e.g. let-in expressions will still contribute non-trivially to the total cost, and we need to be able to identify these as well.
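To make the "easy" part of the matching concrete, here is a minimal Python sketch of comparing two instantiations by attribute path. The input format (a plain mapping from attribute path to `.drv` store path) and the names `AttrMatch` and `match_by_attr` are assumptions made for illustration, not the output format of any particular tool. Derivations that are not reachable through an attribute path never appear in either mapping, so this sketch does not address them.

```python
from typing import Dict, NamedTuple, Set


class AttrMatch(NamedTuple):
    """Result of comparing two evaluations by attribute path (hypothetical type)."""
    unchanged: Set[str]  # same attribute path, same .drv path
    changed: Set[str]    # same attribute path, different .drv path
    only_old: Set[str]   # attribute path present only in the old evaluation
    only_new: Set[str]   # attribute path present only in the new evaluation


def match_by_attr(old: Dict[str, str], new: Dict[str, str]) -> AttrMatch:
    """Match derivations of two evaluations using the attribute path as identity.

    `old` and `new` map an attribute path (e.g. "python3Packages.numpy")
    to a .drv store path. This input shape is an assumption of the sketch.
    """
    shared = old.keys() & new.keys()
    unchanged = {attr for attr in shared if old[attr] == new[attr]}
    return AttrMatch(
        unchanged=unchanged,
        changed=shared - unchanged,
        only_old=old.keys() - new.keys(),
        only_new=new.keys() - old.keys(),
    )


# Made-up store paths, for illustration only.
old_eval = {
    "hello": "/nix/store/aaa-hello.drv",
    "gcc": "/nix/store/bbb-gcc.drv",
    "removed": "/nix/store/ccc-removed.drv",
}
new_eval = {
    "hello": "/nix/store/aaa-hello.drv",
    "gcc": "/nix/store/ddd-gcc.drv",
    "added": "/nix/store/eee-added.drv",
}
result = match_by_attr(old_eval, new_eval)
```

The `changed` set is where the cost comparison would happen: the attribute path gives the two derivations a shared identity even though their store paths differ.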

samuela commented 4 months ago

I was once hired as an intern to do a project like this at one of the FAANGs. My takeaway at the time was that we ought to have just built better static analysis/build infrastructure instead of trying to ML it.

That's not to say that this project is not worth pursuing... Experiments are worthwhile. And even if all we got was a dashboard showing plots of build times (in CPU hours) for each package, that would be a worthwhile success IMHO.
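As a rough illustration of the data behind such a dashboard, here is a minimal Python sketch that sums CPU time per package and converts it to CPU hours. The record shape (attribute path, CPU seconds) and the function name `cpu_hours_per_package` are assumptions for the example; how the CPU time would actually be measured is left open.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def cpu_hours_per_package(records: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Sum CPU time per package and return CPU hours, most expensive first.

    Each record is (attribute path, CPU seconds of one build). This record
    shape is an assumption of the sketch, not an existing log format.
    """
    totals: Dict[str, float] = defaultdict(float)
    for attr, cpu_seconds in records:
        totals[attr] += cpu_seconds / 3600.0
    # Sort by descending cost so the costliest packages come first.
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


# Made-up build records, for illustration only.
builds = [
    ("hello", 3600.0),
    ("gcc", 7200.0),
    ("hello", 1800.0),
]
report = cpu_hours_per_package(builds)
```

The resulting mapping could be fed to any plotting tool; sorting by cost puts the packages that dominate the build budget at the top.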

RaitoBezarius commented 4 months ago

Hmm, where is the machine learning component in the proposal here? Or do we consider basic statistical analysis to be a machine learning algorithm :P ?

RaitoBezarius commented 4 months ago

Either way, I think it's important to separate the:

parts of the project. Even building something that can collect the data and ship it somewhere else is already great, and other people can reuse it for the other parts of this idea.