NixOS / infra

NixOS configurations for nixos.org and its servers
MIT License
208 stars 91 forks source link

Put hydra advances query result in a public s3 bucket somewhere on a cron job #405

Open lf- opened 3 months ago

lf- commented 3 months ago

@mweinelt :)

Is your feature request related to a problem? Please describe.

We have no good way of bisecting hydra anymore since the grahamc service died a year ago. But we have the data, it is just locked behind the hydra API being busted, and can relatively easily simply be obtained from the database. Although there is definitely merit in doing the https://pad.lassul.us/opendata-projects and making an open hydra read replica, this requires effort.

I have written a tool https://github.com/lf-/necktangler, and a ~5s sql query to run against hydra, where the result could simply be put in a s3 bucket on a nightly cron job and we could call it a day.

Describe the solution you'd like

Put this query in a cron job somewhere with hydra DB access, and run it with:

psql $SOME_CONNECTION_DETAILS -v job=tested -v project=nixos -v jobset=trunk-combined --csv -f hydra-query.sql > data-nixos-unstable.csv
psql $SOME_CONNECTION_DETAILS -v job=tested -v project=nixos -v jobset=unstable-small --csv -f hydra-query.sql > data-nixos-unstable-small.csv
psql $SOME_CONNECTION_DETAILS -v job=unstable -v project=nixpkgs -v jobset=trunk --csv -f hydra-query.sql > data-nixpkgs-unstable.csv
psql $SOME_CONNECTION_DETAILS -v job=darwin-tested -v project=nixpkgs -v jobset=nixpkgs-23.11-darwin --csv -f hydra-query.sql > data-nixpkgs-23.11-darwin.csv
psql $SOME_CONNECTION_DETAILS -v job=tested -v project=nixos -v jobset=release-23.11 --csv -f hydra-query.sql > data-nixos-23.11.csv
psql $SOME_CONNECTION_DETAILS -v job=tested -v project=nixos -v jobset=release-23.11-small --csv -f hydra-query.sql > data-nixos-23.11-small.csv
# s3 upload to some bucket somewhere
select
    jsei.revision as sha, b.timestamp as advance_ts, concat(j.project, ':', j.name, ':', b.job) as hydra_job, b.id as hydra_build_id
    from jobsets j
        inner join builds b on j.id = b.jobset_id
        inner join jobsetevalmembers jsem on jsem.build = b.id
        inner join jobsetevals jse on jsem.eval = jse.id
        inner join jobsetevalinputs jsei on jsei.eval = jse.id
    where
    b.job = :'job' and
    j.project = :'project' and
    j.name = :'jobset' and
    jsei.name = 'nixpkgs' and
    b.finished = 1 and
    -- arbitraily chosen start in time; there are some bogus commit IDs from
    -- 2014 or so, which we would rather not deal with
    b.timestamp > extract(epoch from '2018-01-01T00:00:00Z'::timestamptz)
    order by b.timestamp asc

(nixpkgs-unstable is 8571 rows, nixos-unstable is 3000 rows, from several years. imo we have no reason to shard this any time soon)

Describe alternatives you've considered

Going and building a public read replica of hydra (sounds like effort).

Being unable to bisect hydra.

Additional context

RaitoBezarius commented 3 months ago

An alternative could be to add this particular API endpoint in the Hydra code and deploy it fast.