RobotLocomotion / drake

Model-based design and verification for robotics.
https://drake.mit.edu

Disable solver-specific weekly coverage jobs #21235

Closed: jwnimmer-tri closed this issue 4 months ago.

jwnimmer-tri commented 5 months ago

The solver-specific weekly coverage jobs are practically useless, and spuriously fail too often. We should just remove them.

To remove:

I don't care one way or the other whether we keep "experimental" versions around.

Relatedly, per #20500 we plan to eventually remove all of the weekly jobs.

jwnimmer-tri commented 5 months ago

FYI @mwoehlke-kitware @BetsyMcPhail: this ticket might be another nice test-case for the updated "show the list of job changes" summary stuff.

mwoehlke-kitware commented 5 months ago

this ticket might be another nice test-case for the updated "show the list of job changes" summary stuff.

Yup; I think this is the same one Betsy already mentioned internally. I really want to land the second round of the process overhaul first, though, as it makes the process a lot less janky.

mwoehlke-kitware commented 4 months ago

@jwnimmer-tri, these weekly Linux jobs will still exist; is this correct?

...or do we want to drop some of those also?

jwnimmer-tri commented 4 months ago

Those are still correct (should remain intact).

mwoehlke-kitware commented 4 months ago

Thanks!

Relatedly, even post-#20500, do we want to drop everything-coverage? It looks like we'd have no proprietary solver coverage at all if we did. Maybe we keep that as the only weekly job?

jwnimmer-tri commented 4 months ago

I haven't fully planned it out yet, but my best guess right now is that the only production coverage job we would run would be linux-jammy-gcc-bazel-nightly-everything-coverage. A single report per night seems like it should be plenty.

mwoehlke-kitware commented 4 months ago

the only production coverage job we would run would be linux-jammy-gcc-bazel-nightly-everything-coverage

...which would mean promoting that job from weekly to nightly. Is it fast enough for nightly? (Especially after we increase the number of tests?) Right now, AFAICT, the only nightly coverage job we have runs without any proprietary solvers.

Perhaps we're just willing to pay for it being slow given it will be one job that gives results for all (tested) combinations of solvers?

mwoehlke-kitware commented 4 months ago

@jwnimmer-tri, FYI https://github.com/RobotLocomotion/drake-jenkins-jobs/pull/134#issuecomment-2077746899 if you want to check that the proposed changes look reasonable.

jwnimmer-tri commented 4 months ago

We're already running the non-everything coverage job every night. It's pretty slow, and the everything job is ~30% slower, so your doubts are valid. But I'm approaching this top-down rather than bottom-up.

The only useful thing our developer team needs is a relatively up-to-date, unified coverage report: just 1 page. That means running just 1 job, and running it every night, since weekly is too stale. Since we don't want to leave the commercial solvers behind, it needs to be the "everything" flavor. That's why my hypothesis is to run just that one job.

If that job ends up being unreasonably slow, then we'll figure out how to make it faster, either with a bigger machine or by skipping some of the acceptance-test cases that are time-consuming yet not very relevant for coverage reports.

BetsyMcPhail commented 4 months ago

We may want to schedule the nightly everything coverage job to start earlier in the evening to make sure it finishes at a reasonable time. Currently, packaging jobs start at 1 am and all other nightly jobs start somewhere between 2 am and 4:59 am.

jwnimmer-tri commented 4 months ago

Sure, we can revisit that when I take up #20500. For now, I don't think it's urgent when it finishes.