jwnimmer-tri opened 1 year ago
Assigning Zach for triage / delegation per the component lead chart.
Since we're not sure if this is a broken CI machine or not, I'll change the assignment to @svenevs for the moment, and move this to the project board. I don't think it'll be urgent.
f2f note: when trying to bisect vtk:

- use ccache
- change `repository.bzl` to just glob for headers instead of enumerating them (for the bisect, so that you don't have to keep changing them)
- while debugging the tests, dump images to disk (see the C++ sketch below); example from `bindings/pydrake/visualization/test/video_test.py`:
  `filename = os.environ["TEST_UNDECLARED_OUTPUTS_DIR"] + "/color.gif"`
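Since the failing suite (`internal_render_engine_vtk_test`) is C++, the same trick works there too. A minimal sketch, assuming the image is already encoded as bytes (the function name is hypothetical, not Drake's API):

```cpp
#include <cstdlib>
#include <fstream>
#include <string>

// Bazel sets TEST_UNDECLARED_OUTPUTS_DIR for every test and bundles anything
// written there into outputs.zip next to the test log, so the artifact
// survives the CI run and can be inspected after a failure.
void DumpDebugImage(const std::string& name, const std::string& png_bytes) {
  const char* dir = std::getenv("TEST_UNDECLARED_OUTPUTS_DIR");
  const std::string path = std::string(dir ? dir : "/tmp") + "/" + name;
  std::ofstream out(path, std::ios::binary);
  out.write(png_bytes.data(), static_cast<std::streamsize>(png_bytes.size()));
}
```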
I should clarify the goal here.
The first call to action is to characterize what's happening. Are the test failures minor (akin to a tolerancing issue), or do they indicate major problems? If the errors are in image-comparison test cases, we could dump the images to disk and visually compare them to get a sense of what's going wrong. Are the test failures reproducible locally, or only in CI? Are the failures dependent on the order in which test cases are run, or on which cases are enabled/disabled?
Once we have that kind of information, we can decide how much more effort to invest in trying to fix it.
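To make the "minor vs. major" call concrete, a hypothetical helper along these lines could summarize how far apart the actual and expected images are (names and buffer layout are assumptions, not Drake's API):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Compare two same-sized byte buffers (e.g. RGBA pixels).  A max diff of a
// few counts suggests a tolerancing issue; large or widespread diffs suggest
// a real rendering problem.
void ReportImageDiff(const std::vector<uint8_t>& actual,
                     const std::vector<uint8_t>& expected) {
  if (actual.size() != expected.size()) {
    std::puts("size mismatch -- definitely not a tolerancing issue");
    return;
  }
  int max_diff = 0;
  long long total_diff = 0;
  for (size_t i = 0; i < actual.size(); ++i) {
    const int d = std::abs(static_cast<int>(actual[i]) -
                           static_cast<int>(expected[i]));
    if (d > max_diff) max_diff = d;
    total_diff += d;
  }
  std::printf("max diff = %d, mean diff = %.3f\n", max_diff,
              static_cast<double>(total_diff) / actual.size());
}
```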
FYI I'm elevating the priority of this, since we're going to lose our x86 macOS CI coverage soon.
This may very well be the root cause of the problem. If you deploy a VM and Screen Share in, the tests all pass, the same as on a non-virtualized M1.
```
$ man arch
...
     The arch_name argument must be one of the currently supported
     architectures:

           i386      32-bit intel
           x86_64    64-bit intel
           x86_64h   64-bit intel (haswell)
           arm64     64-bit arm
           arm64e    64-bit arm (Apple Silicon)
```
When nightlies are done / I can run tests via Jenkins again, the plan is to try arm64e.
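To double-check which slice a given process is actually running as (native arm64 vs. x86_64 under Rosetta 2), `uname(3)` is enough; a minimal sketch:

```cpp
#include <sys/utsname.h>

#include <cstdio>

int main() {
  struct utsname u;
  uname(&u);
  // Prints "arm64" for a native Apple Silicon process, but "x86_64" when the
  // same machine runs the binary under Rosetta 2 translation.
  std::printf("machine: %s\n", u.machine);
  return 0;
}
```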
This one is certainly very peculiar. It does seem like the orka3 update improved the results on #20522; 2/2 test runs fail at the same location. Will push something up to see if it repeats, but we may want to consider restoring an intermediate solution (only filtering out the known-failing tests).
Some additional findings: the test appears to be too resource-heavy somehow. I thought that was because of the image-saving tests from #20470, but as it turns out, it happens even if you are only running `internal_render_engine_vtk_test` (https://github.com/RobotLocomotion/drake/pull/20470#pullrequestreview-1950245426).
It seems like there may be some JVM arguments being added to the orka agent via the Jenkins cloud settings, but it is not clear to me at this juncture what those arguments might be.
See also: https://github.com/RobotLocomotion/drake-ci/pull/269#issuecomment-2010575474
There's some weird setup as to where things actually build on the macs. It could possibly be related to that, but it seems unlikely. On the Ubuntu side, we have the `init_script` create the filesystem for `/tmp` on the newly attached EBS volume when the instance launches, with the JVM option `-Djava.io.tmpdir=/media/ephemeral0/tmp`. On macOS, we just use the disk directly (since the storage is already there). Things worth trying would be changing the heap size, or possibly some other Java flags related to memory and/or CPU usage (?). This is hard to diagnose :disappointed:
Ok, I'm starting to run out of ideas. No discernible Java flags made an impact. The only other thought I had was that we could try splitting `internal_render_engine_vtk_test` into multiple different test files. I gave an initial (quick and dirty...) attempt at that, but splitting it into a library isn't valid: none of the tests actually run (not even an `ASSERT_EQ(true, false)`, for example...).
https://github.com/RobotLocomotion/drake/commit/841e3a836523b78b1c977b062636482bd766bc6a
Is there a straightforward way to do that? Take the class definitions and put them in a header file, and have multiple different test files `#include` it (rather than linking against a test library that tries to do the same thing)? I'm not particularly a gflags / gtest expert, but the idea was basically: if having all the tests in one file is too resource-intensive, splitting them into multiple smaller tests may be successful.
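For what it's worth, the header-include approach can work without a test library. A minimal sketch under those assumptions (file names hypothetical): keep the fixture in a header, and let each small `cc_test` compile its own subset of `TEST_F` cases; since every test binary is its own process, gtest's static registration runs per binary. (The library attempt likely failed because the linker drops object files whose only symbols are unreferenced test registrations; `alwayslink = 1` is the usual Bazel workaround for that.)

```cpp
// render_engine_vtk_test_fixture.h -- shared by every split test file.
#pragma once

#include <gtest/gtest.h>

class RenderEngineVtkTestFixture : public ::testing::Test {
 protected:
  void SetUp() override {
    // ... the shared scene / engine setup from the original test ...
  }
};
```

```cpp
// render_engine_vtk_color_test.cc -- one of several small cc_test targets,
// each linked against gtest_main so no custom main() is needed.
#include "render_engine_vtk_test_fixture.h"

TEST_F(RenderEngineVtkTestFixture, ColorImage) {
  // ... a subset of the original test cases ...
}
```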
Unfortunately, I'm not quite sure what else to try :cry:
Part of #18327. For prior art see #17566.
We're not sure if this is our buggy code, a buggy graphics driver stack, buggy CI hardware, or something else.
For now, I'm going to nerf the test to get CI passing.
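For reference, one common shape for that kind of nerf in gtest (a sketch only; the actual change may differ) is to skip the known-bad cases on macOS so the rest of the suite keeps running:

```cpp
#include <gtest/gtest.h>

TEST(RenderEngineVtkTest, ColorImage) {
#ifdef __APPLE__
  // Hypothetical guard: skip on macOS CI until the root cause is found.
  GTEST_SKIP() << "Skipped on macOS pending investigation of this issue.";
#endif
  // ... original assertions ...
}
```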