Closed: dave-2 closed this issue 5 years ago
In all of these, the tests failed to run with something like:
system_health.common_mobile is disabled on the selected browser
This platform is not supported for this benchmark. If this is in error please add it to the benchmark's supported platforms.
and the output was an empty dict
These bots appear to be fine though, and other runs on them sometimes succeed, so this is really odd.
@ehanley324
Yes, in this CL: https://chromium-review.googlesource.com/c/catapult/+/1012124 we added code to generate an empty dict so the results wouldn't be null when the benchmark was disabled.
So there might be two issues here. First: when the benchmark is disabled, shouldn't it return [] if we want to return a technically valid HistogramSet? @eakuefner
Second: these bots appear to be fine, and some of these tests run fine on other devices, so how could the benchmark not be supported?
Should we add --also-run-disabled-tests to the Pinpoint Telemetry arguments?
Running disabled tests could be misleading: if they are disabled for a reason, they might give false negatives in the results. If you are only running a single story, then I think it is OK to include that flag.
Ethan, I thought we decided that {} was the valid empty histogram? I feel like we would be seeing more issues with this if it wasn't. But since the perf results are a list of dicts, should we be outputting [] instead?
The recipe runs with disabled tests; tests could be disabled in a range that we want to revisit with a patch, or they just weren't re-enabled quickly after they were last fixed.
@eakuefner can you chime in on my original question, whether [] is the appropriate response?
Also note that there appears to be a second problem here, in that some bots ran fine while others complained about the test being disabled on that browser.
Originally, @ehanley324 was just trying to return something that was valid JSON, not necessarily a valid HistogramSet. Let's change it to [] since that's a valid HistogramSet.
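For context on why {} is problematic: a HistogramSet serializes to a JSON list of histogram dicts, so the empty-but-valid set is [], while {} parses as a dict and breaks any consumer that expects a list. A minimal sketch of that distinction (the load_histogram_set helper below is hypothetical, not catapult's actual API):

```python
import json


def load_histogram_set(text):
    """Parse serialized perf results, expecting a HistogramSet.

    A HistogramSet is a JSON list of histogram dicts, so an empty
    run should produce '[]', not '{}'. This is an illustrative
    validator only, not code from catapult.
    """
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError(
            'HistogramSet must be a JSON list, got %s' % type(data).__name__)
    return data


# '[]' is an empty but structurally valid HistogramSet.
assert load_histogram_set('[]') == []

# '{}' is valid JSON but not a valid HistogramSet.
try:
    load_histogram_set('{}')
except ValueError:
    pass
```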
Any ideas on the 2nd issue, the fact that some runs on some bots complain about the test being disabled on that platform? To me that's the more worrying issue.
@nedn for the issue @simonhatch mentioned:
In job 179a3ba5240000, we have two runs with the exact same device and exact same inputs, but
Also happened in job 12eca5bd240000.
This does seem like a bug in telemetry platform detection. Either that or the device is not giving the right dimensions every time. Maybe Ned can give more insight into that code.
You should note, though, that --also-run-disabled-tests will not force the run in this case, because the failure happens when Telemetry can't find a supported platform: https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/internal/story_runner.py?l=337
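To illustrate why the flag can't help here: the platform-support check runs before the disabled-test filter ever applies, so bypassing the filter changes nothing when platform detection fails. A rough sketch of that control flow, with illustrative names rather than the real story_runner code:

```python
class Benchmark(object):
    """Toy stand-in for a Telemetry benchmark (illustrative only)."""

    def __init__(self, supported_platforms, disabled):
        self.supported_platforms = supported_platforms
        self.disabled = disabled

    def run(self):
        return ['some-histogram']


def run_benchmark(benchmark, platform, also_run_disabled_tests=False):
    # 1. Platform support is checked first, regardless of any flag.
    #    This is where "This platform is not supported" comes from.
    if platform not in benchmark.supported_platforms:
        return []

    # 2. Only after that does the disabled-test filter apply, which
    #    --also-run-disabled-tests can bypass.
    if benchmark.disabled and not also_run_disabled_tests:
        return []

    return benchmark.run()


b = Benchmark(supported_platforms={'android'}, disabled=True)

# The flag bypasses the disabled check on a supported platform...
assert run_benchmark(b, 'android', also_run_disabled_tests=True) == ['some-histogram']

# ...but cannot help when the platform itself is not detected.
assert run_benchmark(b, 'desktop', also_run_disabled_tests=True) == []
```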
I suspect this is a problem with a flaky device connection. The log from the run that says the benchmark is disabled doesn't show the devices being connected properly.
@perezju
I saw another job where the story was disabled. The linked bug says that it fails in the perf waterfall but works fine in Pinpoint. So I think I will still add --also-run-disabled-tests for cases like these.
Also filed #4527 for a more intuitive error message. It should tell the user that nothing ran, rather than there were no test results.
With Emily's change c95acfe and the flag change 399c9ee, I would call this done.
We have #4527 for a more intuitive UI in this scenario, and we can file a separate issue for the flaky device detection if we still see it in the future.
@eakuefner
https://pinpoint-dot-chromeperf.appspot.com/job/16e59063240000
https://pinpoint-dot-chromeperf.appspot.com/job/167787e5240000
https://pinpoint-dot-chromeperf.appspot.com/job/179a3ba5240000