catapult-project / catapult

Deprecated Catapult GitHub. Please instead use http://crbug.com "Speed>Benchmarks" component for bugs and https://chromium.googlesource.com/catapult for downloading and editing source code..
https://chromium.googlesource.com/catapult
BSD 3-Clause "New" or "Revised" License
1.91k stars 563 forks source link

[📍] Error in histogram_set.ImportDicts #4514

Closed dave-2 closed 5 years ago

dave-2 commented 6 years ago

@eakuefner

https://pinpoint-dot-chromeperf.appspot.com/job/16e59063240000 https://pinpoint-dot-chromeperf.appspot.com/job/167787e5240000 https://pinpoint-dot-chromeperf.appspot.com/job/179a3ba5240000

Traceback (most recent call last):
  File "/base/data/home/apps/s~chromeperf/pinpoint:clean-dtu-ce5742e9.410367997486382453/dashboard/pinpoint/models/job.py", line 278, in Run
    work_left = self.state.ScheduleWork()
  File "/base/data/home/apps/s~chromeperf/pinpoint:clean-dtu-ce5742e9.410367997486382453/dashboard/pinpoint/models/job_state.py", line 151, in ScheduleWork
    attempt.ScheduleWork()
  File "/base/data/home/apps/s~chromeperf/pinpoint:clean-dtu-ce5742e9.410367997486382453/dashboard/pinpoint/models/attempt.py", line 77, in ScheduleWork
    self._Poll()
  File "/base/data/home/apps/s~chromeperf/pinpoint:clean-dtu-ce5742e9.410367997486382453/dashboard/pinpoint/models/attempt.py", line 81, in _Poll
    self._last_execution.Poll()
  File "/base/data/home/apps/s~chromeperf/pinpoint:clean-dtu-ce5742e9.410367997486382453/dashboard/pinpoint/models/quest/execution.py", line 96, in Poll
    self._Poll()
  File "/base/data/home/apps/s~chromeperf/pinpoint:clean-dtu-ce5742e9.410367997486382453/dashboard/pinpoint/models/quest/read_value.py", line 105, in _Poll
    histograms.ImportDicts(histogram_dicts)
  File "/base/data/home/apps/s~chromeperf/pinpoint:clean-dtu-ce5742e9.410367997486382453/tracing/value/histogram_set.py", line 94, in ImportDicts
    if d.get('type') in all_diagnostics.GetDiagnosticTypenames():
AttributeError: 'unicode' object has no attribute 'get'
dave-2 commented 6 years ago

https://pinpoint-dot-chromeperf.appspot.com/job/12eca5bd240000 https://pinpoint-dot-chromeperf.appspot.com/job/158d368b240000 https://pinpoint-dot-chromeperf.appspot.com/job/138f2213240000

simonhatch commented 6 years ago

In all of these, the tests failed to run with something like:

system_health.common_mobile is disabled on the selected browser This platform is not supported for this benchmark. If this is in error please add it to the benchmark's supported platforms.

and the output was an empty dict

These bots appear to be fine though, and other runs sometimes succeed so this is really odd.

simonhatch commented 6 years ago

@ehanley324

ehanley324 commented 6 years ago

Yes in this CL: https://chromium-review.googlesource.com/c/catapult/+/1012124 we added generating an empty dict so the results wouldn't be null when the benchmark was disabled.

simonhatch commented 6 years ago

So there might be 2 issues here. First is when the benchmark is disabled, shouldn't it return [] if we want to return a technically valid histogram set? @eakuefner

2nd, these bots appear to be fine and some have tests that run fine on other devices so how could the benchmark not be supported?

dave-2 commented 6 years ago

Should we add --also-run-disabled-tests to the Pinpoint Telemetry arguments?

ehanley324 commented 6 years ago

running disabled tests could be misleading since if they are disabled for a reason then they might give false negatives for the result. If you are only running a single story then I think it is ok to include that flag.

Ethan, I thought we decided that {} was the valid empty histogram? I feel like we would be seeing more issues with this if it wasn't, but the perf results are a list of dicts, should we be outputting [] instead?

simonhatch commented 6 years ago

Recipe runs with disabled tests, tests could be disabled in a range that we want to revisit with a patch or they just weren't re-enabled quickly after they were last fixed.

@eakuefner can you chime in on my original question, whether [] is the appropriate response?

Also note that there appears to be a second problem here, in that some bots ran fine while others complained about the test being disabled on that browser.

eakuefner commented 6 years ago

Originally, @ehanley324 was just trying to return something that was valid JSON, not necessarily a valid HistogramSet. Let's change it to [] since that's a valid HistogramSet.

simonhatch commented 6 years ago

Any ideas on the 2nd issue, the fact that some runs on some bots complain about the test being disabled on that platform? To me that's the more worrying issue.

dave-2 commented 6 years ago

@nedn for the issue @simonhatch mentioned:

In job 179a3ba5240000, we have two runs with the exact same device and exact same inputs, but

Also happened in job 12eca5bd240000.

ehanley324 commented 6 years ago

This does seem like a bug in telemetry platform detection. Either that or the device is not giving the right dimensions every time. Maybe Ned can give more insight into that code.

You should not though that the --also-run-disabled-tests will not work to force run in this case because it is failing because it can't find the platform: https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/internal/story_runner.py?l=337

nedn commented 6 years ago

I suspect this is a problem with flaky device connection. The log in the run which says benchmark is disabled doesn't show that devices were connected properly

@perezju

dave-2 commented 5 years ago

I saw another Job where the story was disabled. The linked bug says that it fails in the perf waterfall but works fine in Pinpoint. So I think I will still add --also-run-disabled-tests for cases like these.

Also filed #4527 for a more intuitive error message. It should tell the user that nothing ran, rather than there were no test results.

dave-2 commented 5 years ago

With Emily's change c95acfe and the flag change 399c9ee, I would call this done.

We have #4527 for a more intuitive UI in this scenario, and we can file a separate issue for the flaky device detection if we still see it in the future.