[📍] Pinpoint jobs continually fail

catapult-project / catapult

[📍] Pinpoint jobs continually fail #4557

Closed. quisquous closed this issue 5 years ago.

quisquous commented 5 years ago

I've retried this job four times now.

Swarming job timed out: https://pinpoint-dot-chromeperf.appspot.com/job/149caf9ea40000
Swarming test error: https://pinpoint-dot-chromeperf.appspot.com/job/12eb58e5a40000
Swarming test error: https://pinpoint-dot-chromeperf.appspot.com/job/16ef9341a40000
Attribute error: https://pinpoint-dot-chromeperf.appspot.com/job/15d619a9a40000

simonhatch commented 5 years ago

@dave-2

dave-2 commented 5 years ago

It's a very long benchmark, with 426 stories in it.

In the first job, some runs hit the 4-hour test run timeout. Others timed out in under an hour, which I think might be the I/O timeout rather than the overall run timeout. I think the run times differ so much because the runs differ wildly in which tests were skipped. I don't know why that is. @nedn

For most of the runs in the second and third jobs, it says the test failed (it did not hit a timeout). Most of those runs failed in the two stories listed below. The test output is so long that it's truncated by Swarming; that's why Pinpoint wasn't able to get the traceback from the log.

[  PASSED  ] 217 tests.
[  SKIPPED ] 207 tests.
[  FAILED  ] 2 tests, listed below:
[  FAILED  ]  rendering.mobile/famo_us_twitter_demo
[  FAILED  ]  rendering.mobile/paper_calculator_hit_test

The last job: it says the test is disabled on this platform. @nedn mentioned in another bug that this may be due to adb flakiness while performing platform detection. Issue #4527 is open to make Pinpoint more robust to this kind of flakiness and provide a more useful error message.
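Since the traceback gets lost when the log is truncated, the summary at the end of the output is often the only usable signal. A minimal sketch of pulling the failed story names out of a summary shaped like the one above; the function name and regular expressions are illustrative assumptions, not part of Pinpoint or Telemetry:

```python
import re

# Count lines, e.g. "[  PASSED  ] 217 tests." or "[  FAILED  ] 2 tests, listed below:".
_COUNT_LINE = re.compile(r"^\[\s*(PASSED|SKIPPED|FAILED)\s*\]\s+(\d+) tests?\b")
# Story lines, e.g. "[  FAILED  ]  rendering.mobile/famo_us_twitter_demo".
_STORY_LINE = re.compile(r"^\[\s*(PASSED|SKIPPED|FAILED)\s*\]\s+(\S+/\S+)\s*$")


def parse_summary(log_tail):
    """Return (counts, failed_stories) from a gtest-style summary (hypothetical helper)."""
    counts = {}
    failed = []
    for line in log_tail.splitlines():
        line = line.strip()
        count_match = _COUNT_LINE.match(line)
        if count_match:
            counts[count_match.group(1)] = int(count_match.group(2))
            continue
        story_match = _STORY_LINE.match(line)
        if story_match and story_match.group(1) == "FAILED":
            failed.append(story_match.group(2))
    return counts, failed
```

Feeding it the five summary lines above would give the three counts plus the two failing `rendering.mobile` story names, which could then be re-run on their own.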

nedn commented 5 years ago

I would recommend that you avoid running the whole rendering.mobile benchmark.

@simonhatch, @benshayden can we add any dashboard warning when people bisect on the whole benchmark?

@sadrulhc fyi

ksagar commented 5 years ago

I'm running into this issue with system_health.memory_mobile as well.

https://pinpoint-dot-chromeperf.appspot.com/job/1044de71a40000
https://pinpoint-dot-chromeperf.appspot.com/job/11b48c59a40000
https://pinpoint-dot-chromeperf.appspot.com/job/17ca6969a40000
https://pinpoint-dot-chromeperf.appspot.com/job/1413abd5a40000

quisquous commented 5 years ago

It's really unfortunate that we have a benchmark that you're not supposed to run. I use bisect jobs to evaluate the performance impact of patches. Having it run on more pages seems like better coverage for potential impact, just like running on 10k sites in CT. It seems a bit odd that all of these pagesets have merged into one only for me to then have to manually shard them and then merge the results back by hand.

simonhatch commented 5 years ago

I kinda feel like we need a way to specify subsets of the benchmark, i.e. which stories you're interested in. As the benchmarking team continues to collapse more and more benchmarks together, it's doubtful that someone running a try job really intended to run every story in existence. Maybe we can surface the tagmap and let you select tags, or something to that effect?

@dave-2 @nedn

quisquous commented 5 years ago

There is --story-tag-filter, but in this case I really intended to run over everything, to get a larger sample of what this change looked like.

Is v8.browsing_mobile also something I should not run? I'm trying to get a trace so I can understand a regression, and it keeps timing out: https://pinpoint-dot-chromeperf.appspot.com/job/16e72975a40000 https://pinpoint-dot-chromeperf.appspot.com/job/119983fea40000

simonhatch commented 5 years ago

Hmm, then I think we need to be able to shard the benchmark, which was suggested in #4499.

It looks like just a couple of pages failed, but the test actually did produce output. @dave-2, should we still be showing the results2 in those cases?

dave-2 commented 5 years ago

For system_health.memory_mobile and v8.browsing_mobile: looks like the failing stories are disabled on the perf waterfall. I added --also-run-disabled-tests in CL 1132692. Let me revert that change now.

I think that might be true for some of the test runs in 2nd and 3rd jobs of rendering.mobile in the first comment too, so maybe we can get some of those to pass as well.

Looks like Swarming logs are capped at 16 MB. (chromium:865069) @nedn is there a way we can reduce the size of the logs? It's really difficult to diagnose any issues since the Swarming UI freezes while loading long logs, and Chrome freezes while trying to search them.
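One way to think about the size problem: the failure summary and any traceback sit at the end of the output, so a size-capped log is most useful if its tail survives. A rough sketch of trimming a log to a byte budget while keeping both ends; the helper name, the marker text, and the 25% head share are assumptions for illustration, not how Swarming or Telemetry handle logs:

```python
def keep_head_and_tail(log, max_bytes, head_fraction=0.25):
    """Trim a bytes log to at most max_bytes, keeping its start and its end."""
    if len(log) <= max_bytes:
        return log
    marker = b"\n[... log trimmed ...]\n"
    budget = max_bytes - len(marker)
    if budget <= 0:
        # Cap is too small to fit the marker: keep only the tail.
        return log[len(log) - max(max_bytes, 0):]
    head_len = int(budget * head_fraction)
    tail_len = budget - head_len
    # Slice from an explicit index so that tail_len == 0 yields an empty tail.
    return log[:head_len] + marker + log[len(log) - tail_len:]
```

With the 16 MB cap mentioned above, this would be called as `keep_head_and_tail(raw_log, 16 * 1024 * 1024)`.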

dave-2 commented 5 years ago

@simonhatch Let's focus on getting the test passing. It's possible to show results2 for tests even if they've failed, but Pinpoint doesn't currently have a mechanism to say "this thing failed, but we should continue analyzing it anyway." Failure causes the entire Attempt to stop.
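To illustrate the behavior described here, a toy model of an attempt that runs its steps in order and stops at the first failure. All names below are hypothetical and the code is not taken from Pinpoint; it only shows why a failed test run leaves nothing for a later results step to read:

```python
class QuestFailure(Exception):
    """Raised by a hypothetical step when it fails."""


def run_attempt(quests):
    """Run (name, callable) steps in order and stop at the first failure.

    Returns (completed_names, error), where error is None on full success.
    """
    completed = []
    for name, run in quests:
        try:
            run()
        except QuestFailure as error:
            # Everything after the failing step is skipped.
            return completed, error
        completed.append(name)
    return completed, None
```

Supporting "this failed, but keep analyzing it anyway" would mean letting a step mark itself as failed without raising, so that later steps still run.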

nedn commented 5 years ago

Ah, I thought people were trying to bisect the whole benchmark, so that's why I suggested "avoid running the whole rendering.mobile benchmark."

In the case of tryjobs, I think we should support sharding so people can run arbitrarily large benchmarks.
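As a rough picture of what sharding could look like (and of the manual split-and-merge described earlier in the thread), here is a sketch that divides a story list round-robin and merges per-shard results afterwards. The function names and the `{story: [samples]}` result shape are assumptions for illustration, not an existing Pinpoint or Telemetry interface:

```python
def shard_stories(stories, num_shards):
    """Split story names into num_shards round-robin lists (some may be empty)."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    return [stories[i::num_shards] for i in range(num_shards)]


def merge_results(shard_results):
    """Merge per-shard {story: [samples]} dicts into a single dict."""
    merged = {}
    for results in shard_results:
        for story, samples in results.items():
            # A story that appears in several shards gets its samples concatenated.
            merged.setdefault(story, []).extend(samples)
    return merged
```

For a benchmark with 426 stories, six shards would carry 71 stories each, which should bring each task well below the multi-hour run times reported above.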

dave-2 commented 5 years ago

Revert is deployed, so all new jobs will no longer have --also-run-disabled-tests. Please try the jobs again!

dave-2 commented 5 years ago

Closing for now, please re-open if the jobs continue to fail!

ksagar commented 5 years ago

I'm running into an issue with tryjobs for rendering.mobile again. I'm not sure if it's the same timeout issue: the failed task ran for 4h 33m 55s, which is under the 6h timeout from the patch above. Here is the stack from the failed job:

Traceback (most recent call last):
  File "/base/data/home/apps/s~chromeperf/pinpoint:clean-dtu-b8e60f98.412250524914822925/dashboard/pinpoint/models/quest/execution.py", line 95, in Poll
    self._Poll()
  File "/base/data/home/apps/s~chromeperf/pinpoint:clean-dtu-b8e60f98.412250524914822925/dashboard/pinpoint/models/quest/run_test.py", line 213, in _Poll
    raise SwarmingTestError('The test failed. No Python '
SwarmingTestError: The test failed. No Python exception was found in the log.

And a link to the pinpoint job: https://pinpoint-dot-chromeperf.appspot.com/job/12c525c3640000

simonhatch commented 5 years ago

I'm not sure why those particular runs failed. We can look into that, but your job is actually still running.

ksagar commented 5 years ago

It has completed now, but 8/10 runs with the patch and 5/10 runs without the patch failed.