catapult-project / catapult

Deprecated Catapult GitHub. Please use the "Speed>Benchmarks" component at http://crbug.com to file bugs, and https://chromium.googlesource.com/catapult to download and edit the source code.
https://chromium.googlesource.com/catapult
BSD 3-Clause "New" or "Revised" License

blink_perf times out #4499

Closed dave-2 closed 6 years ago

dave-2 commented 6 years ago

https://pinpoint-dot-chromeperf.appspot.com/job/16d8637a240000

blink_perf takes over 2 hours, so it times out in Pinpoint. Do we want to allow users to run this benchmark suite? (Noting that perf try jobs currently run 10 iterations by default, so a 3-hour benchmark would occupy all 5 devices for 6 hours.)
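The device-occupancy arithmetic above can be sketched with a small helper (hypothetical, not part of Pinpoint), assuming iterations are spread evenly across devices and run in rounds:

```python
import math

def job_wall_time_hours(iterations, hours_per_iteration, devices):
    """Rough wall time for a perf try job: iterations are divided
    across the devices, and each round takes one full benchmark run."""
    rounds = math.ceil(iterations / devices)
    return rounds * hours_per_iteration

# 10 iterations of a 3-hour benchmark on 5 devices: 2 rounds of 3 hours.
print(job_wall_time_hours(10, 3, 5))
```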

@anniesullie @simonhatch

simonhatch commented 6 years ago

Where are you getting the timeout error? That link seems to have an error about not being able to find the build.

dave-2 commented 6 years ago

Sorry, this one: https://pinpoint-dot-chromeperf.appspot.com/job/12af0646240000

simonhatch commented 6 years ago

Yeah, I don't think we want to allow them to run this. Isn't this usually run as blink_perf.css, blink_perf.svg, etc. on the waterfall? I didn't even know you could run it as one massive thing.

perezju commented 6 years ago

How long is the timeout for each individual iteration?

I think that, for a developer wanting to do some perf comparisons, running the whole of blink_perf or system_health (also measured in hours) should be reasonable.

Maybe allow tuning the number of iterations? Or even put some jobs on a "slow" lane so they don't hog all devices at once? E.g. as a developer I would be comfortable kicking off one of these long-running jobs and expecting to have results 24 hours later.

amyqiu commented 6 years ago

@sadrulhc

Now that the smoothness page sets are being merged, the rendering benchmark is also exceeding the timeout: https://pinpoint-dot-chromeperf.appspot.com/job/146ce6de240000

It would be great to have some way of increasing the timeouts, even if the job takes much longer to run.

anniesullie commented 6 years ago

I agree having longer timeouts would be great, not just for blink_perf but for loading, system_health, rendering, and v8 benchmarks.

dave-2 commented 6 years ago

We can increase the timeout, but we would need to solve the bigger capacity/scheduling issue.

I talked with @simonhatch, and we're considering a few options.

@ehanley324 do we have a way to shard individual benchmarks?

ehanley324 commented 6 years ago

Right now, no, not individual ones. We currently assume that if you pass in a list of benchmarks they are all run on the same shard.

Right now the contract in run_performance_tests.py is that the environment variables GTEST_TOTAL_SHARDS and GTEST_SHARD_INDEX are set (we set these in our recipe), and we pass in a sharding map so the script knows what subset of tests to run. We have a script to generate sharding maps, and we are working on making it generate maps for only a subset of benchmarks (this work is needed for the CQ).
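Under that contract, the per-shard selection might look something like this sketch (the map layout and function name are assumptions for illustration, not the actual run_performance_tests.py code):

```python
import os

def benchmarks_for_shard(sharding_map):
    """Pick this shard's benchmarks using the GTEST_* environment
    variables set by the recipe; defaults to a single shard."""
    total = int(os.environ.get("GTEST_TOTAL_SHARDS", "1"))
    index = int(os.environ.get("GTEST_SHARD_INDEX", "0"))
    if not 0 <= index < total:
        raise ValueError("GTEST_SHARD_INDEX out of range")
    # Assumed map layout: {"0": {"benchmarks": [...]}, "1": {...}, ...}
    return sharding_map[str(index)]["benchmarks"]
```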

anniesullie commented 6 years ago

@dave-2 couldn't pinpoint just shard by story? Telemetry has a command line arg to list the stories, or maybe you could use the sharding maps?
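Sharding by story could be as simple as a round-robin split of the story list, which Pinpoint could then feed through a story filter. A minimal sketch (hypothetical helper, assuming the story list is already known):

```python
def stories_for_shard(stories, total_shards, shard_index):
    """Round-robin assignment of a benchmark's stories to one shard."""
    return [story for i, story in enumerate(stories)
            if i % total_shards == shard_index]
```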

simonhatch commented 6 years ago

Yeah, that was the intent: to shard this in some way. We thought that since the waterfall knows (or will know) how to shard for OBBS, if that were somewhere Pinpoint could access, we could reuse it. If it's not, maybe sharding by story is the way to go.

ehanley324 commented 6 years ago

You could easily pass in a command line to run an individual story and run_performance_tests.py would pass it through. I wasn't aware that there was a flag to telemetry to list the stories, we currently generate them off the benchmarks we find in the tools/perf directory.

If you want to re-use some of our sharding generation to determine what tests to run on what shard I can let you know when we have updated the sharding map generator to take a subset of benchmarks.

simonhatch commented 6 years ago

Yeah, Pinpoint is capable of running with story filters, but right now it has no insight into what stories a benchmark can run, and it doesn't have the option of (easily) just running it on the command line to list them out. So the thinking was that maybe the sharding code could be shared.

ehanley324 commented 6 years ago

I agree that we should try and share code. If we can generate a sharding map it is just a flag for the map and then a flag to swarming to trigger the job for the shard.

How does Pinpoint handle the triggering and device affinity for each bot? As long as you can pass the right shard id to swarming (this is taken care of in the recipe right now), we should be able to re-use it with a change to our sharding_map_generator.
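The triggering step boils down to setting the two GTEST variables per swarming task. A hypothetical sketch of what the triggering side would compute (not real Pinpoint or recipe code):

```python
def shard_task_envs(total_shards):
    """One environment dict per swarming task, mirroring the variables
    the recipe sets today for run_performance_tests.py."""
    return [{"GTEST_TOTAL_SHARDS": str(total_shards),
             "GTEST_SHARD_INDEX": str(i)}
            for i in range(total_shards)]
```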

dave-2 commented 6 years ago

We increased the timeout.