Closed. dave-2 closed this issue 6 years ago.
Where are you getting the timeout error? That link seems to have an error about not being able to find the build.
Sorry, this one: https://pinpoint-dot-chromeperf.appspot.com/job/12af0646240000
Yeah, I don't think we want to allow them to run this. Isn't it usually run as blink_perf.css, blink_perf.svg, etc. on the waterfall? I didn't even know you could run it as one massive thing.
How long is the timeout for each individual iteration?
I think that, for a developer wanting to do some perf comparisons, running the whole of blink_perf or system_health (also measured in hours) should be reasonable.
Maybe allow tuning the number of iterations? Or even put some jobs in a "slow" lane so they don't hog all the devices at once? E.g. as a developer I would be comfortable kicking off one of these long-running jobs and expecting results 24 hours later.
@sadrulhc
Now that the smoothness page sets are being merged, the rendering benchmark is also exceeding the timeout: https://pinpoint-dot-chromeperf.appspot.com/job/146ce6de240000
It would be great to have some way of increasing the timeout, even if the job then takes a lot longer to run.
I agree having longer timeouts would be great, not just for blink_perf but for loading, system_health, rendering, and v8 benchmarks.
We can increase the timeout, but we would need to solve the bigger capacity/scheduling issue.
I talked with @simonhatch, and we're considering:
@ehanley324 do we have a way to shard individual benchmarks?
Right now, no, not individual ones. We currently assume that if you pass in a list of benchmarks they are all run on the same shard.
Right now the contract in run_performance_tests.py is that the env variables GTEST_TOTAL_SHARDS and GTEST_SHARD_INDEX are set (which we set in our recipe), and then we pass in a sharding map so the script knows what subset of tests to run. We have a script to generate sharding maps, and we are working on making it generate maps for only a subset of benchmarks (this work is needed for the CQ).
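For anyone unfamiliar with the contract described above, here is a minimal sketch of how a harness could consume those two env variables; the function name and round-robin assignment are illustrative, not the actual run_performance_tests.py implementation (which uses a pre-generated sharding map instead):

```python
import os

def benchmarks_for_this_shard(all_benchmarks):
    """Return the subset of benchmarks this shard should run, using the
    gtest-style GTEST_TOTAL_SHARDS / GTEST_SHARD_INDEX env variables."""
    total = int(os.environ.get("GTEST_TOTAL_SHARDS", "1"))
    index = int(os.environ.get("GTEST_SHARD_INDEX", "0"))
    # Simple round-robin assignment; a real sharding map is generated
    # offline so runtimes can be balanced across shards.
    return [b for i, b in enumerate(all_benchmarks) if i % total == index]

benchmarks = ["blink_perf.css", "blink_perf.svg", "blink_perf.layout"]
print(benchmarks_for_this_shard(benchmarks))
```

With the env variables unset, the defaults (one shard, index zero) make the function return the full list.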
@dave-2 couldn't pinpoint just shard by story? Telemetry has a command line arg to list the stories, or maybe you could use the sharding maps?
Yeah, that was the intent: to shard this in some way. We thought that, since the waterfall knows (or will know) how to shard for OBBS, if that lived somewhere Pinpoint could access, we could reuse it. If it's not accessible, maybe sharding by story is the way to go.
You could easily pass in a command line to run an individual story and run_performance_tests.py would pass it through. I wasn't aware that there was a flag to telemetry to list the stories, we currently generate them off the benchmarks we find in the tools/perf directory.
If you want to re-use some of our sharding generation to determine what tests to run on what shard I can let you know when we have updated the sharding map generator to take a subset of benchmarks.
Yeah Pinpoint is capable of running with story filters, but right now it has no insight as to what stories a benchmark can run and doesn't have the option of just running it on the command line to list them out (easily). So the thinking was maybe the sharding code could be potentially shared.
I agree that we should try to share code. If we can generate a sharding map, it's just a flag for the map and then a flag to swarming to trigger the job for the shard.
How does Pinpoint handle the triggering and device affinity for each bot? As long as you can pass the right shard id to swarming (this is taken care of in the recipe right now), we should be able to re-use it with a change to our sharding_map_generator.
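For concreteness, here is a rough sketch of what a generated sharding map might look like and how a shard would look up its assignment. The JSON shape and benchmark names are assumptions for illustration; the real map format produced by the sharding_map_generator may differ.

```python
import json

# Hypothetical sharding map: shard index -> benchmarks assigned to it.
SHARD_MAP = json.loads("""
{
  "0": {"benchmarks": ["blink_perf.css", "blink_perf.svg"]},
  "1": {"benchmarks": ["blink_perf.layout", "blink_perf.paint"]}
}
""")

def benchmarks_for_shard(shard_map, shard_index):
    """Look up the benchmark subset for one swarming shard."""
    return shard_map[str(shard_index)]["benchmarks"]

print(benchmarks_for_shard(SHARD_MAP, 1))
# -> ['blink_perf.layout', 'blink_perf.paint']
```

In this scheme, the recipe would trigger one swarming task per shard index and pass both the map and the index through to the test runner.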
We increased the timeout.
https://pinpoint-dot-chromeperf.appspot.com/job/16d8637a240000
blink_perf takes over 2 hours, so it times out in Pinpoint. Do we want to allow users to run this benchmark suite? (Noting that perf try jobs currently run 10 iterations by default, so a 3-hour benchmark would occupy all 5 devices for 6 hours.)
@anniesullie @simonhatch