catapult-project / catapult

Deprecated Catapult GitHub. Please instead use http://crbug.com "Speed>Benchmarks" component for bugs and https://chromium.googlesource.com/catapult for downloading and editing source code.
https://chromium.googlesource.com/catapult
BSD 3-Clause "New" or "Revised" License

ResponseExpectation should use different responsiveness histograms depending on scrollStart vs tap vs etc #1798

Closed tdresser closed 8 years ago

tdresser commented 8 years ago

Currently RAIL considers the start of scrolling to be a Response, giving it a maximum of 150ms of latency while still being considered reasonable.

If we consider the start of scrolling to be an Animation, that gives it a maximum of 66.6ms of latency, which is likely still high enough that it causes user pain.

For now, we should consider the start of scrolling to be Animation.

benshayden commented 8 years ago

Great question!

Originally, the RAIL Measurement Definitions defined ScrollBegins to be Responses with a 16ms target latency, with an eye towards finger-stickiness (see https://docs.google.com/document/d/1s9xq65DOML9GQwLC8x14qBV4rFGXsrOpXaTW5qxZTc0/edit). There was some discussion about this corner case, but I don't know if it was ever really resolved. Currently, RAIL categorizes ScrollBegins as Responses as per that doc, but it doesn't implement the special 16ms target latency. @natduca Any memories or thoughts about this?

Here are a couple concerns I have with changing ScrollBegins to Animations, independent of that spec doc.

Currently, AnimationIR defines its comfort in terms of its average FPS and jank (currently discrepancy, might change to RMS or something similar). It considers all of the frames together holistically, so, for best effect, Animations should be defined so that their frames are as homogeneous as possible from a debugging perspective. AnimationIR doesn't score each frame's latency separately then combine the frame scores, although that's a possibility that we could explore.

IIUC, the sorts of problems that can delay the first frame are usually inherently different from the sorts of problems that can affect the rest of the frames, so scoring them separately will immediately help users sort out where their problems are. RAIL IR Finder could create one Animation for ScrollBegins and a separate Animation for the rest of the scroll. That might be a bit confusing since the definition of jank requires multiple frames.

Alternatively, if ScrollBegins stay Responses, then the ResponseInteractionRecord could implement the 16ms target latency by using different Histograms to define different comfort curves depending on the interaction type. The ScrollBegin Response comfort histogram would turn downwards at a lower latency than the Click/Keyboard/Tap Response comfort histogram. I can imagine similar arguments for splitting the histogram further: Keyboard latency might be expected to be lower than Click/Tap. ResponseIR could use its "name" field to choose which histogram to use. Or we could split ResponseIR into subclasses like KeyboardIR, ClickTapIR, ScrollBeginIR, each of which would have their own singular comfort histogram.
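To make the "different comfort curves depending on the interaction type" idea concrete, here is a minimal sketch. It is written in Python purely for illustration and is not taken from Catapult. The names `comfort`, `COMFORT_CURVES`, and `response_comfort` are hypothetical, and the linear fall-off is an assumption. The 16ms and 150ms values come from this thread; the upper bounds (100ms and 1000ms) are made up for the example.

```python
def comfort(latency_ms, comfortable_ms, painful_ms):
    """Return 1.0 at or below comfortable_ms, 0.0 at or above painful_ms,
    and a linear fall-off in between (the linear shape is an assumption)."""
    if latency_ms <= comfortable_ms:
        return 1.0
    if latency_ms >= painful_ms:
        return 0.0
    return (painful_ms - latency_ms) / (painful_ms - comfortable_ms)


# Hypothetical curves keyed by the interaction's name:
# (comfortable_ms, painful_ms). Only 16 and 150 appear in the thread;
# the second number of each pair is invented for illustration.
COMFORT_CURVES = {
    "ScrollBegin": (16.0, 100.0),
    "Click": (150.0, 1000.0),
    "Tap": (150.0, 1000.0),
    "Keyboard": (150.0, 1000.0),
}


def response_comfort(name, latency_ms):
    """Pick the curve from the interaction name, then score the latency."""
    comfortable_ms, painful_ms = COMFORT_CURVES[name]
    return comfort(latency_ms, comfortable_ms, painful_ms)
```

With these invented numbers, a 58ms latency is fully comfortable for a Click but only half comfortable for a ScrollBegin, which is the kind of split the paragraph above describes.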

I'm not trying to argue for the RAIL Measurements Definition's original plan or tradition or anything like that, I'm happy to think about pros and cons and figure it out together.

WDYT?

natduca commented 8 years ago

I tend to think the start is a response but it has a different pain histogram than a click gesture's response... Tim and I talked about the pain threshold where the happiness dip occurs probably being less than 150, but quite a bit more than 16... maybe in the 40s range or so. He had a pretty nice analogy to the touchslop region that might make this all make sense.

Maybe you and he can vc, figure out a plan, and then share it with input-dev mailing list cc paulirish,paullewis?

benshayden commented 8 years ago

Heh, I got around to Response with different comfort histogram towards the end of that wall of text.

@natduca Any thoughts on subclasses of Response IR versus multiple histograms in response_interaction_record.html? Or another spelling?

@tdresser, if Response with different comfort histogram SGTY, then I can make that change, or we can VC if you want.

natduca commented 8 years ago

Hmmm I think if I was coding, I'd go for a mandatory enum saying which of a fixed set of response types it is... response_type_scroll, response_type_click etc... that's on the basis that the expected amount of future customization is low... whereas, were we seeing a large number of future customizations at more than just the histogram level, I'd probably go for the extension approach. But, all that having been said, I could be persuaded either way.
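A rough sketch of the "mandatory enum" option, in Python for illustration only. `ResponseType`, `ResponseRecord`, and `TARGET_LATENCY_MS` are hypothetical names, and the target values are assumptions loosely based on numbers floated in this thread (150ms for a generic Response, "maybe in the 40s" for scroll start).

```python
from dataclasses import dataclass
from enum import Enum


class ResponseType(Enum):
    """Fixed set of response kinds (hypothetical names)."""
    SCROLL = "scroll"
    CLICK = "click"
    TAP = "tap"
    KEYBOARD = "keyboard"


# Assumed target latencies in milliseconds, for illustration only.
TARGET_LATENCY_MS = {
    ResponseType.SCROLL: 40.0,
    ResponseType.CLICK: 150.0,
    ResponseType.TAP: 150.0,
    ResponseType.KEYBOARD: 150.0,
}


@dataclass(frozen=True)
class ResponseRecord:
    response_type: ResponseType  # mandatory: there is no default value
    start_ms: float
    end_ms: float

    def __post_init__(self):
        # Reject free-form strings so every record names a known type.
        if not isinstance(self.response_type, ResponseType):
            raise TypeError("response_type must be a ResponseType")

    @property
    def latency_ms(self):
        return self.end_ms - self.start_ms

    def meets_target(self):
        return self.latency_ms <= TARGET_LATENCY_MS[self.response_type]
```

The trade-off is the one described above: a lookup table keyed by an enum is cheap when only the threshold differs per type, while subclasses pay off if more behaviour starts to vary.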

tdresser commented 8 years ago

You've convinced me - response with a different comfort histogram seems superior.

I suspect that (given that we'll already have up to 50ms of latency due to the idle time) the recommended delay will be at least as low as it is for the Animate stage though.

benshayden commented 8 years ago

Great! I think I can throw together a patch and we can haggle over the API where we can see the code, if that sounds ok to everybody.

@tdresser Do you want to expand on Nat's comment about the touchslop region analogy? It sounds like "distance = rate * time" algebra, which might be interesting.

tdresser commented 8 years ago

There are two reasons scrolls start slowly:

  1. The browser hasn't identified that finger movement is part of a scroll. We use a distance threshold here that depends on the device, but it's generally about 8 dips on Android. We call the area within that threshold the "slop region".
  2. The page prevents us from scrolling.

In the first case, once the browser identifies that finger movement is part of a scroll, we ignore all of the delta that's accumulated thus far, and start scrolling.

In the second case, once the page stops blocking us, we perform a large scroll to catch up to where we're supposed to be.

I'm going to play with ignoring the delta that happens while the page is blocking, which I think may greatly improve the user experience here.

If I'm right, and we switch to ignoring the delta here, we could measure the user pain for scroll start based on the amount of slippage, not the amount of time the scroll start takes. The browser introduces some amount of slippage, and the page introduces some more, and we have some threshold of tolerance.

We'd really like all our thresholds to be time based though. We could try to convert this distance threshold into time. However, we don't have metrics on how fast users' fingers move at the beginning of a scroll, and the data for how fast fingers move in general doesn't feel very applicable.
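The distance-to-time conversion mentioned above is just "time = distance / rate". A minimal sketch, assuming a constant finger speed during the start of the scroll: `slop_delay_ms` is a hypothetical helper, the 8 dip slop distance is the figure quoted earlier for Android, and the finger speeds in the usage note are invented, since (as noted) no real metrics exist for them.

```python
def slop_delay_ms(slop_dips, finger_speed_dips_per_s):
    """Time in milliseconds for a finger moving at a constant speed
    to cover the slop distance. Constant speed is an assumption."""
    if finger_speed_dips_per_s <= 0:
        raise ValueError("finger speed must be positive")
    return 1000.0 * slop_dips / finger_speed_dips_per_s
```

For example, `slop_delay_ms(8, 200)` gives 40.0 and `slop_delay_ms(8, 500)` gives 16.0, so the resulting time threshold depends entirely on the speed you assume.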

I'm not sure how far this gets us though. We have a hard minimum of 50ms for our response time to scroll start due to the Idle requirements. I'm going to do some experimentation on different approaches to dealing with jank at the start of scroll, and hopefully that will give us some insight into what's acceptable, but I wouldn't be surprised if 50ms is already above what's noticeable.

tdresser commented 8 years ago

I just did some playing around with this.

Ignoring the delta while the page is blocking is terrible in some circumstances - it means that when trying to page through content, you can't count on a repeated gesture to scroll the same distance every time.
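A toy model of why dropping the blocked delta makes repeated gestures inconsistent. This is a sketch under simplifying assumptions, not browser code: `total_scroll` is a hypothetical function, input arrives as evenly spaced samples of cumulative finger travel, and the page blocks scrolling for some number of leading samples.

```python
def total_scroll(finger_positions, blocked_samples, ignore_blocked_delta):
    """Return the total distance scrolled for one gesture.

    finger_positions: cumulative finger travel (dips) at each input sample.
    blocked_samples: number of leading sample intervals during which the
        page prevents scrolling.
    ignore_blocked_delta: if True, movement during the blocked intervals is
        dropped; if False, it is applied later as a catch-up scroll, so the
        total equals the full finger travel.
    """
    scrolled = 0.0
    for i in range(1, len(finger_positions)):
        delta = finger_positions[i] - finger_positions[i - 1]
        if ignore_blocked_delta and i <= blocked_samples:
            continue  # this movement is thrown away
        scrolled += delta
    return scrolled


# The same gesture, repeated while the page blocks for varying durations.
gesture = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
catch_up = [total_scroll(gesture, n, False) for n in (0, 2, 3)]
dropped = [total_scroll(gesture, n, True) for n in (0, 2, 3)]
```

In this model the catch-up policy always scrolls the full 50 dips, while the drop policy scrolls a different distance each time the blocking duration changes.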

I'm surprised at how reasonable 100ms of delay is. It does cause pain, but it's not outrageous like I thought it would be.

With a high amount of jank at the start of scroll, I sometimes lose my place in what I'm scrolling. Continuous motion helps your eye track where the content is. With a wall of text, I sometimes lose my place with 50ms of delay, but it might be easier to keep your place with a real web page.

Based on the demo I constructed, I'm fine with 16ms of delay, 30ms is bearable, but anything more than that starts to be pretty annoying.

That's obviously highly anecdotal though.

natduca commented 8 years ago

For posterity, @rbyers suggested that instead of having R have different goalposts depending on name, we could have a scroll be a single I followed by A. I tend to think we shouldn't feel constrained on how many letters we have in the actual implementation, so having R1 and R2 is probably better than overloading I. But, I could go many ways on this one. P1 is that we have a robust discussion. :)

@paulirish fyi

RByers commented 8 years ago

FWIW it looks like I was just repeating what @tdresser was saying when he opened this issue. To me what makes scroll start special is that you're expecting it to animate smoothly and a big jump at the beginning is jarring (really just like an animation jank). It breaks the physical illusion of pushing something (whereas taps have a very different physical illusion that's much more forgiving). Given that, it seems reasonable to just take the simple option and consider this 'A' (with the extra 50ms afforded by 'I' like for the start of any animation). But if it's useful to use the more powerful 'separate comfort histogram' then that seems fine too (just seems harder to articulate / justify).

natduca commented 8 years ago

Mmm what I wrestle with in framing that initial slop allowance as Idle stage (i framing? /me snickers) is the case where a user is idle for 5 seconds, and then they go to scroll.

               first frame onscreen         ->|
        first touch for a scroll   ->|        |
Time: <      clearly idle           >|<??????>|<clearly animating>

If we viewed there being one long idle that spans both the "clearly idle" and the ????, my worry is that it'd make it harder to reason about what is expected of the "clearly idle" chunk... in that chunk, if you're doing that for a long time, it's way wrong because you're burning battery. If your tab is backgrounded and you're doing that, we provide deep shame instead of a score. More interestingly, if you're doing heavy work in idle in the first 10sec after page load, it's probably okay... maybe? There's some complexity here beyond just responsiveness that happens when you think about this framework being used for system health, basically. I don't have this model completely, but I do see some flaws.

Alternatively, we could view this scenario as having two I-stage IRs. Namely, there's the "clearly idle" and then the ??? is another idle, and then score the ??? as we'd score an idle. I suspect there're complications in that framing too? I'm not sure.

IMO, I think this space is always gonna be hard to explain beyond the PR version. :) This is kinda why I think having a piece of software that tries to score the big picture [and provide tooling that lets you think with more sophistication than just 4 letters] is probably a good thing. What I think matters is that the results are reliable, actionable and at least make sense, and people can use them.

benshayden commented 8 years ago

See #1242 about the long tasks in Idle IRs.

For the trace viewer UI, see #1186.

benshayden commented 8 years ago

So I don't forget it over the weekend: use sub-sub-classes like ClickResponseIR instead of a switch(subtypeName) in ResponseIR.

natduca commented 8 years ago

What's the plan of record for fixing this bug?

benshayden commented 8 years ago

Define Response/AnimationIR subclasses as defined in the RAIL IR Subtype Names doc like KeyboardResponseIR. Each subclass will get its own comfort histogram.
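A minimal sketch of the subclass plan, in Python for illustration rather than the JavaScript used in the tracing code. `ClickResponseIR` and `KeyboardResponseIR` are names mentioned in this thread; `ScrollBeginResponseIR`, the `COMFORT_RANGE_MS` attribute, the linear curve, and every numeric bound except 16 and 150 are assumptions.

```python
class ResponseIR:
    """Base record; concrete subclasses supply their own comfort range."""

    # (comfortable_ms, painful_ms); None means "abstract, do not score".
    COMFORT_RANGE_MS = None

    def __init__(self, start_ms, end_ms):
        self.start_ms = start_ms
        self.end_ms = end_ms

    @property
    def latency_ms(self):
        return self.end_ms - self.start_ms

    def comfort(self):
        if self.COMFORT_RANGE_MS is None:
            raise NotImplementedError("use a concrete ResponseIR subclass")
        comfortable_ms, painful_ms = self.COMFORT_RANGE_MS
        fraction = (painful_ms - self.latency_ms) / (painful_ms - comfortable_ms)
        return max(0.0, min(1.0, fraction))  # clamp to [0, 1]


class ClickResponseIR(ResponseIR):
    COMFORT_RANGE_MS = (150.0, 1000.0)  # upper bound is invented


class KeyboardResponseIR(ResponseIR):
    COMFORT_RANGE_MS = (150.0, 1000.0)  # same curve assumed for now


class ScrollBeginResponseIR(ResponseIR):
    COMFORT_RANGE_MS = (16.0, 100.0)  # upper bound is invented
```

Each subclass carries exactly one curve, so no switch on a subtype name is needed; which values belong in those curves is the open question in the comments that follow.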

natduca commented 8 years ago

Sure, but does the scroll begin with an R-stage or an I-stage?

And, what is the comfort histogram for the scroll's first frame?