gaaclarke opened 1 year ago
FWIW, there is a pretty significant regression reproducible on an A15 (iPhone 13 Pro) as well:
```dart
import 'dart:ui';

import 'package:flutter/material.dart';

void main() {
  runApp(const MainApp());
}

const kSigma = 1.0;
const kNumberOfBlurs = 6;

class _Blur extends StatelessWidget {
  const _Blur();

  @override
  Widget build(BuildContext context) {
    return ClipRRect(
      child: BackdropFilter(
        blendMode: BlendMode.srcIn,
        filter: ImageFilter.blur(
          sigmaX: kSigma,
          sigmaY: kSigma,
        ),
        child: Container(
          color: Colors.red.withAlpha(30),
          child: const Text('Blur'),
        ),
      ),
    );
  }
}

class MainApp extends StatelessWidget {
  const MainApp({super.key});

  @override
  Widget build(BuildContext context) {
    return MaterialApp(
      home: Scaffold(
        body: Stack(
          children: [
            ListView.builder(
              itemBuilder: (context, index) {
                return Container(
                  padding: const EdgeInsets.all(10),
                  child: Text('Item $index'),
                );
              },
              itemCount: 1000,
            ),
            for (var i = 0; i < kNumberOfBlurs; ++i) ...[
              Positioned(
                left: 0,
                right: 0,
                top: i * 150.0,
                height: 60,
                child: const _Blur(),
              ),
            ]
          ],
        ),
      ),
    );
  }
}
```
Wide gamut disabled: (screenshot)
Wide gamut enabled: (screenshot)
This is intentionally with sigma=1 to measure render pass overhead, though things do get worse with increased sigma (e.g. at sigma=30: 8 ms vs 22 ms).
In our production app we have two blurs (toolbar + tab bar) and are unable to hit 60fps (let alone 120fps) with wide gamut enabled.
Also, I tried a quick-and-dirty hack to reuse offscreen textures. It seems to save about 1-1.5 ms per frame in the wide gamut case. It doesn't solve the issue, but it's certainly something to consider.
So for each of these 6 relatively small backdrops, there is an MSAA backdrop texture fill that blits the contents of the entire frame. These blits in total take about 80% of rendering time. @bdero, any ideas here?
Here's the performance on iPhone 13 Pro, wide gamut, no MSAA backdrop (`LoadAction::kLoad` + `StoreAction::kStoreAndMultisampleResolve` + just a resolve on the last pass):

MSAA backdrop from above for comparison:
In this particular case (wide gamut), MSAA load/store seems to perform better than blitting from the previous resolve. But it's still pretty slow (~1 ms per render pass). I'm wondering if we could render backdrops that don't sample from each other in one pass. This would help with the common case of a blurred header + tab bar. @jonahwilliams
I think something like "if this is not the first pass and nothing in this pass has rendered where the backdrop is sampling from, don't end the pass" would already be a significant improvement.
For reference, with a quick hack to render all backdrops in a single pass (same visual result, because they don't sample from each other):
(Note that this is with sigma=1; the blur performance is still an issue, but that's unrelated to render pass overhead.)
Didn't mean to hijack this issue - moved to https://github.com/flutter/flutter/issues/131567 and https://github.com/flutter/flutter/issues/131568.
Shower thought @gaaclarke , doesn't switching to BGR10_XR solve the plus clamping problem?
Yep, there are a lot of other benefits to using BGR10_XR; I'd like it if we were using that. Originally, when the feature shipped, it was using it, but we had to drop back to F16 when we ran into a bug. I can't remember what it was off the top of my head, though.
edit: we had it for opaque Flutter views; for transparent ones we always used F16
I see the following comment:

```cpp
// MTLPixelFormatRGBA16Float was chosen since it is compatible with
// impeller's offscreen buffers which need to have transparency. Also,
// F16 was chosen over BGRA10_XR since Skia does not support decoding
// BGRA10_XR.
```
The documentation for https://developer.apple.com/documentation/metal/mtlpixelformat/mtlpixelformatbgra10_xr?language=objc says:

> The alpha component is always clamped to a [0.0, 1.0] range in sampling, rendering, and writing operations, despite supporting values outside this range.
When investigating the regression in performance caused by wide gamut support, we found two things:
1. The regression didn't show up on A15, but did on A13.
2. The majority of the additional time was spent in the blur fragment shader.
We knew 64 bits/pixel is documented to be slower. Apple's recommendation is to use 40 bits/pixel instead (a BGR10_XR color buffer plus a u8 alpha buffer). That should make those operations faster.
The difficulty with that change is that using that scheme would require:
1. a second set of shaders, since getting full color would mean reading from two samplers
2. updating all the logic that uploads and downloads textures to manage two textures
3. updating the blit operations to the surface to do two blits
4. changing the surface to match the 40 bits/pixel scheme
Maybe in a future where we don't have to support non-wide-gamut devices it will be more palatable to throw away the old code and just embrace the 40 bits/pixel path. However, it seems likely that in that same future the cost of 64 bits/pixel vs 40 bits/pixel will be negligible; 64 bits/pixel may even be faster on newer hardware since it uses just one sampler.
cc @jonahwilliams