☂️ Proposal: Make on-device testing awesome 💫

matanlurey commented 1 week ago

Some examples of things @johnmccutchan and I would like to do, but can't today:

- Background a Flutter app with platform views, force trimmed memory, and then take a screenshot of the result
- Find and interact with native widgets beyond screenshots (tap, read semantics, etc.)
- Encourage our team to write and maintain more integration tests, because they are easy to write and fast to iterate on

We should do something about it. A couple of options on the table include:

- Invest more in flutter_driver, keeping integration_test more or less where it is today
- Invest more in integration_test, keeping flutter_driver more or less where it is today
- Either of those, but actually deprecate and work towards removing the other framework
- Something else entirely (maybe some community solution is best and we should invest more in that?)

Outside of Flutter, here are some popular integration test solutions for similar problem spaces:

- Appium; many platforms (including Flutter!); runs on the host.
- Detox; React Native, cross-platform; runs on the host.
- Espresso; Android; runs on the host.
- Selenium WebDriver; Web; runs on the host.
- XCTest; iOS; runs on the device.
- UI Automator; Android; runs on the device.
- EarlGrey; iOS; runs on the device.

Read more about the background of flutter_driver v integration_test

## Background

Flutter ships with [`package:flutter_test`](https://github.com/flutter/flutter/tree/master/packages/flutter_test), and the accompanying command, `flutter test`, which runs a headless version of Flutter (called [`flutter_tester`](https://github.com/flutter/engine/blob/0ea962dc8a71f7fab1122ea05801b6888fe2fc12/shell/testing/tester_main.cc#L597)) and runs Dart-based unit/functional tests, called _widget tests_, in a fake environment where the passage of time is controlled by the tester, with many extension points stubbed out (like platform channels), and a _software-based_ renderer that is mostly platform agnostic (and does not require a GPU, for example).

This workflow provides super fast lightweight tests that are suitable for testing widgets and compositions of widgets. It's possible to [interact](https://docs.flutter.dev/cookbook/testing/widget/tap-drag) with the widget(s) under test, observe changes as a result, and even [take screenshots](https://api.flutter.dev/flutter/flutter_test/matchesGoldenFile.html) and compare them for golden-file testing.

Notably, this fake environment has the following limitations:

- The test runs on a fake device[^1], and cannot interact with plugins
- The passage of time is tightly controlled by the developer, and doesn't always reflect the real interactions in production
- Platform _views_ do not show up, cannot be interacted with (as there is no platform), and are missing in screenshots

[^1]: It is technically possible to `flutter run` a `flutter_test` and have it run on a real device; however many of the limitations remain.

Flutter also ships with two "integration test" packages, [flutter_driver](https://github.com/flutter/flutter/tree/master/packages/flutter_driver) and [integration_test](https://github.com/flutter/flutter/tree/master/packages/integration_test), which unfortunately are in a state[^2] of [neither being completed nor deprecated](https://github.com/flutter/flutter/issues/142021).

It would take a lot of words to describe the current state, so instead I'll focus on some key points:

### Flutter Driver

Runs the test script on the _host_, using a different API (similar to [ChromeDriver](https://chromedriver.chromium.org/)) than tests authored with `flutter_test`.

**PROs**:

- Conceptually simple; a small limited RPC-like API "talks" to a Flutter app running on a device
- Capable of interactions that require a host, e.g. with `adb` or forcing a Dart VM GC
- Already supports functionality such as screenshots

**CONs**:

- All interactions must be serializable, meaning `Finder`s cannot be re-used across `flutter_test` and `flutter_driver`
- All interactions and assertions happen over RPC, leading to additional latency and, in some cases, flakiness/synchronization issues
- Can't (at least today) run on Firebase Testlab or systems that require a single bundled APK (or similar)

### Integration Test

**PROs**:

- Uses largely the same API as `flutter_test`
- Runs entirely in the same process as the Flutter application under test, without RPC or serialization
- Can more easily use a combination of platform channels or FFI to "talk" to the native platform
- Is supported by Firebase Testlab and similar systems that require a single bundled APK (or similar), and no driver script

**CONs**:

- Is, from what I can tell, [incomplete](https://github.com/flutter/flutter/issues?q=is%3Aopen+is%3Aissue+label%3A%22f%3A+integration_test%22) (it's not clear whether there is a specific reason we haven't finished migrating to it)
- Structurally more difficult to interact with host-side tooling (e.g. `adb`, Dart VM, etc.)

[^2]: Google employees can also read the internal-only [go/flutter-integration-testing](http://go/flutter-integration-testing).

/cc @goderbauer @tugorez @jonahwilliams

matanlurey commented 1 week ago

I did a quick search in org:flutter:

- We use flutter_driver in roughly 85 files
- There is about 3x more of integration_test, in roughly 225 files

Interestingly, we use integration_test 40 times in flutter/flutter compared to 67 times of flutter_driver.

jonahwilliams commented 1 week ago

integration_test uses flutter_driver though, so there isn't really an A or B. Some of the applications written using integration_test will still be doing a flutter_driver style test. There was also a "migration" attempt that updated a bunch of tests to use integration_test and in the process kneecapped the benchmark results.

matanlurey commented 1 week ago

The question at hand is, where do we invest our time, and what do we tell our teams to use?

In other words, say we want to add support to talk to the native platform. Do we add that exclusively in flutter_driver? Do we add it exclusively to integration_test? Do we add it in a way where both can use the functionality? Assuming we have limited time/resources, what has the best chance of making our testing story better?

jonahwilliams commented 1 week ago

integration_test is just re-exporting parts of flutter_driver and flutter_test though

johnmccutchan commented 1 week ago

@jonahwilliams thanks for all this context. I was walking my dog tonight and I thought "I bet integration_test uses driver under the hood" glad to have it confirmed without me having to ask :)

Let me try to reframe Matan's point (and correct me if I'm wrong Matan):

We need to decide what our public on-device testing API is. This may (will) be based on integration_test, flutter_driver, flutter_test, or some combination of them. We probably won't deviate much from the norm here when it comes to how widgets are tested, etc. I have a preference for easy porting of tests between unit (host) and integration (target) harnesses but it's just a preference. I expect we will have to extend the existing APIs to allow for controlling the device (background, trim memory, enable wifi, yada yada). Maybe we should introduce the concept of a 'Device' to the test harness and we can hang the platform-specific functionality off of platform-specific Device sub-classes. I dunno, just thinking out loud at this point.
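To make the 'Device' idea concrete, here is a minimal, language-agnostic sketch (written in Python only to stay self-contained; it is not a proposed Dart API). Every name in it (`Device`, `AndroidDevice`, `background_app`, `trim_memory`, the injected `run` callable) is hypothetical, and the trim level passed to `am send-trim-memory` is illustrative only. The command runner is injected so the sketch runs without a device attached.

```python
from abc import ABC, abstractmethod
from typing import Callable, List

# A command runner takes an argv list. It is injected so the sketch can be
# exercised without a real device; a real harness might pass subprocess.run.
Runner = Callable[[List[str]], None]


class Device(ABC):
    """Hypothetical handle that a test harness could expose to a test."""

    @abstractmethod
    def background_app(self) -> None: ...

    @abstractmethod
    def trim_memory(self, package: str, level: str = "MODERATE") -> None: ...


class AndroidDevice(Device):
    """Hypothetical Android subclass that shells out to adb on the host."""

    def __init__(self, serial: str, run: Runner) -> None:
        self._adb_shell = ["adb", "-s", serial, "shell"]
        self._run = run

    def background_app(self) -> None:
        # Pressing HOME sends the foreground app to the background.
        self._run(self._adb_shell + ["input", "keyevent", "KEYCODE_HOME"])

    def trim_memory(self, package: str, level: str = "MODERATE") -> None:
        # Asks the activity manager to deliver a trim-memory signal.
        # The level name is illustrative, not a recommendation.
        self._run(self._adb_shell + ["am", "send-trim-memory", package, level])


# Record the commands instead of executing them.
recorded: List[List[str]] = []
device = AndroidDevice("emulator-5554", recorded.append)
device.background_app()
device.trim_memory("com.example.app")
```

An iOS or desktop subclass would implement the same two methods with its own tooling, so a test written against `Device` would not need to know which platform it is driving.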

Additionally I think we should prioritize:

- Ensure that we can write all of the example tests Matan talks about in the top comment. I don't want us to design everything ahead of time but design it as we actually write these kinds of tests.
- On-device iteration time. Matan and I were talking about leaning heavily into 'hot restart' ('reload' would leak state and not re-run tests), and unless I change the native code of my app, there is no need to rebuild apk/whatever. I'm imagining that on-device in less than one second my app is restarted and tests are run again, that makes me excited to write tests.
- User-friendly screenshot comparison flow that allows for implementations that integrate with skia gold, scuba, and just a regular old image differ.
- Making sure that we, the Flutter team, dog food this API internally, writing a bunch of regression tests for real engine bugs.
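One way to read the screenshot-comparison point is as a small comparator interface with swappable backends. The sketch below assumes that reading; `ScreenshotComparator`, `LocalByteComparator`, and `expect_screenshot` are hypothetical names, and only the "regular old image differ" case is shown (as exact byte equality). A Skia Gold or Scuba backend would be another class with the same `matches` method.

```python
from typing import Dict, Protocol


class ScreenshotComparator(Protocol):
    """Hypothetical interface that every comparison backend implements."""

    def matches(self, name: str, actual: bytes) -> bool: ...


class LocalByteComparator:
    """Simplest possible backend: exact byte equality against stored goldens."""

    def __init__(self, goldens: Dict[str, bytes]) -> None:
        self._goldens = goldens

    def matches(self, name: str, actual: bytes) -> bool:
        # A missing golden counts as a mismatch.
        return self._goldens.get(name) == actual


def expect_screenshot(comparator: ScreenshotComparator, name: str, actual: bytes) -> None:
    """Test-facing helper; the test never needs to know which backend is in use."""
    if not comparator.matches(name, actual):
        raise AssertionError(f"screenshot {name!r} does not match its golden")


comparator = LocalByteComparator({"home.png": b"fake-image-bytes"})
expect_screenshot(comparator, "home.png", b"fake-image-bytes")
```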

jonahwilliams commented 1 week ago

I think all of these are possible. Some of them even work today. The problem is they are not coherently presented, nor uniformly available.

For example, hot restarting flutter unit tests works! but you have to flutter run the test (which is at best .. lightly documented), and the result reporting doesn't work.

johnmccutchan commented 1 week ago

I'm glad to hear that a lot of the implementation bits are already available. And, yeah, I think this is mostly about uniform implementation and coherent APIs

jmagman commented 1 week ago

To link related work, @bkonyi wrote up Non-Dart Developer Tooling Capabilities Exploration, relevant sections:

- What can be improved in Dart/Flutter: Area for Improvement: Simplify writing UI tests by recording device interactions
- Dart / Android Studio / Xcode comparisons: Testing and Coverage

matanlurey commented 1 week ago

Update: @goderbauer @jonahwilliams @matanlurey @johnmccutchan met today and chatted about this.

We talked about enhancing either Flutter Driver or integration_test (or both) for on-device testing, focusing on use cases beneficial to the engine team. Most of the discussion was about the capabilities and limitations of both tools, considering factors like interacting with widgets, reusing code, and synchronization.

The conclusion was to enhance flutter_driver (without breaking existing functionality) to support use cases like backgrounding a Flutter app, forcing trimmed memory, and taking screenshots, and we'd avoid any changes to integration_test at this time (though we had some cool ideas).

Next steps involve starting with scenario_app-style tests and reproducing real engine tests and regressions.

Concretely, I'll start working on a test suite in dev/integration_tests, and as-needed, add new functionality to flutter_driver (https://github.com/flutter/flutter/tree/master/packages/flutter_driver), likely in a non-public API (src/experimental or similar) as we plan to iterate on real-world scenarios.

mateuszwojtczak commented 19 hours ago

Hi @matanlurey! That’s an excellent summary of the current solutions existing in the SDK.

Disclaimer: I’m one of the authors of Patrol - a framework made exactly to solve some of the issues you described.

I’d like to shed some light on what choices we had to make and how community feedback has driven us there.

You can’t interact with native elements using integration_test

That was a main blocker for us, since almost every mobile app has some kind of a permission dialog or 3rd party SDK integration that is on the business critical path and you can’t just skip it. Also, to have real end-to-end device tests, we have to be able to interact with everything like a real user would.

Patrol introduces both convenience methods like enableWifi or pressHome, but also tap that simply enables you to interact with any native element on the screen like you would with 1st-party native tests but writing Dart code. (more examples here)

Most device farms don’t support Flutter explicitly

This is related to how integration_test works. Most device farms can run UIAutomator tests on Android and XCUITest tests on iOS. If we can run as such and use Flutter testing inside, then Flutter is just an implementation detail for those farms. There’s a lot of tooling for UI tests in the native world and we as the Flutter community would like to be able to use that.

That’s what we do in Patrol - when you run patrol test we run native test execution (e.g. ./gradlew connectedAndroidTest) and from that side we run Flutter integration tests.

integration_test sends all the test results at the end, which is not aligned with the test execution model on the native side

Because of that, tests are not run in isolation (which can increase flakiness) and, more importantly, if the 99th test crashes, you lose all the results, which can be very harsh (device farms are expensive). Also, every test has a “<1s” test duration, because from a native perspective the tests return immediately.
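The difference between the two reporting models can be shown with a toy runner. This is a sketch under stated assumptions, not how integration_test or Patrol is implemented: `run_batched`, `run_streaming`, and `ProcessCrash` are hypothetical names, and the "crash" is simulated with an exception.

```python
from typing import Callable, Dict, List, Tuple

TestCase = Tuple[str, Callable[[], None]]


class ProcessCrash(Exception):
    """Stands in for the app process dying in the middle of a run."""


def run_batched(tests: List[TestCase]) -> Dict[str, str]:
    """Collects every result and hands them over only at the very end."""
    results: Dict[str, str] = {}
    for name, body in tests:
        body()  # a crash here propagates, so `results` is never returned
        results[name] = "passed"
    return results


def run_streaming(tests: List[TestCase], report: Callable[[str, str], None]) -> None:
    """Reports each result as soon as the individual test finishes."""
    for name, body in tests:
        body()
        report(name, "passed")


def ok() -> None:
    pass


def crash() -> None:
    raise ProcessCrash("simulated crash")


tests: List[TestCase] = [("first", ok), ("second", ok), ("third", crash)]

batched: Dict[str, str] = {}
try:
    batched = run_batched(tests)
except ProcessCrash:
    pass  # nothing was delivered before the crash

streamed: List[Tuple[str, str]] = []
try:
    run_streaming(tests, lambda name, status: streamed.append((name, status)))
except ProcessCrash:
    pass  # the first two results were already delivered
```

With the batched runner the crash in the third test leaves `batched` empty, while the streaming runner has already delivered the first two results.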

That’s what we also kinda solved in Patrol, because of running the tests “from the native side”.

flutter_test API is pretty low-level

This one is not as relevant, but I’d like to also cover it since this is such a broad discussion. flutter_test is a great low-level API for interacting with the widget structure, but the community needs something high-level to avoid boilerplate and writing the same helper methods in every project.

This includes things like waiting for something to be interactive, finding first by default, simpler ancestor/descendant methods, etc.
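As an illustration of the kind of helper that tends to get rewritten in every project, here is a minimal "wait until it is there" sketch. It is not taken from flutter_test or patrol_finders; `wait_for`, its parameters, and the `probe` callable are hypothetical names, and a plain polling loop is assumed.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for(probe: Callable[[], Optional[T]], timeout: float = 5.0, interval: float = 0.05) -> T:
    """Polls `probe` until it returns something other than None, or times out."""
    deadline = time.monotonic() + timeout
    while True:
        found = probe()
        if found is not None:
            return found
        if time.monotonic() >= deadline:
            raise TimeoutError(f"nothing found within {timeout} seconds")
        time.sleep(interval)


# Simulated UI: the element only "appears" on the third lookup.
attempts = 0


def find_button() -> Optional[str]:
    global attempts
    attempts += 1
    return "submit-button" if attempts >= 3 else None


button = wait_for(find_button, timeout=1.0, interval=0.01)
```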

We created the patrol_finders package, which just adds an opinionated layer for that - and I believe this is a perfect balance: the low-level flutter_test is maintained by the Flutter team and the higher-level stuff is community-driven.

Hot restart in tests

@johnmccutchan mentioned being able to iterate quickly with hot restart while writing integration tests. Please take a look at the patrol develop command, which basically lets you do that (including while writing native interactions) - more info here

Finally, we as Patrol team are very interested in what decisions are going to be made with both flutter_driver and integration_test, because:

- We spent a lot of time trying to solve those issues, so it's better to share the findings
- The decisions might break Patrol by design and it would be great to know that some time ahead.

flutter / flutter

☂️ Proposal: Make on-device testing awesome 💫 #148028