flutter / flutter

Flutter makes it easy and fast to build beautiful apps for mobile and beyond
https://flutter.dev
BSD 3-Clause "New" or "Revised" License

☂️ Proposal: Make on-device testing awesome 💫 #148028

Open matanlurey opened 1 week ago

matanlurey commented 1 week ago

Some examples of things @johnmccutchan and I would like to do, but can't today:

We should do something about it. A couple of options on the table include:

Outside of Flutter, here are some popular integration test solutions for similar problem spaces:

Read more about the background of flutter_driver v integration_test

## Background

Flutter ships with [`package:flutter_test`](https://github.com/flutter/flutter/tree/master/packages/flutter_test), and the accompanying command, `flutter test`, which runs a headless version of Flutter (called [`flutter_tester`](https://github.com/flutter/engine/blob/0ea962dc8a71f7fab1122ea05801b6888fe2fc12/shell/testing/tester_main.cc#L597)) and runs Dart-based unit/functional tests, called _widget tests_, in a fake environment where the passage of time is controlled by the tester, many extension points (like platform channels) are stubbed out, and rendering uses a _software-based_ renderer that is mostly platform agnostic (and does not require a GPU, for example).

This workflow provides super fast lightweight tests that are suitable for testing widgets and compositions of widgets. It's possible to [interact](https://docs.flutter.dev/cookbook/testing/widget/tap-drag) with the widget(s) under test, observe changes as a result, and even [take screenshots](https://api.flutter.dev/flutter/flutter_test/matchesGoldenFile.html) and compare them for golden-file testing.

Notably, this fake environment has the following limitations:

- The test runs on a fake device[^1], and cannot interact with plugins
- The passage of time is tightly controlled by the developer, and doesn't always reflect the real interactions in production
- Platform _views_ do not show up, cannot be interacted with (as there is no platform), and are missing in screenshots

[^1]: It is technically possible to `flutter run` a `flutter_test` and have it run on a real device; however many of the limitations remain.

Flutter also ships with two "integration test" packages, [flutter_driver](https://github.com/flutter/flutter/tree/master/packages/flutter_driver) and [integration_test](https://github.com/flutter/flutter/tree/master/packages/integration_test), which unfortunately are in a state[^2] of [neither being completed nor deprecated](https://github.com/flutter/flutter/issues/142021). It would take a lot of words to describe the current state, so instead I'll focus on some key points:

### Flutter Driver

Runs the test script on the _host_, using a different API (similar to [ChromeDriver](https://chromedriver.chromium.org/)) than tests authored with `flutter_test`.

**PROs**:

- Conceptually simple; a small limited RPC-like API "talks" to a Flutter app running on a device
- Capable of interactions that require a host, i.e. with `adb` or forcing a Dart VM GC
- Already supports functionality such as screenshots

**CONs**:

- All interactions must be serializable, meaning `Finder`s cannot be re-used across `flutter_test` and `flutter_driver`
- All interactions and assertions happen over RPC, leading to additional latency and, in some cases, flakiness/synchronization issues
- Can't (at least today) run on Firebase Testlab or systems that require a single bundled APK (or similar)

### Integration Test

**PROs**:

- Uses largely the same API as `flutter_test`
- Runs entirely in the same process as the Flutter application under test, without RPC or serialization
- Can more easily use a combination of platform channels or FFI to "talk" to the native platform
- Is supported by Firebase Testlab and similar systems that require a single bundled APK (or similar), and no driver script

**CONs**:

- Is, from what I can tell, [incomplete](https://github.com/flutter/flutter/issues?q=is%3Aopen+is%3Aissue+label%3A%22f%3A+integration_test%22) (it's not clear whether there is a specific reason we haven't finished migrating to it)
- Structurally more difficult to interact with host-side tooling (i.e. `adb`, Dart VM, etc)

[^2]: Google employees can also read the internal-only [go/flutter-integration-testing](http://go/flutter-integration-testing).
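To illustrate the "largely the same API as `flutter_test`" point, here is a minimal sketch of an `integration_test` test. The `CounterApp` widget is a hypothetical app under test, defined inline only so the example is self-contained; it is not taken from any Flutter sample.

```dart
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';
import 'package:integration_test/integration_test.dart';

// Hypothetical app under test, kept inline so the sketch is self-contained.
class CounterApp extends StatefulWidget {
  const CounterApp({super.key});

  @override
  State<CounterApp> createState() => _CounterAppState();
}

class _CounterAppState extends State<CounterApp> {
  int _count = 0;

  @override
  Widget build(BuildContext context) {
    return MaterialApp(
      home: Scaffold(
        body: Center(child: Text('$_count')),
        floatingActionButton: FloatingActionButton(
          onPressed: () => setState(() => _count++),
          child: const Icon(Icons.add),
        ),
      ),
    );
  }
}

void main() {
  // The only integration_test-specific line. Dropping it (and its import)
  // leaves an ordinary widget test.
  IntegrationTestWidgetsFlutterBinding.ensureInitialized();

  testWidgets('tapping the button increments the counter', (tester) async {
    await tester.pumpWidget(const CounterApp());
    expect(find.text('0'), findsOneWidget);

    // Same WidgetTester and Finder API as in a flutter_test widget test.
    await tester.tap(find.byType(FloatingActionButton));
    await tester.pumpAndSettle();

    expect(find.text('1'), findsOneWidget);
  });
}
```

Everything after the binding call uses the same `WidgetTester` and `Finder` objects as a widget test, which is why finders and helpers can be shared between the two, unlike with `flutter_driver`.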

/cc @goderbauer @tugorez @jonahwilliams

matanlurey commented 1 week ago

I did a quick search in org:flutter:

Interestingly, we use integration_test 40 times in flutter/flutter, compared to 67 uses of flutter_driver.

jonahwilliams commented 1 week ago

integration_test uses flutter_driver though, so there isn't really an A or B. Some of the applications written using integration_test will still be doing a flutter_driver style test. There was also a "migration" attempt that updated a bunch of tests to use integration_test and in the process kneecapped the benchmark results.

matanlurey commented 1 week ago

The question at hand is, where do we invest our time, and what do we tell our teams to use?

In other words, say we want to add support to talk to the native platform. Do we add that exclusively in flutter_driver? Do we add it exclusively to integration_test? Do we add it in a way where both can use the functionality? Assuming we have limited time/resources, what has the best chance of making our testing story better?

jonahwilliams commented 1 week ago

integration_test is just re-exporting parts of flutter_driver and flutter_test though

johnmccutchan commented 1 week ago

@jonahwilliams thanks for all this context. I was walking my dog tonight and I thought "I bet integration_test uses driver under the hood". Glad to have it confirmed without me having to ask :)

Let me try to reframe Matan's point (and correct me if I'm wrong Matan):

We need to decide what our public on-device testing API is. This may (will) be based on integration_test, flutter_driver, flutter_test, or some combination of them. We probably won't deviate much from the norm here when it comes to how widgets are tested, etc. I have a preference for easy porting of tests between unit (host) and integration (target) harnesses, but it's just a preference. I expect we will have to extend the existing APIs to allow for controlling the device (background, trim memory, enable wifi, yada yada). Maybe we should introduce the concept of a 'Device' to the test harness, and we can hang the platform-specific functionality off of platform-specific Device sub-classes. I dunno, just thinking out loud at this point.
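To make the 'Device' idea a bit more concrete, here is a minimal sketch in plain Dart. Every name in it (`Device`, `AndroidDevice`, `CommandRunner`, `runOnHost`) is hypothetical and exists in no Flutter package today; it also assumes the test runs on a host with `adb` on the PATH. The command runner is injectable so the shape of the API can be exercised without a real device.

```dart
import 'dart:io';

/// Runs a host command and returns its stdout. Hypothetical hook, injectable
/// so the sketch can be exercised without a device attached.
typedef CommandRunner = Future<String> Function(
    String executable, List<String> args);

/// Default runner: shells out on the host and throws on a non-zero exit code.
Future<String> runOnHost(String executable, List<String> args) async {
  final result = await Process.run(executable, args);
  if (result.exitCode != 0) {
    throw ProcessException(
        executable, args, '${result.stderr}', result.exitCode);
  }
  return result.stdout as String;
}

/// Hypothetical platform-neutral surface the test harness would expose.
abstract class Device {
  Future<void> sendToBackground();
  Future<void> setWifiEnabled(bool enabled);
}

/// Android flavour: platform-specific extras (like trimMemory) hang off
/// the subclass rather than the shared interface.
class AndroidDevice extends Device {
  AndroidDevice({required this.packageName, this.run = runOnHost});

  final String packageName;
  final CommandRunner run;

  @override
  Future<void> sendToBackground() async {
    await run('adb', ['shell', 'input', 'keyevent', 'KEYCODE_HOME']);
  }

  @override
  Future<void> setWifiEnabled(bool enabled) async {
    await run('adb', ['shell', 'svc', 'wifi', enabled ? 'enable' : 'disable']);
  }

  /// Android-only: asks the system to deliver a trim-memory signal,
  /// e.g. level 'RUNNING_CRITICAL'.
  Future<void> trimMemory(String level) async {
    await run('adb', ['shell', 'am', 'send-trim-memory', packageName, level]);
  }
}

Future<void> main() async {
  // Record the commands instead of executing them, to show what would run.
  final issued = <String>[];
  final device = AndroidDevice(
    packageName: 'com.example.app', // assumed package name
    run: (executable, args) async {
      issued.add('$executable ${args.join(' ')}');
      return '';
    },
  );

  await device.sendToBackground();
  await device.trimMemory('RUNNING_CRITICAL');
  await device.setWifiEnabled(true);
  issued.forEach(print);
}
```

A sketch like this only fits a harness where the test script runs on the host. An in-process harness would need a different backing mechanism for the same interface, such as platform channels.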

Additionally I think we should prioritize:

jonahwilliams commented 1 week ago

I think all of these are possible. Some of them even work today. The problem is they are not coherently presented, nor uniformly available.

For example, hot restarting flutter unit tests works! but you have to flutter run the test (which is at best .. lightly documented), and the result reporting doesn't work.

johnmccutchan commented 1 week ago

I'm glad to hear that a lot of the implementation bits are already available. And, yeah, I think this is mostly about uniform implementation and coherent APIs

jmagman commented 1 week ago

To link related work, @bkonyi wrote up Non-Dart Developer Tooling Capabilities Exploration, relevant sections:

matanlurey commented 1 week ago

Update: @goderbauer @jonahwilliams @matanlurey @johnmccutchan met today and chatted about this.

We talked about whether to enhance Flutter Driver, integration_test, or both for on-device testing, focusing on use cases beneficial to the engine team. Most of the discussion covered the capabilities and limitations of both tools, considering factors like interacting with widgets, reusing code, and synchronization.

The conclusion was to enhance flutter_driver (without breaking existing functionality) to support use cases like backgrounding a Flutter app, forcing trimmed memory, and taking screenshots, and we'd avoid any changes to integration_test at this time (though we had some cool ideas).

Next steps involve starting with scenario_app-style tests and reproducing real engine tests and regressions.

Concretely, I'll start working on a test suite in dev/integration_tests, and as-needed, add new functionality to flutter_driver (https://github.com/flutter/flutter/tree/master/packages/flutter_driver), likely in a non-public API (src/experimental or similar) as we plan to iterate on real-world scenarios.
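For illustration only, here is a rough sketch of what a host-side test for the backgrounding use case might look like with today's flutter_driver API. The `home` value key, the `com.example.app` package name, and the use of `adb` to background and relaunch the app are assumptions of this sketch, not part of the plan above.

```dart
import 'dart:io';

import 'package:flutter_driver/flutter_driver.dart';
import 'package:test/test.dart';

/// Runs an adb command on the host and fails the test if it errors.
Future<void> adb(List<String> args) async {
  final result = await Process.run('adb', args);
  expect(result.exitCode, 0,
      reason: 'adb ${args.join(' ')} failed: ${result.stderr}');
}

void main() {
  late FlutterDriver driver;

  setUpAll(() async {
    // Connects to the app launched by `flutter drive`.
    driver = await FlutterDriver.connect();
  });

  tearDownAll(() async {
    await driver.close();
  });

  test('app renders again after being backgrounded', () async {
    // 'home' is an assumed ValueKey on a widget in the app under test.
    await driver.waitFor(find.byValueKey('home'));

    // Host-side step: press the home button to background the app.
    await adb(['shell', 'input', 'keyevent', 'KEYCODE_HOME']);

    // Bring the (assumed) package back to the foreground.
    await adb([
      'shell', 'monkey', '-p', 'com.example.app',
      '-c', 'android.intent.category.LAUNCHER', '1',
    ]);

    await driver.waitFor(find.byValueKey('home'));

    // Capture a PNG for later inspection or golden comparison.
    final png = await driver.screenshot();
    final file = File('${Directory.systemTemp.path}/after_resume.png');
    await file.writeAsBytes(png);
  });
}
```

Because the script runs on the host, the `adb` calls sit next to the driver calls with no extra plumbing, which is the property that made flutter_driver attractive for these scenarios.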

mateuszwojtczak commented 19 hours ago

Hi @matanlurey! That’s an excellent summary of the current solutions existing in the SDK.

Disclaimer: I’m one of the authors of Patrol - a framework made exactly to solve some of the issues you described.

I’d like to shed some light on what choices we had to make and how community feedback has driven us there.

  1. You can’t interact with native elements using integration_test

That was a major blocker for us, since almost every mobile app has some kind of permission dialog or 3rd party SDK integration that is on the business critical path, and you can’t just skip it. Also, to have real end-to-end device tests, we have to be able to interact with everything like a real user would.

Patrol introduces convenience methods like enableWifi or pressHome, as well as tap, which lets you interact with any native element on the screen as you would in 1st-party native tests, but while writing Dart code. (more examples here)

  2. Most device farms don’t support Flutter explicitly

This is related to how integration_test works. Most device farms can run UIAutomator tests on Android and XCUITest tests on iOS. If we can run as one of those and use Flutter testing inside, then Flutter is just an implementation detail for those farms. There’s a lot of tooling for UI tests in the native world, and we as the Flutter community would like to be able to use it.

That’s what we do in Patrol - when you run patrol test we run native test execution (e.g. ./gradlew connectedAndroidTest) and from that side we run Flutter integration tests.

  3. integration_test sends all the test results at the end, which is not aligned with the test execution model on the native side

Because of that, tests are not run in isolation (which can increase flakiness) and, more importantly, if the 99th test crashes, you lose all the results, which can be very harsh (device farms are expensive). Also, every test reports a “<1s” duration, because from a native perspective the tests return immediately.

We also kinda solved that in Patrol by running the tests “from the native side”.

  4. flutter_test API is pretty low-level

This one is less relevant, but I’d like to cover it too since this is such a broad discussion. flutter_test is a great low-level API for interacting with the widget structure, but the community needs something high-level to avoid boilerplate and rewriting the same helper methods in every project.

This includes things like waiting for something to be interactive, finding first by default, simpler ancestor/descendant methods, etc.

We created the patrol_finders package, which just adds an opinionated layer on top - and I believe this is a perfect balance: the low-level flutter_test is maintained by the Flutter team and the higher-level stuff is community-driven.

  5. Hot restart in tests

@johnmccutchan mentioned being able to iterate quickly with hot restart while writing integration tests. Please take a look at the patrol develop command, which basically lets you do that (including while writing native interactions) - more info here

Finally, we as the Patrol team are very interested in what decisions are going to be made about both flutter_driver and integration_test, because:

  1. We spent a lot of time trying to solve those issues, so it's better to share the findings
  2. The decisions might break Patrol by design and it would be great to know that some time ahead.