flutter / flutter

Flutter makes it easy and fast to build beautiful apps for mobile and beyond
https://flutter.dev
BSD 3-Clause "New" or "Revised" License
163.55k stars 26.9k forks source link

Make tests compile and run faster #69429

Open HerrNiklasRaab opened 3 years ago

HerrNiklasRaab commented 3 years ago

Use case

According to flutter.dev one of the things flutter aims at is fast development. I think being able to develop in a TDD workflow contributes to this goal.

In our case, we have suite about 500 tests, consisting of unit and widget tests. In order to facilitate a real TDD workflow, all the tests should run very fast. There are some sources that suggest a maximum executing time of less than 10s: https://blog.ploeh.dk/2012/05/24/TDDtestsuitesshouldrunin10secondsorless/#:~:text=The%20test%20suite%20you%20work,be%20faster%20than%2010%20milliseconds.

Currently, our test suite is executing in about 2min, which makes the TDD approach very hard, because it disrupts your workflow (better described in the article).

Proposal

The aim of this ticket is more to collect all the tickets, that would contribute to a better TDD experience. So please add additional things that prevent you from a good TDD experience.

Live Unit Testing

https://github.com/flutter/flutter/issues/4719 https://github.com/Dart-Code/Dart-Code/issues/2878

Improve Performance of Tests

https://github.com/flutter/flutter/issues/17130 https://github.com/dart-lang/sdk/issues/32953

Hot Reload for Unit Tests

https://github.com/flutter/flutter/issues/25973 https://github.com/szotp/hottie/

Please consider 👍 this and linked tickets if you thing this is important.

pedromassangocode commented 3 years ago

Hi @HerrNiklasRaab I looked at the link you posted an I got this piece:

That test suite doesn't need to be the same as is running on your CI server. It could be a subset of tests. That's OK. The TDD suite may just be part of your Test Pyramid.

I'm wondering if you are running all the tests of the project or just a subset of tests related to the [new] test you are adding?

I personally think that the duration of tests will always depend on the numbers of tests you are running and we can't just make 10.000+ tests to run in 10 seconds. As your tests increase the time will increase as well.

HerrNiklasRaab commented 3 years ago

Hey @pedromassangocode,

thanks for your response. Sure if the project grows very much, at some point you will need to split it into multiple test suits. But currently, we have 500 tests, who take about 2min. This I think is definitely too long.

I'm wondering if you are running all the tests of the project or just a subset of tests related to the [new] test you are adding?

Currently, we are running the tests of the whole project.

Maybe we can keep this ticket open to track the experience from the point of view of a TDD developer.

Best regards,

Niklas

pedromassangocode commented 3 years ago

I'm still not sure what can be done here so will leave this opened for the right people to handle it. Cc @Hixie @HansMuller

Hixie commented 3 years ago

@HerrNiklasRaab We will need more information to help here, for example, some profiles and traces of running the tests in question to see where the time is going.

It would also be good to have an estimate of how long each test should take. Is it reasonable for a test to take 250ms on average? Or should it be faster? Also, it would be useful to know whether you get different performance by changing the -j option to change how many cores are being used.

HerrNiklasRaab commented 3 years ago

Hi @Hixie,

Thanks for your answer.

I think the most important part of unit tests is that they run fast, with fast I mean very fast. As soon as they are so slow that you don't run them every few minutes they turn into a burden. With fast tests your workflow looks like:

With fast tests:

  1. Run all tests
  2. Change one thing
  3. Run all tests → When it fails you know exactly what which change caused the failure

With slow tests

  1. Run all tests
  2. You change multiple things because tests are slow
  3. You end up with multiple test failing, you need to start debugging, find the changes that caused the failure

My argument is, that the faster the tests run, the shorter this loop is, the more efficient you are developing, because your mind only needs to concentrate on one thing. You never need to debug.

Having very fast tests is crucial. That's why I think 250ms is way too long (no offense). I worked on projects with 5ms per test. This means you can have 2000 tests in your suite and they are still running in 10s.

With 250ms you can only execute 40 tests in 10s. This means you either run them not as often, (consequences described above) or you need to spend a lot of time thinking which test to run. This is very difficult (because you don't know which tests are affected by your change), time-consuming and there is no support in the IDE for running only a subset of tests.


I tried it with "-j", and it yielded a slight improvement. I have also the logs when running with "-v" but I am not sure if this is what you meant with "some profiles and traces of running the tests".

Verbose logs: text-verbose.txt.zip

image

Thanks very much and very interested in your response,

Niklas

PS: If improving the testspeed by this ammount is not possible due to technical reasons. Maybe it might be worth having a look at this project, which yields the hot reload: https://github.com/szotp/hottie/

Hixie commented 3 years ago

By profiles I mean instrumenting the code as it runs to see what is being slow (https://en.wikipedia.org/wiki/Profiling_(computer_programming)). The Dart debugger supports profiling. It should be possible to profile a test run from the flutter tool by starting it in the debugger, then attaching to the debugger, enabling the profiler, resuming the code, and seeing what it lists. Similarly the timeline traces can be obtained from the debugger. They show how much time was spent doing various things. We may need to instrument the tests to measure time spent preparing the tests vs executing them, etc.

5ms per test seems reasonable as a target. On my 20-core machine, running the flutter unit tests, I get about 20ms per test. It could definitely be faster.

One thing you can do to make things faster is toggle debugCheckIntrinsicSizes. Specifically, create a test binding (AutomoatedTestWidgetsFlutterBinding subclass) that returns false for "checkIntrinsicSizes", and instantiate that before calling testWidgets the first time. That might help a lot. (Doing this disables an expensive test that checks for intrinsic contract violations. If you're making your own RenderBox subclasses you want it enabled but otherwise you could just turn it off if you have a lot of tests without losing much value.)

HerrNiklasRaab commented 3 years ago

Hello @Hixie,

thank you very much for your suggestion (toggle debugCheckIntrinsicSizes). It reduced the time of the 359 tests from 1:00 to 0:50, about 10-12s which is nice :)

Here are stats in order get a feeling of the difference in computation speed between your and my machine: 20ms / test for tests of flutter on your machine 62ms / test for tests of flutter on my machine (Quad Core) 136ms / test for fazua_toolbox_app on my machine (Quad Core)

Also quite interesting is, that the average time per test on our project (fazua_toolbox_app) is 136ms / test and when I run the tests from flutter on my machine it executes much faster, with 136ms / test. To me, this either means that our tests (or widgets in our tests) are more complex, or there are things we can improve without changes in the "flutter test" tool.

My profiling attempt

I am sure I didn't execute to steps regarding profiling, like you described, because I didn't understand them fully. Nevertheless, I tried my best. So this is what I did:

  1. I picked a test file, with a few widget test, that have remarkably slow executing speed of 347ms / test.
  2. I added debugger() at the end of the last tests.
  3. I run the test with the following command flutter test --start-paused "test/widget_tests/questionnaire_tests/questionnaire_test.dart".
  4. I opened the link to the observatory.
  5. I opened the timeline of the process.
  6. I clicked all for Recorded Streams Profile
  7. I opened the debugger of the main isolate.
  8. I pressed F7 to continue the isolate.
  9. I refreshed the timeline.
  10. I saved the timeline.

Here is the saved timeline: dart-timeline-2020-11-13 (1).json.zip

Trying to understand how to profile correctly

In order to provide you with a timeline file, I need to better understand what exactly I should do. So here are the parts broken down:

It should be possible to profile a test run from the flutter tool by starting it in the debugger...

How can I start the test run in the debugger? Do you mean opening the debugger in the observatory? I haven't found anything, where I would be able to start it from there.

then attaching to the debugger

How can I attach to the debugger, shouldn't I attache the debugger to a process?

enabling the profiler

Where can I do that? Is the "cpu profile" meant here or the timeline again?


The debugCheckIntrinsicSizeshelped already a lot. Do you have other expensive checks in mind, which can be disabled? Or other ideas how we can improve the speed, that wouldn't need improvement in the framework? What are the common things that make tests slow (we don't have any runAsync statements)?

Again thank you very very much. I apologize for my little knowledge, but I am trying as much as I can to help solve this issue.

Best,

Niklas

Hixie commented 3 years ago

What you did with the timeline is perfect; for profiling it's basically the same thing but with the "CPU profile" option instead of the timeline. (I forget exactly what the UI looks like, it's been a while since I've done it. Sorry.)

sowens-csd commented 3 years ago

Could you post a code snippet for the debugCheckIntrinsicSize change? I tried defining a subclass but when I instantiate I get an assertion failure. I'd like to see if it speeds up the ~1800 tests I'm running.

HerrNiklasRaab commented 3 years ago
class AutomatedTestWidgetsFlutterBindingWithCheckIntrinsicSizesDisabled
    extends AutomatedTestWidgetsFlutterBinding {
  @override
  bool get checkIntrinsicSizes => false;
}

In the test file instanciate the class like this:

void main() {

  AutomatedTestWidgetsFlutterBindingWithCheckIntrinsicSizesDisabled();

  testWidgets('editing bike ', (tester) async {

  });
}

Please report the time improvment, I am very intrested.

sowens-csd commented 3 years ago

Thank you! I had the code right but was putting it in the wrong place. I had put it in the setup() method instead of at the main() level. Working now.

On my ~1800 tests there was no appreciable performance difference. I tried it with this

  bool get checkIntrinsicSizes => true;

to verify, here are the results:

flutter test --coverage false: 2:12 seconds true: 2:14 seconds

flutter test: false: 41 seconds true: 43 seconds

This is with version 1.22.5 from the stable channel.

Hixie commented 3 years ago

How much of a difference the intrinsic asserts make really depends on how much layout is happening in your unit tests. For what it's worth, we are disabling those by default now.

svarlet commented 2 years ago

Hello there!

I've been periodically checking the (lack of) activity on this issue. It seems we've narrowed down the scope of the discussion quite a bit. Even so, It seems it didn't lead to an effective solution for both this specific conversation and the original message posted.

I can report that for us, a suite of 350 tests (most being side effects free, very few with (filesystem) side effects) takes between 10 and 15 seconds to run. For anyone running the test suite frequently, sometimes up to once per minute, it's hard not to have one's mind wander, thereby losing focus or flow. In other programming landscapes, 2 orders of magnitude faster test suites are achieved without even trying. Why is it so slow in Dart? What can we do?

[unimpressed] It seems that poor test performance have been reported for years now, yet still receives little focus from the flutter/dart team(s). Proof of this is that this ticket created about 10 months ago is still just one of the 3330 P4 tickets... [/unimpressed]

Love, Séb

AlexMcConnell commented 2 years ago

I only have 16 tests, and I'm already very concerned about how slow they're going. At 8 seconds, they've already reached the time I'd expect to be able to run a thousand tests.

One thing I've noticed is that the void main() method has a huge overhead here.

I have 16 tests in 4 groups in 4 files. They take about 8 seconds to run.

If I take the 4 groups and put them all into 1 file, they take about 3.5 seconds to run.

If I take the 16 tests and split them up into 16 groups in 16 files, they take about 30 seconds to run.

That's a huge overhead per file.

I also tried doing mass copypasta. With my 16 tests all in one file, I recopied them a bunch of times. It took getting to 235 tests before catching up to the 30 seconds of 16 tests in 16 files.

Of course, 235 tests in 30 seconds is still crazy slow, but at least that's one thing found that's clearly been implemented in a cripplingly inefficient way.

AlexMcConnell commented 2 years ago
Screen Shot 2021-08-27 at 8 12 41 AM

This works and maintains my test log readability (though nested an additional level).

Sadly, I don't think this is a problem with Flutter, as I think I traced it back to Dart's test_core library, but that library was far too difficult to read for me to get far.

jakemac53 commented 2 years ago

See my comment on the issue filed in package:test https://github.com/dart-lang/test/issues/1567#issuecomment-907290381. The overhead here should be much better now, but you will I believe need to be on flutter dev or master channels to see it.

I suspect we can now close this issue.

AlexMcConnell commented 2 years ago

As shown by my follow-up comment, the overhead is actually worse.

Also, that issue you linked is about that specific problem, but this issue is more general. The tests are too slow even without that one issue. So I don't think this should be closed even if that issue were resolved.

svarlet commented 2 years ago

Agreed, until a real change is visible in the stable channel, It feels odd to "close" the issue.

jakemac53 commented 2 years ago

Ya sorry I hadn't read the full context of the issue :). The change in the test package is very specifically geared towards regular Dart tests also, and has no effect on flutter tests. It should provide a dramatic improvement for regular Dart tests though 👍 .

I believe that flutter tests are already optimized using similar means (it is what inspired the VM test optimization). So if you are experiencing slow flutter tests I would look at the flutter test platform code and/or flutter test utilities.

Hixie commented 2 years ago

We're nowhere near tests running fast even on master (on my 20 core machine, using all 20 cores, it takes SEVEN SECONDS for flutter test to even begin to run the first test in the flutter package, for example, and the whole test suite takes 3m20s, which is about 0.55ms per test line, which seems really high!), so I agree that this bug shouldn't be closed.

That said, it is our policy to close issues once they are fixed on master, we don't leave bugs open until they are fixed on stable. This is because otherwise we would be flooded with literally thousands of open bugs that are not actionable, whose close date would be months later. The GitHub issues are our development bug database and they therefore track master, not stable.

orestesgaolin commented 2 years ago

Just wanted to add that the most problematic for us is the performance of coverage reporting tests on CI. For example a suite of 1400 tests takes over 20 minutes to finish. In other words it's about 1 second per test. The same process but without --coverage takes around 6 minutes. It doesn't help much to cache build artifacts or dependencies. This really slows down the loop when doing the code reviews. We're looking into sharding the tests between multiple parallel runs, but in practice we wish it was just faster to run tests with coverage.

Hixie commented 2 years ago

Coverage being slow is a separate issue (it requires changes to different parts of the codebase). We've made significant improvements over the past year to the coverage (mostly around accuracy, but also performance), but if you are still running into performance issues please file a bug specifically on that. If you have specific tests we can run that demonstrate the problem that is ideal (the Flutter framework tests are probably a good place to start if your project isn't open source itself).