Roms1383 closed this 11 months ago
Looks interesting!
it just wraps each of your functions with a wrapping function for benchmarking.
Oops, next time maybe create an issue first :/ Because frb already has a feature: wrap a specific call with whatever code you like. For example, in my own app using frb, I call print both before and after a specific call, and also record the time spent in that call.
It can be easily and completely opted out of, by installing FRB codegen without the benchmarking feature

Would be great not to need a feature flag, since most people will download a prebuilt binary instead of building their own
Example: Log when execution starts and ends: http://cjycode.com/flutter_rust_bridge/feature/handler.html#example-log-when-execution-starts-and-ends
```rust
pub struct MyExecutor(ThreadPoolExecutor<MyErrorHandler>);

impl Executor for MyExecutor {
    fn execute<TaskFn, TaskRet>(&self, wrap_info: WrapInfo, task: TaskFn) {
        let debug_name_string = wrap_info.debug_name.to_string();
        // `self.0` is the inner ThreadPoolExecutor of the tuple struct
        self.0
            .execute(wrap_info, move |task_callback| {
                Self::log_around(&debug_name_string, move || task(task_callback))
            })
    }
}

impl MyExecutor {
    fn log_around<F, R>(debug_name: &str, f: F) -> R
    where
        F: FnOnce() -> R,
    {
        let start = Instant::now();
        debug!("(Rust) execute [{}] start", debug_name);
        let ret = f();
        debug!(
            "(Rust) execute [{}] end delta_time={}ms",
            debug_name,
            start.elapsed().as_millis()
        );
        ret
    }
}
```
Haven't looked into the details, but I guess parts of this PR are still quite useful for benchmarking?
Oops next time maybe create an issue first :/ Because frb already has a feature: Wrap a specific call with whatever code you like.
Ah damn yeah 😓 Especially I spent so much time, ❤️ and effort in this one... 😅
Haven't looked into the details, but I guess parts of this PR are still quite useful for benchmarking?
Yes, there's still basically the benchmark folder, which can be used as an example of how to leverage benchmark_harness to build benchmarks with frb. Also, I can probably reuse the TimelineTask- and Stopwatch-related code to implement an Executor that people could quickly wire into their own projects, if needed? Does FRB keep some "middleware-ish" executors for people to quickly use?
Does FRB keep some "middleware-ish" executors for people to quickly use?
You are free to write middlewares (as long as they have zero overhead if someone chooses to disable them) :)
Ah damn yeah 😓 Especially I spent so much time, ❤️ and effort in this one... 😅
I totally know the feeling :(
Btw, maybe even modify this PR such that it becomes a real benchmarker? Then things like https://github.com/marketplace/actions/continuous-benchmark can bring it into CI. Of course we cannot test on Android/iOS, because simulator performance should not be used, but desktop (win/mac/linux) CI seems OK.
Another alternative @fzyzcjy would be to generate benchmarks for each of the types supported by FRB (maybe in another subfolder) and measure time there, probably generating some kind of BENCHMARK.md as a result.
In the long run that could be beneficial to FRB, since we might be able to identify parts that could be optimized.
Yes, it is like a new demo, parallel to pure_dart and with_flutter
Btw, maybe even modify this PR such that it becomes a real benchmarker? Then things like https://github.com/marketplace/actions/continuous-benchmark can bring it into CI. Of course we cannot test on Android/iOS, because simulator performance should not be used, but desktop (win/mac/linux) CI seems OK.
This is a pretty good idea, but I'm afraid it's only applicable to Rust code, isn't it?
For example, a surprise I got is that my little optimization trick implementing IntoDart for Vec<Uuid> doesn't account for Dart's performance at handling its counterpart and converting back and forth from List<UuidValue>. So overall it's still slower than sending List<String> over the wire.
but I'm afraid it's only applicable to Rust code, isn't it?
We may not use all of that action. Instead, we can write some Dart code to benchmark our case, output (say) a JSON file, and let that action parse and visualize it.
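For illustration, that output could be as small as a JSON array of `{name, unit, value}` entries (the shape continuous-benchmark's `customSmallerIsBetter` tool accepts, if I read its docs right). A minimal std-only Rust sketch, where the `Metric` type and `to_benchmark_json` helper are hypothetical, not frb code:

```rust
/// Hypothetical metric type; the `name`, `unit` and `value` fields mirror
/// what continuous-benchmark's `customSmallerIsBetter` tool expects.
struct Metric {
    name: String,
    unit: String,
    value: f64,
}

/// Build the JSON array by hand (std-only; a real implementation would
/// more likely use serde_json).
fn to_benchmark_json(metrics: &[Metric]) -> String {
    let entries: Vec<String> = metrics
        .iter()
        .map(|m| {
            format!(
                r#"{{"name":"{}","unit":"{}","value":{}}}"#,
                m.name, m.unit, m.value
            )
        })
        .collect();
    format!("[{}]", entries.join(","))
}

fn main() {
    let metrics = vec![Metric {
        name: "dart:handle_uuids".to_string(),
        unit: "ms".to_string(),
        value: 12.5,
    }];
    // Prints: [{"name":"dart:handle_uuids","unit":"ms","value":12.5}]
    println!("{}", to_benchmark_json(&metrics));
}
```

The Dart benchmark would just write such a file, and the action would pick it up in CI.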
We should measure the user-perceivable time btw, i.e. the Dart side
I have this code in my own app:
```dart
// TODO move to flutter_rust_bridge
@internal
mixin FlutterRustBridgeInterceptorMixin<T extends FlutterRustBridgeWireBase> on FlutterRustBridgeBase<T> {
  @override
  Future<S> executeNormal<S>(FlutterRustBridgeTask<S> task) async {
    await interceptor?.beforeExecuteNormal(task.constMeta.debugName, task);
    final result = await super.executeNormal(task);
    await interceptor?.afterExecuteNormal(task.constMeta.debugName, task, result);
    return result;
  }

  @override
  Uint8List executeSync(FlutterRustBridgeSyncTask task) {
    interceptor?.beforeExecuteSync(task.constMeta.debugName, task);
    final result = super.executeSync(task);
    interceptor?.afterExecuteSync(task.constMeta.debugName, task, result);
    return result;
  }

  @protected
  FlutterRustBridgeInterceptor? get interceptor;
}
```
But, that is not for benchmark
For benchmarking we may do this:

```dart
var timeBefore = DateTime.now();
for (var i = 0; i < 1000; ++i) api.call_rust_method();
var timeAfter = DateTime.now();
```
Some of the philosophy in https://github.com/bheisler/criterion.rs, i.e. microbenchmarking, may help as well
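To make that philosophy concrete, here is a minimal std-only Rust sketch of a microbenchmark loop (warm up first, then time many iterations and report a per-iteration average). The `bench` helper is hypothetical, not criterion's actual API:

```rust
use std::time::Instant;

/// Time `f` over `iters` iterations after `warmup` untimed runs,
/// returning the average time per iteration in nanoseconds.
fn bench<F: FnMut()>(warmup: u32, iters: u32, mut f: F) -> f64 {
    for _ in 0..warmup {
        f(); // untimed: let caches and allocator state settle first
    }
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    start.elapsed().as_nanos() as f64 / iters as f64
}

fn main() {
    let mut v: Vec<u64> = Vec::new();
    let avg_ns = bench(100, 1_000, || {
        v.push(42);
        std::hint::black_box(&v); // keep the work from being optimized away
    });
    println!("vec push: {avg_ns:.1} ns/iter");
}
```

criterion goes much further (statistical sampling, outlier detection, comparison against previous runs), but the warmup-then-average structure is the core idea.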
We should measure the user-perceivable time btw, i.e. the Dart side
Yes, we're on the same page here.
For benchmark we may do this:
var timeBefore = DateTime.now(); for (var i = 0; i < 1000; ++i) api.call_rust_method(); var timeAfter = DateTime.now();
Actually I think that's what Stopwatch is designed for, as per Tracing Dart code performance.
From what I've noticed, Stopwatch is used to compute the diff and can be used on both web and non-web platforms, while TimelineTask is somewhat better, but only works in the context of the Dart VM.
Some philosophy in https://github.com/bheisler/criterion.rs, i.e. microbenchmarking, may help as well
I already used criterion to add benchmarks for allo-isolate; see the benches folder at the root of the repo. Indeed, that's what I had in mind too.
So, let me recap one more time just to make sure I don't invest too much time in the wrong direction. I should:

- implement an Executor
- use Stopwatch to compute time spent
- create frb_benches, and move benchmarking code there

Please correct me, or add anything that I'd have overlooked @fzyzcjy
Totally agree!
@fzyzcjy I don't see the TimelineTask's named events in the Dart DevTools anymore; I probably implemented the Interceptor incorrectly. Do you have any idea why, maybe by quickly looking at the files?
For benchmarking I personally suggest just Stopwatch or DateTime, because TimelineTask is for generating a timeline event diagram, such as when you see that rendering is slow
Ok, I can make the changes. But did you see anything related to the Interceptor? For example, even my dart:developer log calls don't show up in the console. It seems that my Dart implementation is not fully wired or something.
Not sure about that, never used that API IIRC
Ok I nailed it !
@fzyzcjy is there any way to retrieve the optional generated hint parameter from Dart on the Rust side?
I'd need it to differentiate e.g. 2 calls to the same method with different parameters (useful to benchmark e.g. Vec<T> with varying lengths).
Or, as a fallback, maybe a way to override debug_name_string?
@fzyzcjy is there any way to retrieve the optional generated hint parameter from Dart on the Rust side?
Seems not yet IIRC
I'd need it to differentiate e.g. 2 calls to the same method with different parameters (useful to benchmark e.g. Vec with varying lengths).
maybe no need for that hint at all :) just use normal arg etc
maybe no need for that hint at all :) just use normal arg etc
Oh yeah, true ^^
maybe no need for that hint at all :) just use normal arg etc
Well, if I add it to the api method parameters, then I'm not really measuring sending e.g. a Vec<String>, but measuring sending a Vec<String> plus a hint String.
And if I omit it, I can manually sort and reattribute to bind the Dart call to its inner Rust call, but that's a bit odd (to any contributor who'll come along later to make changes without knowing why).
Why do you need, e.g., the length on the Rust side? All time measurement is done on the Dart side, so the Rust side does not need to know anything
Why do you need, e.g., the length on the Rust side? All time measurement is done on the Dart side, so the Rust side does not need to know anything
No, there's a misunderstanding here. Look at my latest commit, especially here:
```dart
// main.dart in benches example
for (var i = 0; i < count; i++) {
  final dartMetric = dartMetrics[i];
  final rustMetric = rustMetrics[i];
  if (dartMetric.name != rustMetric.name) {
    throw Exception('metric should come in the same order');
  }
  rustMetric.extra = dartMetric.extra;
}
```
I wanted to leverage hint to pass some info like 1000 uuids or 100,000 uuids from Dart to Rust, in order to correlate the outer Dart call with the inner Rust call. This way I can ultimately format the JSON output (with value, unit and name as expected by continuous-benchmark), e.g. with a name like rust:handle_uuids:1,000 uuids for the Rust metric and dart:handle_uuids:1,000 uuids for the Dart metric.

So my workaround in the meantime is that, since Dart metrics and Rust metrics are asserted to be of the same length, I just manually retrieve and set extra from Dart.
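For illustration, that naming scheme could be sketched in std-only Rust like this (the `with_commas` and `metric_name` helpers are hypothetical, not the PR's actual code):

```rust
/// Format a count with comma thousands separators, e.g. 100000 -> "100,000".
fn with_commas(n: u64) -> String {
    let digits = n.to_string();
    let mut out = String::new();
    for (i, c) in digits.chars().enumerate() {
        // insert a comma every three digits, counted from the right
        if i > 0 && (digits.len() - i) % 3 == 0 {
            out.push(',');
        }
        out.push(c);
    }
    out
}

/// Build a metric name such as "rust:handle_uuids:1,000 uuids",
/// so that the Dart and Rust sides of the same call share a suffix.
fn metric_name(side: &str, method: &str, count: u64) -> String {
    format!("{side}:{method}:{} uuids", with_commas(count))
}

fn main() {
    // Prints: rust:handle_uuids:1,000 uuids
    println!("{}", metric_name("rust", "handle_uuids", 1_000));
    // Prints: dart:handle_uuids:100,000 uuids
    println!("{}", metric_name("dart", "handle_uuids", 100_000));
}
```

Since continuous-benchmark keys its comparisons on the metric name, a stable scheme like this lets the same benchmark be tracked across workflow runs.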
Don't have much brainpower to go into the details now (still working on https://github.com/fzyzcjy/flutter_smooth - you can see it from the commit history :) ). Anyway, looking forward to your final PR!
Don't have much brainpower to go into the details now (still working on https://github.com/fzyzcjy/flutter_smooth - you can see it from the commit history :) ). Anyway, looking forward to your final PR!
I feel you man, no worries !
Simply put, continuous-benchmark will compare benchmarks between CI workflow runs based on the metric's name, hence why.
Anyway, it works now
I added continuous-benchmark, but the docs clearly state that it shouldn't be run on PRs (I already configured the condition accordingly in the workflow), so it's gonna be difficult to test.
I added continuous-benchmark but the docs clearly state that it shouldn't be run on PRs (I already configured the condition accordingly in the workflow), so it's gonna be difficult to test.
Is it possible to test it in your own clone of the repo?
Or temporarily run it in the PR?
We're almost there @fzyzcjy 😊 The only tasks left are: documenting, testing the new GitHub action in CI, and reviewing.
This afternoon I'm gonna try continuous-benchmark here, since it would fail on my repo (credentials aren't shared across forks, AFAIR).
@fzyzcjy I found a weird bug when retrieving env vars for the benches example, so I also fixed it all across the repo.
As a reminder, the Dart unit-tests (Web) commands run smoothly on my machine but fail in the CI with these errors:

- a dylib path triggers an invalid ELF header in Run benchmark_harness example (Wasm) for ../../../target/wasm32-unknown-unknown/debug/flutter_rust_bridge_example_benchmark_suite.wasm (see in Actions)
- a longer error in Run benches example (Wasm), which is probably due to missing tool(s) (see in Actions):

```
Info: Compiling with sound null safety
Compiled 11,160,277 characters Dart to 291,092 characters JavaScript in 5.59 seconds using 300.348 MB of memory
Dart file frb_example/benches/dart/lib/main.web.dart compiled to JavaScript: frb_example/benches/dart/web/main.web.dart.js
🦀 Server listening on http://localhost:8081 🎯
> xdg-open http://localhost:8081
/usr/bin/xdg-open: 869: www-browser: not found
/usr/bin/xdg-open: 869: links2: not found
/usr/bin/xdg-open: 869: elinks: not found
/usr/bin/xdg-open: 869: links: not found
/usr/bin/xdg-open: 869: lynx: not found
/usr/bin/xdg-open: 869: w3m: not found
xdg-open: no method available for opening 'http://localhost:8081'
Unhandled exception:
ProcessException: /usr/bin/xdg-open: 869: www-browser: not found
/usr/bin/xdg-open: 869: links2: not found
/usr/bin/xdg-open: 869: elinks: not found
/usr/bin/xdg-open: 869: links: not found
/usr/bin/xdg-open: 869: lynx: not found
/usr/bin/xdg-open: 869: w3m: not found
xdg-open: no method available for opening 'http://localhost:8081'
Command: xdg-open http://localhost:8081
#0 system (file:///home/runner/work/flutter_rust_bridge/flutter_rust_bridge/frb_dart/bin/serve.dart:76:5)
<asynchronous suspension>
Error: Process completed with exit code 255.
```
@Desdaemon if this rings a bell, please let me know. In the meantime I disabled them to focus on what's left.
We're almost there @fzyzcjy 😊
Great! Cannot wait to land this feature :) (Didn't see that comment yesterday b/c when I receive notifications from GitHub it opens the "Files changed" page, not the "Conversation" page)
a dylib path triggers an invalid ELF header in Run benchmark_harness example (Wasm) for ../../../target/wasm32-unknown-unknown/debug/flutter_rust_bridge_example_benchmark_suite.wasm
It seems like you're passing a WASM file where a dylib was expected.
a longer error in Run benches example (Wasm) which is probably due to missing tool(s)
The Actions environment runs without a desktop environment, so xdg-open won't work. You'll have to use Chromium headless mode like --run-tests does. An alternative is to run on macOS or Windows instead.
@Desdaemon huge thanks ! I have to check it out.
Great! Cannot wait to land this feature :)
Hehe me too!
Not see that comment yesterday (...)
No worries, it happens to me too. Btw you guys are among the most reactive people I've seen around in open source. If you ever need some time away for yourself, please take it, I can wait! I feel like I'm gonna need a slight break once this PR lands ^^
Ok I commented the CI workflow job condition, let's see how it behaves... ❗
Ok @fzyzcjy, since the CI now fails on Git ops (see Actions), I think it's good. I'm gonna revert the CI workflow comments on the sensitive gh-pages op (the one that needs write permissions), reintroduce the CI Dart Web jobs (thanks to @Desdaemon's feedback I think I can fix them), start thinking about adding more benchmark samples, and write the docs.
Feel free to start reviewing the implementation ahead anytime !
Well, I honestly have no idea why the tests fail on web at this point @Desdaemon 😅
It shows the infamous store_dart_post_cobject error (see Actions), so naturally I tried to fix it the way I do locally (--define=SILICON=true in this context), but to no avail. Do you have any idea what's happening here, by any chance? Or should I switch to another macOS image, maybe not latest?
Isn't it --dart-define=SILICON=true instead?
Isn't it --dart-define=SILICON=true instead?
--dart-define is used with flutter, like flutter --dart-define=SILICON=true ..., to pipe the env var to Dart; when running dart directly, the directive is --define instead
there's a lot of shenanigans like this (especially between Dart VM and Dart2JS) I'll do my best to document them in the book.
Well ok, so the benchmark harness runs fine locally on Dart Web too; in the latest commit I updated CI accordingly.
I think it's fine honestly, but the CI keeps timing out (I assumed that's why @Desdaemon wrapped the Dart web tests with retry-action, so I just did the same). Btw the previous job is still running (more than an hour and a half now), so you might want to stop it manually @fzyzcjy.
@fzyzcjy which section of the book would be most appropriate for me to put the incoming benchmarks.md in?
@fzyzcjy which section of the book would be most appropriate for me to put the incoming benchmarks.md in?
No worries, just pick a random section. I will move it later if I find a better place :)
🎉 It's ready for review
Still busy on flutter_smooth, will briefly review today :)
Ok, past this point I won't touch anything, except for the review's requested changes 🙏
Hi! This feature is still a work in progress, but already quite advanced.
Checklist

- Tests are added to the ./frb_example/pure_dart example, more specifically, rust/src/api.rs and dart/lib/main.dart.
- The code generator has been run (just refresh_all).
- Docs (the ./book folder) are updated.

Remark for maintainer
So I was looking to provide an elegant way to benchmark the cost of sending data over FFI, and here it goes.

Its implementation is simplistic on purpose: it just wraps each of your functions with a wrapping function for benchmarking. This feature doesn't try to teach you how you should do your benchmarks; it just saves you the boilerplate that you would otherwise write by hand (and follows the general FRB philosophy). It doesn't pretend that the tests will be the most accurate; this is still in your hands and depends on how you set them up :)

On all platforms except web, it uses TimelineTask under the hood: there are now additional shortcut commands to run e.g. your tests while setting a pause, to allow you to peek at the results in Dart DevTools. On the web platform, it uses Stopwatch instead, so the results are directly printed to the console.

It can work in your tests, as can be seen in frb_example/pure_dart/dart/lib/main.dart (even if technically performance shouldn't be measured while running tests to get the most accurate results; but if it's just to compare functions with one another, it can still provide some useful insight). A traditional test with benchmark_harness is also provided, to compare the results if needed.

It can be easily and completely opted out of, by installing FRB codegen without the benchmarking feature. It can also be installed while still being disabled whenever needed, by setting the CLI arg bench-extended to false.

So far the only tasks left are :