flock-lab / flock

Flock: A Low-Cost Streaming Query Engine on FaaS Platforms
https://flock-lab.github.io/flock/
GNU Affero General Public License v3.0

feat: distributed flock functions for nexmark #437

Closed gangliao closed 2 years ago

gangliao commented 2 years ago

Which issue does this PR close?

Closes #.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

By submitting this pull request

gangliao commented 2 years ago

NEXMarkSource { config: Config { args: {"events-per-second": "5000000", "seconds": "1"} } }
...
[2022-01-17T23:12:50Z INFO  flock::window] [OK] function payload bytes: 209785
[2022-01-17T23:12:50Z INFO  flock::window] [OK] function payload bytes: 206512
[2022-01-17T23:12:50Z INFO  flock::window] [OK] function payload bytes: 209465
[2022-01-17T23:12:50Z INFO  flock::window] [OK] function payload bytes: 207614
[2022-01-17T23:12:50Z INFO  flock::window] [OK] function payload bytes: 209835
[2022-01-17T23:12:50Z INFO  flock::window] [OK] function payload bytes: 214711
[2022-01-17T23:12:50Z INFO  flock::window] [OK] function payload bytes: 205503
[2022-01-17T23:12:50Z INFO  flock::window] [OK] function payload bytes: 212309
[2022-01-17T23:12:50Z INFO  flock::window] [OK] function payload bytes: 208533
[2022-01-17T23:12:50Z INFO  flock::window] [OK] function payload bytes: 208232
[2022-01-17T23:12:50Z INFO  flock::window] [OK] function payload bytes: 213913
[2022-01-17T23:12:50Z INFO  flock::window] [OK] function payload bytes: 205838
[2022-01-17T23:12:50Z INFO  flock::window] [OK] function payload bytes: 194152
[2022-01-17T23:12:50Z INFO  flock::window] [OK] function payload bytes: 203748
[2022-01-17T23:12:50Z INFO  flock::window] [OK] function payload bytes: 203339
[2022-01-17T23:12:51Z INFO  flock::window] [OK] function payload bytes: 201824
END RequestId: 47337702-5dd5-4100-a7e4-9dae5a1d645d
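To compare payload sizes across runs, the log lines above can be summarized with a small helper. This is only a sketch: `parse_payload_bytes` and `summarize_payloads` are hypothetical names, not part of Flock, and the only assumption about the log format is the `function payload bytes: <n>` suffix visible in the output above.

```rust
/// Extracts the byte count from a log line that ends with
/// "function payload bytes: <n>". Returns None for any other line.
/// (Hypothetical helper; the marker string is copied from the log above.)
fn parse_payload_bytes(line: &str) -> Option<u64> {
    const MARKER: &str = "function payload bytes: ";
    let start = line.find(MARKER)? + MARKER.len();
    line[start..].trim().parse().ok()
}

/// Returns (count, total, min, max) over all payload lines in `log`,
/// or None if the log contains no payload lines.
fn summarize_payloads(log: &str) -> Option<(usize, u64, u64, u64)> {
    let sizes: Vec<u64> = log.lines().filter_map(parse_payload_bytes).collect();
    let min = *sizes.iter().min()?;
    let max = *sizes.iter().max()?;
    Some((sizes.len(), sizes.iter().sum(), min, max))
}
```

For the 16 payload lines above, the sizes range from 194152 to 214711 bytes and add up to 3315313 bytes, roughly 207 KB per payload.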

I'm shocked by the performance of distributed mode. For example, the distributed hash join can process 5 million events in 1.5 seconds.
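As a back-of-the-envelope check, the numbers reported here work out to about 3.3 million events per second. The sketch below just does that arithmetic; the function names are hypothetical, and the per-partition figure assumes events are spread evenly across the 16 partitions, which the thread does not confirm.

```rust
/// Average throughput in events per second.
fn events_per_second(events: u64, seconds: f64) -> f64 {
    events as f64 / seconds
}

/// Events handled by each partition, assuming a perfectly even split
/// (an idealization; real hash partitioning is only approximately even).
fn events_per_partition(events: u64, partitions: u64) -> u64 {
    events / partitions
}
```

With the figures from this run, `events_per_second(5_000_000, 1.5)` is about 3.33 million, and `events_per_partition(5_000_000, 16)` is 312500.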

gangliao commented 2 years ago

We can do better if we increase the number of partitions from 16 to 32 or even 64.
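To illustrate why the partition count matters, here is a minimal sketch of hash partitioning by key. It is not Flock's actual partitioner: `partition_for` and `partition_counts` are hypothetical names, and the standard library's `DefaultHasher` stands in for whatever hash function the engine really uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Maps a join key to a partition index in 0..partitions.
/// Equal keys always land in the same partition, which is what lets
/// each function join its share of the data independently.
fn partition_for<K: Hash>(key: &K, partitions: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    (hasher.finish() % partitions as u64) as usize
}

/// Counts how many keys fall into each partition.
fn partition_counts(keys: impl IntoIterator<Item = u64>, partitions: usize) -> Vec<usize> {
    let mut counts = vec![0usize; partitions];
    for key in keys {
        counts[partition_for(&key, partitions)] += 1;
    }
    counts
}
```

Doubling the partition count roughly halves the number of events each function has to handle, at the cost of more function invocations. Whether that translates into lower end-to-end latency would need to be measured.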