flock-lab / flock

Flock: A Low-Cost Streaming Query Engine on FaaS Platforms
https://flock-lab.github.io/flock/
GNU Affero General Public License v3.0

RFC: implementing vectorizable operations using SIMD Instructions #24

Closed. gangliao closed this issue 3 years ago

gangliao commented 3 years ago

For streaming ETL, when the data is properly aligned in memory, we should implement vectorizable operations (MIN, MAX, SUM, ...) using SIMD instructions to beat Flink.

References:
Creating faster AWS Lambda functions with AVX2. https://aws.amazon.com/blogs/compute/creating-faster-aws-lambda-functions-with-avx2/
Implementing Database Operations Using SIMD Instructions. SIGMOD'02
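To make the proposal concrete, here is a minimal sketch of vectorizable SUM and MIN kernels over an `i64` column. It is not Flock's implementation: the names `sum_i64`, `min_i64`, and `LANES` are hypothetical, overflow is assumed to wrap, and the code relies on the compiler's auto-vectorizer rather than explicit intrinsics, so whether SIMD instructions are actually emitted depends on the target and optimization flags.

```rust
/// Number of independent accumulators per step. 8 x i64 = 64 bytes.
/// This width is an assumption of the sketch, not a value from the issue.
const LANES: usize = 8;

/// SUM over a column. Keeping LANES independent accumulators removes the
/// loop-carried dependency, which gives the compiler a chance to use SIMD adds.
/// Overflow wraps (an assumption; a real engine may prefer checked arithmetic).
pub fn sum_i64(values: &[i64]) -> i64 {
    let mut acc = [0i64; LANES];
    let chunks = values.chunks_exact(LANES);
    let tail = chunks.remainder();
    for chunk in chunks {
        for i in 0..LANES {
            acc[i] = acc[i].wrapping_add(chunk[i]);
        }
    }
    // Horizontal reduction of the lanes, then the scalar tail.
    let mut total = acc.iter().fold(0i64, |a, &b| a.wrapping_add(b));
    for &v in tail {
        total = total.wrapping_add(v);
    }
    total
}

/// MIN over a column, using the same lane-wise pattern.
/// Returns None for an empty column.
pub fn min_i64(values: &[i64]) -> Option<i64> {
    if values.is_empty() {
        return None;
    }
    let mut acc = [i64::MAX; LANES];
    let chunks = values.chunks_exact(LANES);
    let tail = chunks.remainder();
    for chunk in chunks {
        for i in 0..LANES {
            acc[i] = acc[i].min(chunk[i]);
        }
    }
    let mut m = acc.iter().fold(i64::MAX, |a, &b| a.min(b));
    for &v in tail {
        m = m.min(v);
    }
    Some(m)
}
```

MAX follows the same shape with `i64::MIN` as the initial value and `max` as the lane operation. Building with `-C target-cpu=native` (or an explicit AVX2 target feature) lets the compiler use wider registers where the hardware supports them.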

gangliao commented 3 years ago

Since 1 December 2020, AWS Lambda rounds billed duration up to the nearest millisecond, with no minimum execution time (previously it rounded up to the nearest 100 ms). This makes vectorized execution more meaningful: every millisecond saved now lowers the bill.

https://aws.amazon.com/blogs/aws/new-for-aws-lambda-1ms-billing-granularity-adds-cost-savings/
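The effect of the billing change can be illustrated with a small rounding calculation. This is only a sketch: `billed_ms` is a hypothetical helper, and the durations in the usage note are made-up numbers, not measurements from Flock.

```rust
/// Round a measured duration (in microseconds) up to the billing
/// granularity (in milliseconds) and return the billed milliseconds.
/// `granularity_ms` must be non-zero.
pub fn billed_ms(duration_us: u64, granularity_ms: u64) -> u64 {
    let granularity_us = granularity_ms * 1_000;
    // Ceiling division: number of billing units, rounded up.
    let units = (duration_us + granularity_us - 1) / granularity_us;
    units * granularity_ms
}
```

Suppose a scalar aggregation takes 23.4 ms and a vectorized one takes 8.1 ms (hypothetical figures). With 100 ms granularity both calls, `billed_ms(23_400, 100)` and `billed_ms(8_100, 100)`, are billed as 100 ms, so the speedup saves nothing. With 1 ms granularity they are billed as 24 ms and 9 ms respectively, so the speedup shows up directly in cost.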

gangliao commented 3 years ago

The Arrow Columnar In-Memory Format is a standard and efficient in-memory representation of various data types, plain or nested.

https://github.com/apache/arrow/tree/master/rust
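To show why a columnar layout suits vectorized operations, here is a deliberately simplified sketch of a nullable `i64` column: a contiguous values buffer plus a validity bitmap (bit set = value present, least-significant bit first), which is the general shape Arrow uses for primitive arrays. The type `Int64Column` and its methods are hypothetical and are not the API of the `arrow` crate. Zeroing null slots is an assumption of this sketch; Arrow itself does not require null slots to hold any particular value.

```rust
/// Simplified nullable column: contiguous values plus a validity bitmap.
/// Hypothetical type for illustration only.
pub struct Int64Column {
    values: Vec<i64>,
    validity: Vec<u8>, // bit i set => slot i holds a value
}

impl Int64Column {
    /// Build a column from optional values. Null slots are stored as 0
    /// (a choice made by this sketch so that SUM can skip the bitmap).
    pub fn from_options(items: &[Option<i64>]) -> Self {
        let mut values = Vec::with_capacity(items.len());
        let mut validity = vec![0u8; (items.len() + 7) / 8];
        for (i, item) in items.iter().enumerate() {
            match item {
                Some(v) => {
                    values.push(*v);
                    validity[i / 8] |= 1u8 << (i % 8);
                }
                None => values.push(0),
            }
        }
        Self { values, validity }
    }

    /// True if slot `i` holds a value. Panics if `i` is out of range.
    pub fn is_valid(&self, i: usize) -> bool {
        assert!(i < self.values.len(), "slot index out of range");
        self.validity[i / 8] & (1u8 << (i % 8)) != 0
    }

    /// Number of null slots, computed from the bitmap's set bits.
    pub fn null_count(&self) -> usize {
        let set: usize = self.validity.iter().map(|b| b.count_ones() as usize).sum();
        self.values.len() - set
    }

    /// Branch-free SUM over the contiguous buffer. Correct only because
    /// this sketch stores 0 in null slots. Overflow wraps.
    pub fn sum(&self) -> i64 {
        self.values.iter().fold(0i64, |a, &v| a.wrapping_add(v))
    }
}
```

Because the values sit in one contiguous buffer and nulls are tracked separately, the aggregation loop contains no per-row branching, which is the property SIMD kernels depend on.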