Qbeast-io / qbeast-spark

Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary!
https://qbeast.io/qbeast-our-tech/
Apache License 2.0

Issue 317: Reduce optimization overhead #318

Closed Jiaweihu08 closed 5 months ago

Jiaweihu08 commented 5 months ago

Description

Fixes #317 by broadcasting the rollup map and the cube max weights.
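A minimal sketch of the broadcasting idea, using Spark's broadcast variables. It is not the code from this pull request: the object name `BroadcastSketch`, the helper `targetCube`, and the shape and contents of both maps are assumptions made for illustration. Only `SparkContext.broadcast` and `Broadcast.value` are actual Spark API.

```scala
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.sql.SparkSession

object BroadcastSketch {

  // Hypothetical helper: look up the cube that `cube` is rolled up into,
  // falling back to the cube itself when the map has no entry for it.
  def targetCube(cube: String, rollup: Map[String, String]): String =
    rollup.getOrElse(cube, cube)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("broadcast-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical driver-side lookup tables; in a real run they may be large.
    val rollup: Map[String, String] = Map("A1" -> "A", "A2" -> "A")
    val cubeMaxWeights: Map[String, Double] = Map("A" -> 0.8, "B" -> 0.5)

    // Broadcast each table so that tasks carry only a small handle
    // instead of a serialized copy of the whole map in every task closure.
    val rollupBc: Broadcast[Map[String, String]] = spark.sparkContext.broadcast(rollup)
    val weightsBc: Broadcast[Map[String, Double]] = spark.sparkContext.broadcast(cubeMaxWeights)

    val cubes = Seq("A1", "A2", "B").toDS()
    val resolved = cubes.map { cube =>
      // Executors read the broadcast values through `.value`.
      val target = targetCube(cube, rollupBc.value)
      (cube, target, weightsBc.value.getOrElse(target, 1.0))
    }

    resolved.show()
    spark.stop()
  }
}
```

Without the broadcast, a map captured directly by the closure is serialized into every task and deserialized again on the executor, which is one plausible source of long task deserialization times when the map is large.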

Type of change

This is a bug fix - #317


Jiaweihu08 commented 5 months ago

Pretty straightforward, lgtm +1

Can we have a test for that? Even if it's only a report of the computation time.

In one optimization run over 50 GB of input files, the execution time was reduced to 50% of its previous value. In this case, the max task deserialization time dropped from approximately 1 minute to tens of milliseconds.
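For a simple computation-time report of the kind asked for above, a wall-clock helper along these lines could wrap the optimization call before and after the change. This is only a sketch: `TimingSketch` and `timed` are hypothetical names, and the summation in `main` is a stand-in workload, not the optimization itself.

```scala
object TimingSketch {

  // Runs `block` once and returns its result together with the
  // elapsed wall-clock time in milliseconds.
  def timed[A](block: => A): (A, Double) = {
    val start = System.nanoTime()
    val result = block
    val elapsedMs = (System.nanoTime() - start) / 1e6
    (result, elapsedMs)
  }

  def main(args: Array[String]): Unit = {
    // Stand-in workload; replace with the operation being measured.
    val (sum, ms) = timed((1L to 1000000L).sum)
    println(f"sum=$sum computed in $ms%.2f ms")
  }
}
```

A single measurement is noisy, so repeating the run several times and reporting the median gives a more reliable comparison.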