apache / incubator-gluten

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
https://gluten.apache.org/
Apache License 2.0

[Core] Qualification Tool for Gluten #7544

Open srinivasst opened 2 days ago

srinivasst commented 2 days ago

Description

The purpose of this enhancement is to create a qualification tool that analyzes customer event files to determine which workloads are suitable for execution with Gluten. This is crucial when onboarding new customers, because not all workloads benefit from Gluten's native acceleration, especially workloads that rely on RDD operations, unsupported SQL operators, or UDFs.
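To make the kind of classification described here more concrete, the sketch below shows one possible signal extractor. It is a hypothetical illustration, not the approach the author plans to submit: the class `EventLogScanner`, its `Report` fields, and the list of operator names to flag are all assumptions. It reads a Spark event log line by line (one JSON event per line), counts jobs that lack the `spark.sql.execution.id` property (a hint of plain RDD code, since Spark SQL sets that property on the jobs it launches), and counts SQL executions whose plan text mentions any caller-supplied operator name. Plain substring matching stands in for a real JSON parser, and in an actual tool the `BufferedReader` could wrap a stream opened through Hadoop's `FileSystem.open(Path)`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: tallies signals in a Spark event log (one JSON event per
// line) that may make a workload a poor fit for Gluten. Substring matching stands
// in for a real JSON parser to keep the example dependency-free.
final class EventLogScanner {

    /** What the scan found; field names are illustrative, not a proposed report format. */
    static final class Report {
        int sqlExecutions;  // SparkListenerSQLExecutionStart events seen
        int jobs;           // all SparkListenerJobStart events seen
        int nonSqlJobs;     // jobs without spark.sql.execution.id, e.g. plain RDD code
        final Map<String, Integer> flaggedOperators = new LinkedHashMap<>();

        @Override
        public String toString() {
            return "sqlExecutions=" + sqlExecutions + ", jobs=" + jobs
                    + ", nonSqlJobs=" + nonSqlJobs + ", flagged=" + flaggedOperators;
        }
    }

    // Placeholder list of plan-node names to flag; a real tool would derive it
    // from what the Gluten backend in use can actually offload.
    private final List<String> flaggedNames;

    EventLogScanner(List<String> flaggedNames) {
        this.flaggedNames = flaggedNames;
    }

    Report scan(BufferedReader reader) {
        Report report = new Report();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.contains("\"Event\":\"SparkListenerJobStart\"")) {
                    report.jobs++;
                    // Spark SQL sets this local property on the jobs it launches.
                    if (!line.contains("spark.sql.execution.id")) {
                        report.nonSqlJobs++;
                    }
                } else if (line.contains("SparkListenerSQLExecutionStart")) {
                    // SQL events are logged under their fully qualified class name,
                    // so only the simple name is matched here.
                    report.sqlExecutions++;
                    for (String name : flaggedNames) {
                        if (line.contains(name)) {
                            report.flaggedOperators.merge(name, 1, Integer::sum);
                        }
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return report;
    }

    public static void main(String[] args) {
        // Tiny hand-written log: one SQL execution, one SQL job, one non-SQL job.
        String log = String.join("\n",
                "{\"Event\":\"org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart\","
                        + "\"executionId\":0,\"physicalPlanDescription\":\"BatchEvalPython ...\"}",
                "{\"Event\":\"SparkListenerJobStart\",\"Job ID\":0,"
                        + "\"Properties\":{\"spark.sql.execution.id\":\"0\"}}",
                "{\"Event\":\"SparkListenerJobStart\",\"Job ID\":1,\"Properties\":{}}");
        Report report = new EventLogScanner(List.of("BatchEvalPython"))
                .scan(new BufferedReader(new StringReader(log)));
        System.out.println(report);
    }
}
```

Running `main` prints `sqlExecutions=1, jobs=2, nonSqlJobs=1, flagged={BatchEvalPython=1}`. `BatchEvalPython` (Spark's physical node for evaluating Python UDFs) is used only as sample input; substring matching on raw plan text can over- or under-count, so a production tool would parse the JSON and walk `sparkPlanInfo` instead.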

Proposed Solution:

Develop a Java program that takes a Hadoop file path as input and analyzes the event files at that path. The program will generate two reports:

Requirements:

srinivasst commented 2 days ago

WIP: will upload the approach and a PR in a few days.

xumingming commented 2 days ago

Interesting idea. Looking forward to it.