NVIDIA / spark-rapids-tools

User tools for Spark RAPIDS
Apache License 2.0

[FEA] Add qualification support for Databricks Photon event logs #251

Closed mattahrens closed 4 days ago

mattahrens commented 1 year ago

I would like to see the estimated speedup on GPU compared against Databricks Photon. This work will include parsing Databricks Photon event logs and then generating speedup factors from Photon operators to Spark RAPIDS operators.

### Tasks
- [ ] https://github.com/NVIDIA/spark-rapids-tools/issues/449
- [ ] #1338
- [ ] https://github.com/NVIDIA/spark-rapids-tools/issues/1388
- [ ] #1413
- [ ] https://github.com/NVIDIA/spark-rapids-tools/pull/1409
- [ ] #1417

amahussein commented 6 months ago

@mattahrens do we still need this issue? Currently we skip Photon jobs in the Qualification tool.

mattahrens commented 6 months ago

This still might be prioritized in the future so we can keep it open

parthosa commented 3 weeks ago

Discussed the next steps for Photon integration into QualX with @leewyang and @eordentlich.

Assumptions:

Solution:

Alternatives:

cc: @amahussein @tgravescs

mattahrens commented 3 weeks ago

Agreed that heterogeneous support makes sense, but can that be done in a follow-up PR? I don't think it's needed in this first iteration.

parthosa commented 3 weeks ago

Sure Matt. This would make QualX simpler. Updated the approach. We can add heterogeneous support later if needed.

tgravescs commented 2 weeks ago

> Users do not provide heterogeneous event logs

Are we going to fail or warn if we recognize this happening? I think a lot of companies will have mixed eventlogs.

parthosa commented 2 weeks ago

Eventually we would want to add support for a mixed set. This approach is mainly to simplify the development process and proceed iteratively.

Both approaches have pros and cons.

Approach 1: If users provide a mixed set of event logs --> Fail

- Pros: Users do not get an incorrect recommendation.
- Cons: User experience may be compromised.

Approach 2: If users provide a mixed set of event logs --> Warn and fall back to the Spark CPU strategy

- Pros: User experience is better. There are no failures.
- Cons: Users will get an unexpected recommendation. It can cause silent errors/warnings.

IMO, Approach 1 makes more sense. Although the user experience is compromised, any unexpected or silent errors will be avoided.
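For illustration only, the two approaches above could be sketched as a single policy switch. This is a minimal sketch, not the tool's implementation: the function name `resolve_runtime`, the `on_mixed` parameter, and the `"SPARK"`/`"PHOTON"` labels are all hypothetical.

```python
import warnings


def resolve_runtime(runtimes, on_mixed="fail"):
    """Pick a single runtime for a batch of event logs.

    runtimes: one label per event log, e.g. "SPARK" or "PHOTON"
              (hypothetical labels, not taken from the tool).
    on_mixed: "fail" models Approach 1, "warn" models Approach 2.
    """
    distinct = set(runtimes)
    if len(distinct) <= 1:
        # Homogeneous (or empty) input: nothing to decide.
        # An empty batch defaults to "SPARK" in this sketch.
        return next(iter(distinct), "SPARK")
    if on_mixed == "fail":
        # Approach 1: refuse to produce a possibly incorrect recommendation.
        raise ValueError(f"Mixed event logs are not supported: {sorted(distinct)}")
    # Approach 2: warn and fall back to the Spark CPU strategy.
    warnings.warn(
        f"Mixed event logs {sorted(distinct)}; falling back to the Spark CPU strategy",
        stacklevel=2,
    )
    return "SPARK"
```

With `on_mixed="warn"` the caller still gets a result, which is exactly the silent-error risk listed under the cons of Approach 2.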

tgravescs commented 2 weeks ago

What is the expected time frame for adding heterogeneous support? If we are going to add it soon, then it might not matter too much.

We could always choose whatever type the first eventlog has and log it. Then, if we come to one that is of the opposite type, we skip running on that eventlog but make sure we mark it as skipped because of this condition, so that we try to make it obvious to the user. The question is whether skipping it makes that obvious enough.
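A rough sketch of this "first eventlog wins" idea, assuming the runtime of each event log has already been detected. The function name, the tuple layout, and the record fields are hypothetical.

```python
def partition_by_first_runtime(logs):
    """Split event logs into processed and skipped sets.

    logs: list of (path, runtime) tuples in the order the user gave them.
    Returns (chosen_runtime, processed_paths, skipped_records).
    """
    if not logs:
        return None, [], []
    # The first event log decides which runtime this run handles.
    chosen = logs[0][1]
    processed, skipped = [], []
    for path, runtime in logs:
        if runtime == chosen:
            processed.append(path)
        else:
            # Record an explicit reason so the skip is visible in the output.
            skipped.append({
                "eventlog": path,
                "status": "SKIPPED",
                "reason": f"runtime {runtime} differs from first event log runtime {chosen}",
            })
    return chosen, processed, skipped
```

Returning the skipped records separately would let a report list them next to the processed applications instead of only writing them to a log file.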

parthosa commented 2 weeks ago

From a development perspective, adding support for heterogeneous event logs would be a small change on the Python tools side.

@leewyang Would it be feasible for QualX to support heterogeneous event logs (photon + spark) easily? If yes, then we can directly add heterogeneous support.

leewyang commented 2 weeks ago

@parthosa We'd just need something that we could parse that identifies each uniquely. As you mentioned earlier, I think we could just parse the spark_properties.csv and add an indicator to our profile/features dataframe, then we'd group/filter by that indicator before loading the associated qualx model and running prediction. The trickiest part would be reconstructing the correct order (if required) by stitching the two results (for photon and spark) back together, but I think it's doable.
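The group-then-predict flow described above might look roughly like the following. It is a sketch under stated assumptions: plain dicts stand in for the features dataframe, the `runtime` field is the hypothetical indicator, and the `models` mapping holds stand-in callables rather than real QualX models.

```python
from collections import defaultdict


def predict_by_runtime(rows, models):
    """Run the matching model for each runtime and restore the input order.

    rows:   list of dicts with at least "app_id" and "runtime" keys.
    models: dict mapping a runtime label to a callable that scores one row.
    """
    # Group rows by the indicator, remembering each row's original position.
    groups = defaultdict(list)
    for idx, row in enumerate(rows):
        groups[row["runtime"]].append((idx, row))

    # Predict one group at a time, writing results back by original index
    # so the stitched output keeps the input order.
    results = [None] * len(rows)
    for runtime, members in groups.items():
        model = models[runtime]  # one model lookup per runtime group
        for idx, row in members:
            results[idx] = {
                "app_id": row["app_id"],
                "runtime": runtime,
                "speedup": model(row),
            }
    return results
```

Carrying the original index through the grouping is one way to handle the ordering concern without a separate re-sort step.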

parthosa commented 2 weeks ago

That's great then.

> trickiest part would be reconstructing the correct order

Ordering should not be a problem, since we do a left join between the output DF from the JAR and the resulting DF from QualX based on App Id.
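To illustrate why the left join makes ordering a non-issue, here is a small stand-alone sketch that uses lists of dicts in place of dataframes. The function name and the `"App ID"` column name are assumptions for the example.

```python
def left_join_on_app_id(jar_rows, qualx_rows, key="App ID"):
    """Left-join QualX results onto the JAR output, keeping the JAR row order."""
    qualx_by_id = {row[key]: row for row in qualx_rows}

    # Collect the QualX-only columns so unmatched rows still get them (as None).
    extra_cols = []
    for row in qualx_rows:
        for col in row:
            if col != key and col not in extra_cols:
                extra_cols.append(col)

    joined = []
    for row in jar_rows:  # iteration order of the left side is preserved
        match = qualx_by_id.get(row[key], {})
        merged = dict(row)
        for col in extra_cols:
            merged[col] = match.get(col)
        joined.append(merged)
    return joined
```

Because the output follows the left (JAR) side row by row, the order in which QualX returns its predictions does not affect the final result.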

@mattahrens: Since it is quite feasible on both the QualX and Python Tools sides to add heterogeneous support, we should proceed directly to it instead of building an intermediate stage that will eventually be modified.

parthosa commented 4 days ago

Closing this issue as all subtasks for adding support for Photon event logs have been completed.

Usage

To run the tool with Photon event logs, use the following command:

spark_rapids qualification --platform databricks-aws --eventlogs <photon-event-log>

Note: