ROCm / MIOpen

AMD's Machine Intelligence Library
https://rocm.docs.amd.com/projects/MIOpen/en/latest/

MLIR Solvers Tuning Proposal #1018

Open jerryyin opened 3 years ago

jerryyin commented 3 years ago

Background and Motivation

Tuning has long been known to provide optimal performance for a variety of algorithms. According to the study in https://github.com/ROCmSoftwarePlatform/MIOpen/issues/939, this continues to be true for the cpp igemm solvers, the ones that the MLIR solvers are prototyped from. To achieve optimal performance on important configs, we want to implement tuning for the MLIR backend solvers in MIOpen. This involves three different areas:

Use Cases

The implementation of tuning supports two high-level use cases:

Implementation

Tuning parameter passing mechanism

Tuning parameters will be passed as comma-separated C strings, in exactly the same format as the field stored in perfdb. The string can be produced and consumed via Serialize()/Deserialize(), as in MIOpen::Serializable{}. This way, either MIOpen or MLIR can communicate the tuning parameters with one additional string field. The benefits of using this mechanism are:

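To illustrate the Serialize()/Deserialize() round-trip described above, here is a minimal self-contained sketch. The struct and field names are hypothetical stand-ins, not the actual MLIR tuning fields or the real MIOpen::Serializable machinery:

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical tuning-parameter struct; field names are illustrative only.
struct MlirTuningParams
{
    int GemmMPerBlock = 0;
    int GemmNPerBlock = 0;
    int GemmKPerBlock = 0;

    // Serialize into the comma-separated form stored in perfdb.
    std::string Serialize() const
    {
        std::ostringstream ss;
        ss << GemmMPerBlock << ',' << GemmNPerBlock << ',' << GemmKPerBlock;
        return ss.str();
    }

    // Deserialize from the same comma-separated form; returns false on malformed input.
    bool Deserialize(const std::string& s)
    {
        std::istringstream ss(s);
        std::string tok;
        std::vector<int> vals;
        while(std::getline(ss, tok, ','))
            vals.push_back(std::stoi(tok));
        if(vals.size() != 3)
            return false;
        GemmMPerBlock = vals[0];
        GemmNPerBlock = vals[1];
        GemmKPerBlock = vals[2];
        return true;
    }
};
```

Because both sides agree on this one string format, the parameters can cross the MIOpen/MLIR boundary without either side exposing its internal tuning structures.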
Performance Config

To enable tuning, the MLIR solvers need to implement their own performance config data structures. An MLIR solver needs to implement 5 performance config functions, according to

https://github.com/ROCmSoftwarePlatform/MIOpen/blob/120289fcb33496db05518b24e3a39db01d5adb5c/src/include/miopen/generic_search.hpp#L61-L73

Alternative 1: Implement performance config in MIOpen

All of the performance config implementation can be done together with the MIOpen solvers. This follows existing practice. The implication is that MIOpen will be MLIR's only tuning driver, since only MIOpen's performance config has knowledge of how to tune an MLIR solver. This looks acceptable, as we do intend to use only the MIOpen/Tuna infrastructure for tuning MLIR solvers. In any other scenario, the MLIR infrastructure works strictly as a compiler/kernel generator.

In this case, all we need to do is implement two performance config classes:

Alternative 2: Implement performance config in MLIR

With alternative 1, one might worry that the two components are coupled too tightly, because MIOpen's performance config has to contain implementation details of an MLIR solver. When an MLIR developer decides to change the way a kernel is tuned, they have to make the change in MIOpen.

A solution to this is to abstract the implementation details behind the Miir.h interface, leaving the MLIR component solely responsible for its tuning. This can be challenging because a performance config does not depend on any external information to make its decisions. Performance configs are just data structures that encode the tuning parameters. In particular, SetNextValue() does not carry any context information at all. This makes it very hard to create a handle and give MLIR the information it would need to produce the next value based on the current performance config.

We need a combination of changes to make it possible to place the performance config tuning logic in MLIR:

1. One generic performance config

In this generic implementation, all MLIR solvers can share one implementation of the performance config. Under the hood, the performance config will invoke the Miir interface to decide whether the config is valid and how to set the next config.

struct PerformanceMlirGeneric
{
    std::string TuningParams;
    PerformanceMlirGeneric(const std::string& params) : TuningParams(params) {}
    PerformanceMlirGeneric(bool spare) : TuningParams("uninitialized_spare") {}
    PerformanceMlirGeneric() : TuningParams("uninitialized") {}
    bool operator==(const PerformanceMlirGeneric& other) const
    {
        return TuningParams == other.TuningParams;
    }
    bool IsValid(const ConvolutionContext& ctx) const
    {
        return MiirIsConfigApplicable(mlir::ConstructBuildOptions(ctx, TuningParams, ...));
    }
    bool SetNextValue(const ConvolutionContext& ctx)
    {
        std::string nextTuningParams;
        bool isEndOfSequence;
        MiirGetNextTuningParams(mlir::ConstructBuildOptions(ctx, TuningParams, ...), nextTuningParams, isEndOfSequence);
        if(isEndOfSequence)
            return false;
        TuningParams = nextTuningParams;
        return true;
    }
};
2. ComputedIteratorMlirGeneric

The new ComputedIteratorMlirGeneric will inherit from the original iterator. Based on whether the performance config is a PerformanceMlirGeneric, we have a different implementation of SetNextValue() that allows us to pass in the convolution context.

do
{
    if(!v.SetNextValue(*p))
    { // Wraparound, end reached. Iterator is useless from now.
        p = nullptr;
        break;
    }
} while(!v.IsValid(*p));
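A minimal self-contained sketch of such an iterator is below. Apart from SetNextValue() and IsValid(), which come from the discussion above, all names here are assumptions for illustration; ToyConfig and ToyContext stand in for PerformanceMlirGeneric and ConvolutionContext so the advance loop can be exercised on its own:

```cpp
#include <cassert>

// Sketch only: a generic iterator that advances a performance config using
// the context. The context pointer doubles as the end-of-sequence marker.
template <typename PerfConfig, typename Context>
class ComputedIteratorMlirGeneric
{
    PerfConfig v;
    const Context* p; // nullptr once the sequence is exhausted

public:
    explicit ComputedIteratorMlirGeneric(const Context& ctx) : p(&ctx) {}

    ComputedIteratorMlirGeneric& operator++()
    {
        do
        {
            if(!v.SetNextValue(*p))
            { // Wraparound, end reached. Iterator is useless from now.
                p = nullptr;
                break;
            }
        } while(!v.IsValid(*p));
        return *this;
    }

    bool AtEnd() const { return p == nullptr; }
    const PerfConfig& operator*() const { return v; }
};

// Toy stand-ins so the sketch is testable without MIOpen.
struct ToyContext { int limit; };
struct ToyConfig
{
    int value = 0;
    bool SetNextValue(const ToyContext& ctx)
    {
        if(value >= ctx.limit)
            return false;
        ++value;
        return true;
    }
    // Pretend only even values are applicable configs.
    bool IsValid(const ToyContext&) const { return value % 2 == 0; }
};
```

The key point the sketch captures is that the advance step now has the context in hand, so the config's SetNextValue() can consult it.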
3. Dynamically create an MLIR container
const auto& perfCfg = s.GetPerformanceConfig(context);
if(dynamic_cast<const PerformanceMlirGeneric*>(&perfCfg) != nullptr)
{
    ComputedContainerMlirGeneric<PerformanceMlirGeneric, Context> main(context);
    ...
}
Side effects

MLIR Implementations

For alternative 1, the only changes needed on MLIR side are:

For alternative 2, in addition, a couple of new APIs are required to implement the performance config interface:

extern "C" MiirStatus miirTuningParamsInit(MiirHandle handle, char *tuningParams, size_t *size);
extern "C" MiirStatus miirTuningParamsNext(MiirHandle handle, char *tuningParams, size_t *size, bool *isEndOfSequence);

Under the hood, the two interface functions on the MLIR side will invoke the corresponding static functions. Depending on whether it is an xdlops request, as determined from the handle, they will compute and return a new set of tuning parameters based on the one stored in the handle.
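To illustrate how MIOpen's generic search could drive these calls, below is a toy sketch. The miirTuningParamsNext stub stands in for the real MLIR-side implementation (the toy handle, the iteration state, and the buffer-size convention are all assumptions, not the settled API); EnumerateTuningParams shows the calling pattern on the MIOpen side:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-ins for the proposed C interface. The handle is modeled as a
// list of precomputed parameter strings; the real handle would drive the
// MLIR-side walk over the tuning space.
typedef int MiirStatus;
const MiirStatus MIIR_SUCCESS = 0;
typedef std::vector<std::string>* MiirHandle;

static size_t g_pos = 0; // toy iteration state; real state would live in the handle

MiirStatus miirTuningParamsNext(MiirHandle handle,
                                char* tuningParams,
                                size_t* size,
                                bool* isEndOfSequence)
{
    if(g_pos >= handle->size())
    {
        *isEndOfSequence = true;
        return MIIR_SUCCESS;
    }
    const std::string& s = (*handle)[g_pos++];
    const size_t n       = std::min(*size, s.size());
    s.copy(tuningParams, n); // copy at most the caller's buffer size
    *size            = n;    // report how many bytes were written
    *isEndOfSequence = false;
    return MIIR_SUCCESS;
}

// The MIOpen-side loop: keep asking for the next comma-separated parameter
// string until the sequence is exhausted.
std::vector<std::string> EnumerateTuningParams(MiirHandle handle)
{
    std::vector<std::string> out;
    for(;;)
    {
        char buf[256];
        size_t size = sizeof(buf);
        bool end    = false;
        if(miirTuningParamsNext(handle, buf, &size, &end) != MIIR_SUCCESS || end)
            break;
        out.emplace_back(buf, size);
    }
    return out;
}
```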

Summary

Considering we do plan on making MIOpen the only tuning driver for MLIR, I personally favor alternative 1 over alternative 2 for a variety of reasons:

atamazov commented 3 years ago

Especially in SetNextValue(), it doesn't carry any context information at all. This makes it very hard to create a handle and let MLIR know any information that help to create the nextValue based on the current set of performance config.

Please see #388 and raise urgency level if necessary.

Considering we do plan on make MIOpen the only tuning driver for MLIR, I personally favor alternative 1 over 2 on a variety of factors...

I am for it.

jerryyin commented 3 years ago

Please see #388 and raise urgency level if necessary.

Thanks for bringing that up; I didn't know that had been documented. This is going to be a high-priority job for me because it's going to be the largest feature in the next milestone.

I am for it.

Niiiiiice. Thanks for the review.

@JehandadKhan @whchung Any questions, concerns?

JehandadKhan commented 3 years ago

@jerryyin We have consensus since Approach 1 makes the most sense to me as well. Thank you for formalizing the ideas.

carlushuang commented 3 years ago

@jerryyin It is a good idea to pass ctx into SetNextValue. We have another scenario that can utilize this new API: we can more easily decide the gemm_k split value based on the convolution context passed in. Currently we iterate this value based on either the next power of 2 or the next multiple of the CU number, without knowledge of the current conv config.

Indeed, I think there should be no harm in just adding ctx to SetNextValue's arguments. If a solver does not need the ctx value, it can simply ignore it.
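For concreteness, the two iteration strategies mentioned above can be sketched as below; the function names and the explicit CU-count parameter are hypothetical, and a ctx-aware SetNextValue could choose between them using the convolution context:

```cpp
#include <cassert>

// Next power of 2 strictly greater than cur.
int NextPowerOfTwo(int cur)
{
    int v = 1;
    while(v <= cur)
        v *= 2;
    return v;
}

// Next multiple of the CU count strictly greater than cur.
int NextMultipleOfCU(int cur, int numCUs)
{
    return (cur / numCUs + 1) * numCUs;
}
```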

cc @shaojiewang

jerryyin commented 3 years ago

@JehandadKhan Thank you, appreciate your time for the review.

@carlushuang Agreed, I'm sure knowing more context information can be beneficial. Now that I have a majority vote for alternative 1, we should continue the SetNextValue(ctx) refactor in its original ticket.


As the next step, I'd do the following:

jerryyin commented 3 years ago

Seeing no further feedback, I'm starting the implementation of tuning based on approach 1. I will start with the non-xdlops MLIR solvers.

ppanchad-amd commented 4 months ago

@jerryyin, Is the implementation complete? If so, can you please close this ticket? Thanks!

atamazov commented 3 months ago

@ppanchad-amd This ticket is intended to document the current design (and its motivation), so it should never be closed. I recommend removing the https://github.com/ROCm/MIOpen/labels/urgency_high and https://github.com/ROCm/MIOpen/labels/value_high labels.

JehandadKhan commented 3 months ago

Perhaps we can move it to the wiki or documentation ?


ppanchad-amd commented 3 months ago

@JehandadKhan Created internal ticket to assist with the documentation. Thanks!