ROCm / MIOpen

AMD's Machine Intelligence Library
https://rocm.docs.amd.com/projects/MIOpen/en/latest/

Solver/Solution framework #866

Open atamazov opened 3 years ago

atamazov commented 3 years ago

This is a copy of a presentation I gave to the MIOpen team a couple of years ago, when we introduced and implemented the Solver/Solution architecture. It does not cover recent additions such as GetWti() and Invokers. I would like to make it available to all MIOpen developers, including collaborators.

Though a bit outdated, this should provide a good overview of how device code is abstracted away from the rest of the library.

1. Intent

Problem: Variety of convolution kernels

Experience shows that straightforward attempts to support such a set of kernels result in host code that is large, fragile, and difficult to develop and maintain. You can see leftovers of this in convolutionocl.cpp.

Provide abstractions which are able to represent in a single place all the information required to

Such abstractions allow working with all convolutions in a unified manner. Currently, there are:

atamazov commented 3 years ago

2. Problem Description and Context

2.1. Problem Description for an operation, e.g. conv::ProblemDescription

This is an input for the Solver.

2.2. ExecutionContext

TBD

2.3. Operation Context, e.g. ConvolutionContext

Inherits from ProblemDescription and ExecutionContext, so for example an instance of ConvolutionContext can be used as an instance of conv::ProblemDescription. More info TBD.
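The inheritance relationship described above can be sketched as follows. This is a minimal illustration only: the struct members, the `sketch` namespace, and the helper functions `CountFilters` and `IsDevice` are hypothetical stand-ins, not the real MIOpen definitions, which carry far more state.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace sketch {

// Hypothetical stand-in for conv::ProblemDescription: what has to be computed.
struct ProblemDescription
{
    std::size_t batch        = 1;
    std::size_t in_channels  = 1;
    std::size_t out_channels = 1;
};

// Hypothetical stand-in for ExecutionContext: where and how it will run.
struct ExecutionContext
{
    std::string device_name = "gfx906"; // illustrative value only
};

// The operation context is both a problem description and an execution context.
struct ConvolutionContext : ProblemDescription, ExecutionContext
{
};

// A function that only needs the problem description...
inline std::size_t CountFilters(const ProblemDescription& problem)
{
    return problem.in_channels * problem.out_channels;
}

// ...and one that only needs the execution context.
inline bool IsDevice(const ExecutionContext& ctx, const std::string& name)
{
    return ctx.device_name == name;
}

// Usage: the same ConvolutionContext object is accepted by both functions.
inline void UsageExample()
{
    ConvolutionContext ctx;
    ctx.in_channels  = 3;
    ctx.out_channels = 64;
    assert(CountFilters(ctx) == 192);
    assert(IsDevice(ctx, "gfx906"));
}

} // namespace sketch
```

Because the conversion to either base is implicit, code written against the problem description alone does not need to know that a full context was passed in.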

atamazov commented 3 years ago

3. What is Solver

A Solver is an object which encapsulates the implementation of a specific primitive.

Member functions (see here for current prototypes):

If a Solver needs workspace:

Each Solver instance s can be passed to the GetSolverDbId(s) template function, which retrieves the string id of the Solver. There is a default implementation of GetSolverDbId() which returns the class name; it can be overridden if necessary.
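One way such a default-plus-override scheme could look is sketched below. This is not MIOpen's actual implementation: the Solver types, the `DemangledName` helper, and the overridden id string are hypothetical, and the class name is obtained through `abi::__cxa_demangle`, which is available only with GCC and Clang.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h> // abi::__cxa_demangle, GCC/Clang only (assumption of this sketch)
#include <memory>
#include <string>
#include <typeinfo>

namespace sketch {

// Hypothetical Solver types; only their names matter here.
struct ConvDirectNaive {};
struct ConvLegacyKernel {};

// Returns the unqualified class name for a type.
inline std::string DemangledName(const std::type_info& info)
{
    int status = 0;
    std::unique_ptr<char, void (*)(void*)> demangled(
        abi::__cxa_demangle(info.name(), nullptr, nullptr, &status), std::free);
    std::string name = (status == 0 && demangled) ? demangled.get() : info.name();
    const auto pos = name.rfind("::"); // drop the namespace qualification
    return pos == std::string::npos ? name : name.substr(pos + 2);
}

// Default: the id is the class name of the Solver.
template <class Solver>
std::string GetSolverDbId(const Solver&)
{
    return DemangledName(typeid(Solver));
}

// Override for one Solver: a non-template overload wins over the template.
inline std::string GetSolverDbId(const ConvLegacyKernel&)
{
    return "ConvOldKernelId"; // hypothetical id that differs from the class name
}

inline void UsageExample()
{
    assert(GetSolverDbId(ConvDirectNaive{}) == "ConvDirectNaive");
    assert(GetSolverDbId(ConvLegacyKernel{}) == "ConvOldKernelId");
}

} // namespace sketch
```

An override of this kind would be useful, for example, when a class is renamed but the ids already stored in the databases must stay the same; that motivation is an assumption of this sketch rather than something stated in the presentation.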

If a Solver is Dynamic:

If a Solver is NOT searchable (NOT tunable):

If a Solver is searchable (tunable), then the accompanying PerformanceConfig type shall also be defined, plus some member functions:

Generic search.

Modern Solvers employ Generic search.

The PerformanceConfig of a modern searchable Solver type shall provide some functions. These are necessary to build the ComputedContainer instances. The following member functions are required for that:

⚠️ IMPORTANT:

Serialization/de-serialization of PerformanceConfig instances

All PerformanceConfig types shall implement the following member functions:
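As a rough illustration of what a text round trip of a PerformanceConfig could look like, consider the sketch below. Everything in it is an assumption made for the example: the type `PerformanceConfigExample`, its two fields, the comma-separated format, and the exact signatures of `Serialize` and `Deserialize`. It should not be read as the current MIOpen prototypes.

```cpp
#include <cassert>
#include <sstream>
#include <string>

namespace sketch {

// Hypothetical tunable parameters of an imaginary Solver.
struct PerformanceConfigExample
{
    int tile_size = 16;
    int unroll    = 2;

    // Writes the config as "tile_size,unroll".
    void Serialize(std::ostream& os) const { os << tile_size << ',' << unroll; }

    // Parses "tile_size,unroll". Returns false and leaves *this unchanged
    // if the string is malformed.
    bool Deserialize(const std::string& text)
    {
        std::istringstream is(text);
        int tile = 0;
        int unr  = 0;
        char sep = 0;
        if(!(is >> tile >> sep >> unr) || sep != ',')
            return false;
        std::string rest;
        if(is >> rest) // anything left besides whitespace means a malformed record
            return false;
        tile_size = tile;
        unroll    = unr;
        return true;
    }
};

inline void UsageExample()
{
    PerformanceConfigExample saved;
    saved.tile_size = 32;
    saved.unroll    = 4;

    std::ostringstream os;
    saved.Serialize(os);

    PerformanceConfigExample loaded;
    assert(loaded.Deserialize(os.str()));
    assert(loaded.tile_size == 32 && loaded.unroll == 4);
}

} // namespace sketch
```

Rejecting malformed input without modifying the object lets a caller fall back to a default config when a stored record cannot be parsed.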

atamazov commented 3 years ago

4. What is Solution

Information required to build and run a kernel (or a set of kernels), which is expected to perform computations as per the ProblemConfig.

junliume commented 3 years ago

Just curious, should this topic belong, or eventually belong, to the Wiki or the Contribution Guide page? It looks like a guideline which we should follow. Is the intention of the issue to bring up discussion and key decisions?

atamazov commented 3 years ago

@junliume This is a copy of a presentation I gave to the MIOpen team a couple of years ago, when we introduced and implemented the Solver/Solution architecture. It does not cover recent additions such as GetWti() and Invokers.

I would like to make it available for all MIOpen developers, including collaborators.

atamazov commented 3 years ago

5. Perf-db support

atamazov commented 3 years ago

6. Future directions

atamazov commented 1 year ago

7. Support for convolutions with non-packed tensors

Currently we are not going to include the strides of non-packed tensors in the database keys. Only an optional flag (saying that at least one tensor is non-packed) should be included there. This means that non-packed convolutions that differ only in strides will share the same find-db records, the same Invoker instances and the same perf-db information.
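The effect of keying on a flag instead of on the strides can be sketched as follows. The key format, the `Tensor` struct, and the `IsPacked` and `MakeDbKey` helpers are hypothetical and much simpler than the real database keys; "packed" is assumed here to mean a dense row-major layout.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

namespace sketch {

struct Tensor
{
    std::vector<std::size_t> lengths; // e.g. {N, C, H, W}
    std::vector<std::size_t> strides; // in elements, same order as lengths

    // True if the strides are exactly those of a dense row-major layout.
    bool IsPacked() const
    {
        std::size_t expected = 1;
        for(std::size_t i = lengths.size(); i-- > 0;)
        {
            if(strides[i] != expected)
                return false;
            expected *= lengths[i];
        }
        return true;
    }
};

inline std::string JoinLengths(const std::vector<std::size_t>& v)
{
    std::string s;
    for(std::size_t i = 0; i < v.size(); ++i)
    {
        if(i != 0)
            s += 'x';
        s += std::to_string(v[i]);
    }
    return s;
}

// Builds a key from the lengths only; strides contribute just one optional flag.
inline std::string MakeDbKey(const Tensor& in, const Tensor& out)
{
    std::string key = JoinLengths(in.lengths) + "-" + JoinLengths(out.lengths);
    if(!in.IsPacked() || !out.IsPacked())
        key += "-nonpacked";
    return key;
}

inline void UsageExample()
{
    const Tensor out{{1, 16, 6, 6}, {576, 36, 6, 1}};
    const Tensor packed{{1, 3, 8, 8}, {192, 64, 8, 1}};
    const Tensor padded_a{{1, 3, 8, 8}, {384, 128, 16, 2}};
    const Tensor padded_b{{1, 3, 8, 8}, {256, 64, 8, 1}};

    // Different strides, same key: both non-packed problems share one record.
    assert(MakeDbKey(padded_a, out) == MakeDbKey(padded_b, out));
    // The packed problem gets a different key.
    assert(MakeDbKey(packed, out) != MakeDbKey(padded_a, out));
}

} // namespace sketch
```

In this sketch the two non-packed inputs map to one key, so whatever is stored under that key has to be valid for any strides.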

The above design should work correctly provided that:

_Originated from https://github.com/ROCmSoftwarePlatform/MIOpen/pull/2334#discussion_r1348090910_