ROCm / Tensile

Stretching GPU performance for GEMMs and tensor contractions.
MIT License

Add profiling CI job #1923

Closed bstefanuk closed 2 months ago

bstefanuk commented 2 months ago

Objectives:

Outcomes:

Notes:

bstefanuk commented 2 months ago

In scope for this ticket: add an environment variable that toggles profiling on or off for any function marked with the @profile decorator.
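A minimal sketch of how such a toggle could look, assuming the decorator wraps cProfile from the standard library. The variable name TENSILE_PROFILE, the helper profiling_enabled, and the output file naming are hypothetical and not taken from the ticket or from Tensile.

```python
import cProfile
import functools
import os

# Hypothetical variable name; the ticket does not name one.
PROFILE_ENV_VAR = "TENSILE_PROFILE"


def profiling_enabled() -> bool:
    """Return True when the environment variable holds a truthy value."""
    value = os.environ.get(PROFILE_ENV_VAR, "")
    return value.strip().lower() in {"1", "true", "on", "yes"}


def profile(func):
    """Profile calls to func with cProfile, but only when the env var is set."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Checked on every call, so the variable can be changed at runtime.
        if not profiling_enabled():
            return func(*args, **kwargs)
        profiler = cProfile.Profile()
        try:
            return profiler.runcall(func, *args, **kwargs)
        finally:
            # Assumed output location: one stats file per function name in
            # the current working directory, overwritten on each call.
            profiler.dump_stats(f"{func.__name__}.prof")

    return wrapper
```

With the variable unset, a decorated function runs with only the cost of one environment lookup; with it set (for example TENSILE_PROFILE=1), each call writes a stats file that the standard pstats module can load.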

ellosel commented 2 months ago
  1. Have you looked at what happens when nesting @profile? E.g.

     @profile
     def func1():
         pass

     @profile
     def func2():
         func1()

  2. Should we consider adding @profile to other functions?
  3. Do we generate a flamegraph?