ROCm / Tensile

Stretching GPU performance for GEMMs and tensor contractions.
MIT License

Support single threading #2012

Closed bstefanuk closed 2 months ago

bstefanuk commented 2 months ago

Summary:

To improve the overall performance of Tensile, it is important to be able to run in single-threaded mode. Doing so gives better visibility into profiling output, because in parallel runs the most compute-intensive functions are hidden behind joblib's pipelining primitives.
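As an illustration of why a serial run is easier to profile, here is a minimal sketch using the standard-library cProfile and pstats modules. The names expensive_task, run_serial, and profile_serial are hypothetical stand-ins and are not taken from Tensile. cProfile only records calls made in the thread where it is enabled, so work done in worker processes would not appear in the report, whereas in the serial path every call is recorded by name.

```python
import cProfile
import io
import pstats


def expensive_task(n):
    # Hypothetical stand-in for a compute-intensive function (not from Tensile).
    return sum(i * i for i in range(n))


def run_serial(tasks):
    # Single-threaded path: every call happens in the profiled thread.
    return [expensive_task(n) for n in tasks]


def profile_serial(tasks):
    # Profile the serial run and return the results plus a text report.
    profiler = cProfile.Profile()
    profiler.enable()
    results = run_serial(tasks)
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(10)
    return results, buffer.getvalue()


if __name__ == "__main__":
    _, report = profile_serial([50_000] * 8)
    print(report)
```

In the printed report, expensive_task shows up as its own row with call counts and cumulative time, which is the visibility the summary refers to.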

Outcomes:

Single-threaded operations are now supported.

Notable changes:

Improved test coverage of core functions in TensileCreateLibrary, along with updates to the printed output.

Testing and Environment:

nakajee commented 2 months ago

Do we have any test to verify CpuThreads=1? How to verify no joblib case?

bstefanuk commented 2 months ago

> Do we have any test to verify CpuThreads=1? How to verify no joblib case?

@nakajee I have confirmed both of the following scenarios:

  1. Running TensileCreateLibrary after uninstalling joblib properly displays the warning and runs on a single thread.
    > UserWarning: Missing dependency 'joblib', program will run without parallelism
    ...
    # Reading logic files: 1 thread(s), 154 tasks .............................. 100.0% (took 14.3 secs)
  2. Running TensileCreateLibrary with --jobs=1 doesn't display the warning, but runs on only 1 thread.
    # Reading logic files: 1 thread(s), 154 tasks .............................. 100.0% (took 14.4 secs)
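The two scenarios above can be sketched as an optional-dependency fallback. This is a minimal illustration, not Tensile's actual code: the helper name parallel_map, its signature, and the order of the checks are assumptions. Only the warning text and the joblib dependency come from this thread.

```python
import warnings

try:
    from joblib import Parallel, delayed
    HAVE_JOBLIB = True
except ImportError:
    # joblib is treated as optional; fall back to serial execution below.
    HAVE_JOBLIB = False


def parallel_map(function, items, jobs=-1):
    """Apply function to each item, using joblib only when it is usable.

    Hypothetical helper: the name and the jobs=-1 default are assumptions.
    """
    items = list(items)
    if jobs == 1:
        # Explicit single-threaded request: plain loop, no warning.
        return [function(item) for item in items]
    if not HAVE_JOBLIB:
        # Parallelism was requested but joblib is missing: warn and run serially.
        warnings.warn(
            "Missing dependency 'joblib', program will run without parallelism"
        )
        return [function(item) for item in items]
    return Parallel(n_jobs=jobs)(delayed(function)(item) for item in items)
```

With jobs=1 the call never touches joblib, so the serial path can be profiled or debugged directly regardless of whether joblib is installed.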

nakajee commented 2 months ago

Is --jobs=1 same as globalParameters["CpuThreads"]=1?

nakajee commented 2 months ago

Do we have any unit test for TensileCreateLibrary?

bstefanuk commented 2 months ago

> Is --jobs=1 same as globalParameters["CpuThreads"]=1?

Yes, setting --jobs=1 is equivalent to setting globalParameters["CpuThreads"] = 1.
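One way such an equivalence could be wired up is sketched below with argparse. This is an assumption-laden illustration, not the actual TensileCreateLibrary argument handling: the functions parse_jobs and apply_jobs are hypothetical, and the default of -1 is a placeholder. Only the --jobs flag and the globalParameters["CpuThreads"] key come from this thread.

```python
import argparse

# Hypothetical stand-in for Tensile's global parameter dictionary.
globalParameters = {"CpuThreads": -1}


def parse_jobs(argv):
    # Parse only the --jobs flag; the default of -1 is an assumed placeholder.
    parser = argparse.ArgumentParser()
    parser.add_argument("--jobs", type=int, default=-1,
                        help="number of parallel jobs to use")
    args = parser.parse_args(argv)
    return args.jobs


def apply_jobs(jobs, params):
    # Copy the command-line value into the parameter dictionary.
    params["CpuThreads"] = jobs
    return params


if __name__ == "__main__":
    apply_jobs(parse_jobs(["--jobs=1"]), globalParameters)
    print(globalParameters["CpuThreads"])
```

Under this arrangement, passing --jobs=1 and assigning globalParameters["CpuThreads"] = 1 by hand leave the dictionary in the same state.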

> Do we have any unit test for TensileCreateLibrary?

Yes, we have numerous unit tests for Tensile. See Tensile/Tests/unit/test_TensileCreateLibrary.py