ROCm / MIOpen

AMD's Machine Intelligence Library
https://rocm.docs.amd.com/projects/MIOpen/en/latest/

[Navi21] BN leftovers from #1386 (ref: SWDEV-292187) #1405

Closed by atamazov 6 months ago

atamazov commented 2 years ago

Leftovers from #1386:

Originally posted by @atamazov in https://github.com/ROCmSoftwarePlatform/MIOpen/issues/1386#issuecomment-1021380312

muralinr commented 2 years ago

Hi Artem, Current OCL batchnorm code will be phased out soon. We are not spending time on further development. We will support blocking batchnorm issues only.

atamazov commented 2 years ago

@muralinr

> Current OCL batchnorm code will be phased out soon. We are not spending time on further development. We will support blocking batchnorm issues only.

If so, then items (1), (4) and (5) are not needed. But we still need (2) and (3).

@junliume Please confirm, thanks!

atamazov commented 2 years ago

@muralinr @junliume

> Current OCL batchnorm code will be phased out soon. We are not spending time on further development...

> If so, then items (1), (4) and (5) are not needed...

Correction: "not needed" is incorrect. This depends on the ETA of the new BN code. What is your current estimate?

Until (1) is resolved, we are still in a high-risk zone. The longer we stay there, the higher the probability of problems.

muralinr commented 2 years ago

> @muralinr @junliume
>
> Current OCL batchnorm code will be phased out soon. We are not spending time on further development...
>
> If so, then items (1), (4) and (5) are not needed...
>
> Correction: "not needed" is incorrect. This depends on the ETA of the new BN code. What is your current estimate?
>
> Until (1) is resolved, we are still in a high-risk zone. The longer we stay there, the higher the probability of problems.

@junliume will provide the ETA of the new BN code.

MelanieWindt commented 2 years ago

I would like to look into this.

atamazov commented 2 years ago

@junliume Let's assume that we do not need to fix items (1), (4) and (5). Regression tests (2) will be added once MIOpenDriver support in tests is ready to use (I am working on this).

The question is: do we still have plans to phase out old OCL kernels and use something better?

junliume commented 2 years ago

> @junliume Let's assume that we do not need to fix items (1), (4) and (5). Regression tests (2) will be added once MIOpenDriver support in tests is ready to use (I am working on this).
>
> The question is: do we still have plans to phase out old OCL kernels and use something better?

Yes, we plan to replace the BN OCL kernels after CK integration. That is why I am trying to disable a few legacy BN- and OCL-related tests to keep CI stable for now. CC: @asroy @JehandadKhan @zjing14

atamazov commented 2 years ago

Description updated.

ppanchad-amd commented 6 months ago

@atamazov SWDEV-292187 is fixed and closed. Can we close this ticket? Thanks!