ROCm / hipBLASLt

hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditional BLAS library
https://rocm.docs.amd.com/projects/hipBLASLt/en/latest/index.html
MIT License
49 stars 80 forks source link

Add API hipblasLtIsDeviceSupported() to query if current device is supported #860

Open jichangjichang opened 3 months ago

KKyang commented 3 months ago

Why we need this?

jichangjichang commented 3 months ago

Why we need this? https://github.com/pytorch/pytorch/pull/128753 pytroch team request: pytroch doesn't have a way to check if current device is supported by hipblaslt or not. They don't want to add hardcode to check "gfx90a", "gfx942" in framework side.

IMbackK commented 3 months ago

this pr suffers from the same issue as this one here https://github.com/pytorch/pytorch/pull/128753#discussion_r1649620265 ie it is impossible to get here with a hipblaslt unsupported gpu without the rocm runtime haveing silently (or not) entered a failed state.

jithunnair-amd commented 3 months ago

@jichangjichang I'd like us to have a discussion about whether this solution is the best way to go, before we merge this PR. It seems there will still be open issues at the framework level after this PR.

jichangjichang commented 3 months ago

@jichangjichang I'd like us to have a discussion about whether this solution is the best way to go, before we merge this PR. It seems there will still be open issues at the framework level after this PR.

@jithunnair-amd Any suggestions?

jithunnair-amd commented 3 months ago

@jichangjichang From offline discussions with @jeffdaily, he had concerns about this being a "hip" API when no "cuda" equivalent exists; suggestion being to make this only a "roc" api in accordance with convention.

jichangjichang commented 2 months ago

@jichangjichang From offline discussions with @jeffdaily, he had concerns about this being a "hip" API when no "cuda" equivalent exists; suggestion being to make this only a "roc" api in accordance with convention.

@jithunnair-amd hipblaslt already has many "none cuda" equivalent APIs. We use "hipblasltExt" prefix for this kinds of APIs. I can rename the query APIs as "hipblasLtExtIsDeviceSupported()". How do you think?

jithunnair-amd commented 2 months ago

@jichangjichang From offline discussions with @jeffdaily, he had concerns about this being a "hip" API when no "cuda" equivalent exists; suggestion being to make this only a "roc" api in accordance with convention.

@jithunnair-amd hipblaslt already has many "none cuda" equivalent APIs. We use "hipblasltExt" prefix for this kinds of APIs. I can rename the query APIs as "hipblasLtExtIsDeviceSupported()". How do you think?

Sounds okay to me if we have precedent. @jeffdaily, can you please let us know your opinion?

jeffdaily commented 2 months ago

I will make an exception for this. hipblaslt doesn't have a public rocblaslt interface or library. We really should be putting such extensions as "roc" APIs only. The HIP interfaces for our libraries are meant to be a 1-1 correspondence with CUDA libraries. Period. We're adding more and more API surface area to the HIP part of hipblaslt that goes against our policies. At least you're putting it behind an "ext" API name to avoid confusion here.