CHIP-SPV / chipStar

chipStar is a tool for compiling and running HIP/CUDA on SPIR-V via OpenCL or Level Zero APIs.
Other
219 stars 32 forks source link

Remove a confusing already registered and mapped warning #809

Closed linehill closed 6 months ago

linehill commented 6 months ago

Remove confusing "A device function is already registered and mapped to a different module" warning. It just happens to be that there may be multiple __hipRegisterFunction() calls to register same functions due to the way the code is generated for the inline and template functions. There is a bit more details in the code.

pvelesko commented 6 months ago

There is one case where this warning could be useful:

You have two TUs. In each, a template is defined with the same signature but a different body. You can then compile these two files into objects and you end up with the same function names but different body. Since these are templates, the compiler will treat them differently than normal functions and not give you “multiple definitions” error, instead it will map both calls to one of these functions.

I can see such a scenario happening in a test setup:

// testfile1.hip
template<typename T>
__global__ void testFunc<T>(T* a, T* b) { a[0] - b[0]; }

void runTest1() {...}
// testfile2.hip
template<typename T>
__global__ void testFunc<T>(T* a, T* b) { a[0] + b[0]; }

void runTest2() {...}
// runTests.hip

int main() {
  runTest1(); // both of these tests will launch the same kernel
  runTest2();
}

So if you happen to end up with this scenario, not getting the warning would make it challenging to debug.

linehill commented 6 months ago

So if you happen to end up with this scenario, not getting the warning would make it challenging to debug.

The warning, which is being removed, isn’t good for that due to its bad signal-to-noise ratio.

pvelesko commented 6 months ago

bad signal-to-noise ratio

I agree - so should we improve this warning then instead of getting rid of it?

linehill commented 6 months ago

bad signal-to-noise ratio

I agree - so should we improve this warning then instead of getting rid of it?

Probably needs heavy lifting. At runtime level, we would need to extract and compare SPIR-V functions to see that they are equivalent. We would also need to deal with cases where the functions differ vastly in SPIR-V but the original sources are identical. For example, an inline function could be defined in a header, used by two TUs but they are compiled with different compile options, like one with -O0 and the other with -O2, resulting in vastly different device code.