Closed msimberg closed 6 months ago
cscs-ci run
cscs-ci run
I would prefer not to add the extra parameter to lacpy.
Did you considered to use hipPointerGetAttributes
to identify the MemcpyKind
directly in lacpy
only for buggy versions of rocm and continue using MemcpyDefault when it works?
I would prefer not to add the extra parameter to lacpy.
Did you considered to use
hipPointerGetAttributes
to identify theMemcpyKind
directly inlacpy
only for buggy versions of rocm and continue using MemcpyDefault when it works?
I did not, but that sounds like a good idea! I'll check if we can use that and update to use that instead if that's the case.
cscs-ci run
cscs-ci run
@rasolca I think using hipPointerGetAttributes
is working just fine. I was able to run a small set of benchmarks with the latest change that otherwise would fail with master
. I added a couple of assertions:
The enum contains different values depending on the rocm version. https://docs.amd.com/projects/HIP/en/docs-5.6.0/.doxygen/docBin/html/hip__runtime__api_8h.html#aea86e91d3cd65992d787b39b218435a3
For sure hipMemoryTypeUnregistered has to be considered as it represent system allocated memory. (For 5.6 I suppose that it works as for CUDA which returns an error).
https://docs.amd.com/projects/HIP/en/latest/doxygen/html/group___memory.html#ga7c3e8663feebb7be9fd3a1e5139bcefc seems to indicate that maybe even the newer versions return an error for unregistered memory, but it's not 100% clear from the description. I'll try it out and see what it returns.
@rasolca indeed 5.6.1 returns hipErrorInvalidValue
with a malloc
d pointer, 6.0.2 gives hipMemoryTypeUnregistered
.
We could:
whip::exception
and fall back to hipMemcpyDefault
on hipErrorInvalidValue
. hipMemcpy2D
will only fail with non-zero offsets on host pointers, and if you get one of those pointers with a buggy version of HIP, hipMemcpy2D
will anyway fail on the copy.hipMemoryTypeUnregistered
as hipMemoryTypeHost
, if we get it, in the versions of HIP that support it.hipMemcpyDefault
on the other memory types.Perhaps worth discussing in today's meeting.
cscs-ci run
cscs-ci run
cscs-ci run
Just a minor comment: wondering if this workaround shouldn't be moved to
whip
. Feel free to ignore this, we can always move it if we decide it's a good idea.
It's a very good point, but at least at this point I'm not confident enough in the workaround to put it in whip. I know it works for our use cases and I think it works for others, but I really haven't tested it extensively enough to put it into whip
right now.
2D memcpys involving host buffers with non-zero offsets are broken on HIP 5.6 onwards. Some cases have been fixed in https://github.com/ROCm/clr/commit/56daa6c4891b43ec233e9c63f755e3f7b45842b4 (included in 5.7.0) and further fixes in are in https://github.com/ROCm/clr/commit/d3bfb55d7a934355257a72fab538a0a634b43cad (not released as of this PR).
Where it's straightforward to do so I've simply replaced
whip::memcpy_default
with the explicit kind. Inlacpy
we don't know the direction anymore so I've added an additional parameter for the kind that has to be passed by the caller. Finally, I have not changed all instances ofwhip::memcpy_default
(e.g. in 1D memcpys) though I wouldn't oppose that at all as I find being explicit is rarely a bad thing.