Closed mattmacy closed 8 years ago
Hi Matt, please try setting env var HIP_PLATFORM to "hcc" so hip will recognize the nanos.
On Mar 23, 2016, at 11:36 PM, Matthew Macy notifications@github.com<mailto:notifications@github.com> wrote:
I was able to do the tutorial on gpuopen.comhttp://gpuopen.com but found that hipGetDeviceCount was only returning 1 so the examples would only run on my primary GPU a GTX 980Ti. I also have an R9 Nano and an R9 Fury. The kfd driver exports 3 nodes under topology so the runtime should let me talk to them. I'm running Ubuntu 15. I was hoping to instrument hip_hcc.cpp to see what it was doing right here:
/*
But I can't even get it to compile: ~/devel/HIP2$ make ./bin/hipcc -I/opt/hcc/include -std=c++11 -I/opt/hsa/include src/hip_hcc.cpp -c -O3 -o src/hip_hcc.o src/hip_hcc.cpp:52:2: error: #error (USE_AM_TRACKER requries HCC version of 16074 or newer)
^ Died at ./bin/hipcc line 208. Makefile:20: recipe for target 'src/hip_hcc.o' failed make: *\ [src/hip_hcc.o] Error 1
I made the following change to the Makefile in response to complaints. But it's still not doing anything. And it looks like it's trying to compile the code with nvcc: mmacy@pandemonium:~/devel/HIP2$ hipcc --version nvcc: NVIDIA (R) Cuda compiler driver Copyright (c) 2005-2015 NVIDIA Corporation Built on Tue_Aug_11_14:27:32_CDT_2015 Cuda compilation tools, release 7.5, V7.5.17
You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHubhttps://github.com/GPUOpen-ProfessionalCompute-Tools/HIP/issues/15
I see - that tells it which compiler to use.
hipcc square.cpp In file included from /home/mmacy/devel/HIP/src/hip_hcc.cpp:42: In file included from /home/mmacy/devel/HIP/include/hip_runtime.h:54: In file included from /home/mmacy/devel/HIP/include/hcc_detail/hip_runtime.h:41: In file included from /home/mmacy/devel/HIP/include/hip_runtime_api.h:196: /home/mmacy/devel/HIP/include/hcc_detail/hip_runtime_api.h:35:2: error: ("This version of HIP requires a newer version of HCC.");
^ /home/mmacy/devel/HIP/src/hip_hcc.cpp:2354:17: warning: unused variable 'stream' [-Wunused-variable] hipStream_t stream = ihipSyncAndResolveStream(hipStreamNull); ^ 1 warning and 1 error generated. remake-deps failed at /home/mmacy/devel/HIP/bin/hipcc line 179.
That doesn't work so well.
I've installed the most recent .deb from https://bitbucket.org/multicoreware/hcc/downloads. I take it I need to download the hcc sources as well?
I see. Their latest .deb is 16045. Your sources require 16074 or later.
I'm trying the following to see if I get a working hcc: https://bitbucket.org/multicoreware/hcc/wiki/Developer%20Information
Progress. I'm running 16124. It looks like you're out of sync with hsa:
mmacy@pandemonium:~/devel/hcc/build$ hcc --version
HCC clang version 3.5.0 (based on HCC 0.10.16124-89bbf6f-7e4cd9e LLVM 3.5.0svn)
Target: x86_64-unknown-linux-gnu
Thread model: posix
mmacy@pandemonium:~/devel/hcc/build$ cd ../..
mmacy@pandemonium:~/devel$ cd HIP/samples/0_Intro/
bit_extract/ square/
mmacy@pandemonium:~/devel$ cd HIP/samples/0_Intro/square/
mmacy@pandemonium:~/devel/HIP/samples/0_Intro/square$ hipcc square.cpp
/home/mmacy/devel/HIP/src/hip_hcc.cpp:2093:22: error: no matching function for call to 'hsa_amd_memory_async_copy'
hsa_status = hsa_amd_memory_async_copy(dstp, _device->_hsa_agent, locked_srcp, _device->_hsa_agent, theseBytes, waitFor ? 1:0, waitFor, _completion_signal[bufferIndex]);
^~~~~~~~~
/opt/hsa/include/hsa_ext_amd.h:452:5: note: candidate function not viable: requires 7 arguments, but 8 were provided
hsa_amd_memory_async_copy(void* dst, const void* src, size_t size,
^
/home/mmacy/devel/HIP/src/hip_hcc.cpp:2155:35: error: no matching function for call to 'hsa_amd_memory_async_copy'
hsa_status_t hsa_status = hsa_amd_memory_async_copy(dstp, _device->_hsa_agent, _pinnedStagingBuffer[bufferIndex], _device->_hsa_agent, theseBytes, waitFor ? 1:0, waitFor, _completion_signal[bufferIndex]);
^~~~~~~~~
/opt/hsa/include/hsa_ext_amd.h:452:5: note: candidate function not viable: requires 7 arguments, but 8 were provided
hsa_amd_memory_async_copy(void* dst, const void* src, size_t size,
^
/home/mmacy/devel/HIP/src/hip_hcc.cpp:2208:39: error: no matching function for call to 'hsa_amd_memory_async_copy'
hsa_status_t hsa_status = hsa_amd_memory_async_copy(_pinnedStagingBuffer[bufferIndex], _device->_hsa_agent, srcp0, _device->_hsa_agent, theseBytes, waitFor ? 1:0, waitFor, _completion_signal[bufferIndex]);
^~~~~~~~~
/opt/hsa/include/hsa_ext_amd.h:452:5: note: candidate function not viable: requires 7 arguments, but 8 were provided
hsa_amd_memory_async_copy(void* dst, const void* src, size_t size,
^
/home/mmacy/devel/HIP/src/hip_hcc.cpp:2333:35: error: no matching function for call to 'hsa_amd_memory_async_copy'
hsa_status_t hsa_status = hsa_amd_memory_async_copy(dst, device->_hsa_agent, src, device->_hsa_agent, sizeBytes, depSignalCnt, depSignalCnt ? &depSignal:0x0, device->_copy_signal);
^~~~~~~~~
/opt/hsa/include/hsa_ext_amd.h:452:5: note: candidate function not viable: requires 7 arguments, but 8 were provided
hsa_amd_memory_async_copy(void* dst, const void* src, size_t size,
^
/home/mmacy/devel/HIP/src/hip_hcc.cpp:2463:39: error: no matching function for call to 'hsa_amd_memory_async_copy'
hsa_status_t hsa_status = hsa_amd_memory_async_copy(dst, device->_hsa_agent, src, device->_hsa_agent, sizeBytes, depSignalCnt, depSignalCnt ? &depSignal:0x0, ihip_signal->_hsa_signal);
^~~~~~~~~
/opt/hsa/include/hsa_ext_amd.h:452:5: note: candidate function not viable: requires 7 arguments, but 8 were provided
hsa_amd_memory_async_copy(void* dst, const void* src, size_t size,
^
5 errors generated.
remake-deps failed at /home/mmacy/devel/HIP/bin/hipcc line 179.
I don't know what the situation is with the ROCR_V2 API. The async memcpy in what I assume is the canonical hsa_ext_amd.h: https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/master/src/inc/hsa_ext_amd.h looks more like the old one.
I made the following changes to hip_hcc.cpp to get my square.cpp to compile using hcc as the HIP_PLATFORM:
index 57d55a1..776e7c6 100644
--- a/src/hip_hcc.cpp
+++ b/src/hip_hcc.cpp
@@ -2090,7 +2090,7 @@ void StagingBuffer::CopyHostToDevicePinInPlace(void* dst, const void* src, size_
hsa_signal_store_relaxed(_completion_signal[bufferIndex], 1);
#if USE_ROCR_V2
- hsa_status = hsa_amd_memory_async_copy(dstp, _device->_hsa_agent, locked_srcp, _device->_hsa_agent, theseBytes, waitFor ? 1:0, waitFor, _completion_signal[bufferIndex]);
+ hsa_status = hsa_amd_memory_async_copy(dstp, locked_srcp, theseBytes, _device->_hsa_agent, waitFor ? 1:0, waitFor, _completion_signal[bufferIndex]);
#else
assert(0);
#endif
@@ -2152,7 +2152,7 @@ void StagingBuffer::CopyHostToDevice(void* dst, const void* src, size_t sizeByte
hsa_signal_store_relaxed(_completion_signal[bufferIndex], 1);
#if USE_ROCR_V2
- hsa_status_t hsa_status = hsa_amd_memory_async_copy(dstp, _device->_hsa_agent, _pinnedStagingBuffer[bufferIndex], _device->_hsa_agent, theseBytes, waitFor ? 1:0, waitFor, _completion_signal[bufferIndex]);
+ hsa_status_t hsa_status = hsa_amd_memory_async_copy(dstp, _pinnedStagingBuffer[bufferIndex], theseBytes, _device->_hsa_agent, waitFor ? 1:0, waitFor, _completion_signal[bufferIndex]);
#else
hsa_status_t hsa_status = hsa_amd_memory_async_copy(dstp, _pinnedStagingBuffer[bufferIndex], theseBytes, _device->_hsa_agent, 0, NULL, _completion_signal[bufferIndex]);
#endif
@@ -2205,7 +2205,7 @@ void StagingBuffer::CopyDeviceToHost(void* dst, const void* src, size_t sizeByte
tprintf (TRACE_COPY2, "D2H: bytesRemaining0=%zu async_copy %zu bytes src:%p to staging:%p\n", bytesRemaining0, theseBytes, srcp0, _pinnedStagingBuffer[bufferIndex]);
hsa_signal_store_relaxed(_completion_signal[bufferIndex], 1);
#if USE_ROCR_V2
- hsa_status_t hsa_status = hsa_amd_memory_async_copy(_pinnedStagingBuffer[bufferIndex], _device->_hsa_agent, srcp0, _device->_hsa_agent, theseBytes, waitFor ? 1:0, waitFor, _completion_signal[bufferIndex]);
+ hsa_status_t hsa_status = hsa_amd_memory_async_copy(_pinnedStagingBuffer[bufferIndex], srcp0, theseBytes, _device->_hsa_agent, waitFor ? 1:0, waitFor, _completion_signal[bufferIndex]);
#else
hsa_status_t hsa_status = hsa_amd_memory_async_copy(_pinnedStagingBuffer[bufferIndex], srcp0, theseBytes, _device->_hsa_agent, 0, NULL, _completion_signal[bufferIndex]);
#endif
@@ -2330,7 +2330,7 @@ void ihipSyncCopy(ihipStream_t *stream, void* dst, const void* src, size_t sizeB
#if USE_ROCR_V2
- hsa_status_t hsa_status = hsa_amd_memory_async_copy(dst, device->_hsa_agent, src, device->_hsa_agent, sizeBytes, depSignalCnt, depSignalCnt ? &depSignal:0x0, device->_copy_signal);
+ hsa_status_t hsa_status = hsa_amd_memory_async_copy(dst, src, sizeBytes, device->_hsa_agent, depSignalCnt, depSignalCnt ? &depSignal:0x0, device->_copy_signal);
#else
hsa_status_t hsa_status = hsa_amd_memory_async_copy(dst, src, sizeBytes, device->_hsa_agent, 0, NULL, device->_copy_signal);
#endif
@@ -2460,7 +2460,7 @@ hipError_t hipMemcpyAsync(void* dst, const void* src, size_t sizeBytes, hipMemcp
tprintf (TRACE_SYNC, " copy-async, waitFor=%lu completion=#%lu(%lu)\n", depSignalCnt? depSignal.handle:0x0, ihip_signal->_sig_id, ihip_signal->_hsa_signal.handle);
- hsa_status_t hsa_status = hsa_amd_memory_async_copy(dst, device->_hsa_agent, src, device->_hsa_agent, sizeBytes, depSignalCnt, depSignalCnt ? &depSignal:0x0, ihip_signal->_hsa_signal);
+ hsa_status_t hsa_status = hsa_amd_memory_async_copy(dst, src, sizeBytes, device->_hsa_agent, depSignalCnt, depSignalCnt ? &depSignal:0x0, ihip_signal->_hsa_signal);
#else
hsa_status_t hsa_status = hsa_amd_memory_async_copy(dst, src, sizeBytes, device->_hsa_agent, 0, NULL, ihip_signal->_hsa_signal);
Hi Matthew, can you try switch to "dev" branch on both ROCK-Kernel-Driver and ROCR-Runtime? You shall be able to find newer async_copy API which works with HIP over there.
What do I do to just re-build the driver? Thanks.
And for that matter - how do I rebuild the runtime. There's no makefile in the root.
Hi Matthew, you don't need to build them. On "dev" branch of ROCK-Kernel-Driver you can find a "package" directory which has ubuntu & fedora packages inside. And you can also find pre-built packages under "package" directory in ROCR-Runtime. Please do remember to switch to "dev" branch on both repositories though.
OK. Great. Thanks. I'll do that in the morning and let you know how that goes. In the meantime the patched version works for me.
I do notice that AMD kernels are much slower than Nvidia kernels:
mmacy@pandemonium:~/devel/HIP/samples/0_Intro/square$ time !! time ./a.out deviceCount: 2 info: running on device Fiji info: allocate host mem ( 7.63 MB) info: allocate device mem ( 7.63 MB) info: copy Host2Device info: launch 'vector_square' kernel info: copy Device2Host info: check result PASSED!
real 0m1.203s user 0m0.088s sys 0m0.184s
mmacy@pandemonium:~/devel/HIP.old/samples/0_Intro/square$ time ./square.hip.out deviceCount: 1 info: running on device GeForce GTX 980 Ti info: allocate host mem ( 7.63 MB) info: allocate device mem ( 7.63 MB) info: copy Host2Device info: launch 'vector_square' kernel info: copy Device2Host info: check result PASSED!
real 0m0.273s user 0m0.028s sys 0m0.244s
Is that fundamental? Or does your job dispatch interface just need refinement?
Thanks.
Hi Matthew, there are many ongoing works to optimize all aspects of the stack. Please stay tuned. :)
OK. I updated both the kernel and the runtime to the 316 build. When running the square.cpp example with HIP_PLATFORM=hcc (nvcc still works fine) I now get a kernel oops:
Mar 24 11:28:34 pandemonium kernel: [ 639.895604] nvidia_uvm: Loaded the UVM driver, major device number 245
Mar 24 11:29:06 pandemonium kernel: [ 671.693636] amdgpu: vram aperture is out of 40bit address base: 0x383fc0000000 limit 0x383fd0000000
Mar 24 11:29:06 pandemonium kernel: [ 671.693749] amdgpu: vram aperture is out of 40bit address base: 0x383fe0000000 limit 0x383ff0000000
Mar 24 11:29:06 pandemonium kernel: [ 671.696239] amdgpu: vram aperture is out of 40bit address base: 0x383fc0000000 limit 0x383fd0000000
Mar 24 11:29:06 pandemonium kernel: [ 671.734321] amdgpu: vram aperture is out of 40bit address base: 0x383fe0000000 limit 0x383ff0000000
Mar 24 11:29:06 pandemonium kernel: [ 671.776858] BUG: unable to handle kernel paging request at ffffc90019ecd000
Mar 24 11:29:06 pandemonium kernel: [ 671.776863] IP: [
Should I go back to the 1/25 version of driver/runtime with my local patch or is this likely to be fixed? I'm happy to provide more info if need be.
I created an issue in with ROCK as that is probably where the current problem belongs.
Hi, Make sure you install debian files for ROCK dev branch. For runtime, make sure to install ROCR dev branch. Test the sample in /opt/hsa/sample.
If your sample is not passing, ROCR or ROCK is not working as it should be.
If it pass, get compiler (HCC and LLVM), follow https://github.com/RadeonOpenCompute/LLVM-AMDGPU-Assembler-Extra. Make sure you run conformance test given in the wiki for the repo.
Then, add /opt/hsa to HSA_PATH, /opt/hcc to HCC_PATH. Do the same for adding bin directories to PATH and lib to LD_LIBRARY_PATH.
Get hip and add its project directory to HIP_PATH and hipcc directory to PATH.
See previous comment "OK. I updated both the kernel and the runtime to the 316 build." That's the dev kernel. I also installed the dev runtime so that hip_hcc.cpp will compile with the ROCR_V2 copy interface. And that is what is causing this panic.
My sample passed fine until I tried the latest kernel and runtime. So all the other options are correct.
Can you try running hsa sample?
I'm no longer able to boot the dev kernel. It also complains of not properly detecting my graphics hardware - so needs to run in low-resolution, but instead never displays a login prompt. I'm not sure what I need to do to recover at this point. The default ubuntu kernel still works OK.
Looking at the logs It seems I'm seeing further OOPS at boot now:
Mar 24 11:53:42 pandemonium rsyslogd: rsyslogd's userid changed to 104
Mar 24 11:53:43 pandemonium kernel: [ 13.184444] NVRM: Your system is not currently configured to drive a VGA console
Mar 24 11:53:43 pandemonium kernel: [ 13.184446] NVRM: on the primary VGA device. The NVIDIA Linux graphics driver
Mar 24 11:53:43 pandemonium kernel: [ 13.184447] NVRM: requires the use of a text-mode VGA console. Use of other console
Mar 24 11:53:43 pandemonium kernel: [ 13.184448] NVRM: drivers including, but not limited to, vesafb, may result in
Mar 24 11:53:43 pandemonium kernel: [ 13.184449] NVRM: corruption and stability problems, and is not supported.
Mar 24 11:53:43 pandemonium kernel: [ 13.312533] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
Mar 24 11:53:43 pandemonium kernel: [ 13.312537] IP: [
Mar 24 11:53:48 pandemonium nvidia-persistenced: Started (1640)
Mar 24 11:53:49 pandemonium kernel: [ 19.067885] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
Mar 24 11:53:49 pandemonium kernel: [ 19.067889] IP: [
And so on for all cpus.
Do you have the GTX 980Ti, the R9 Nano and the R9 Fury all installed in the same system? If so, did you install the drivers for the GTX card before or after you installed the ROCK packages?
They're all in the same system. I installed the GTX card a couple of weeks ago. The R9s date back to yesterday. I have made no changes to the Nvidia software/hardware configuration in a couple of weeks - i.e. well before doing anything with AMD.
The current status AFAICT is that the development driver won't work except in console-mode because Xorg's probing causes it to crash. So can anyone give me an ETA on when that will be fixed on github?
Thanks.
Hi, You can revert back to a previous release commit.
It's not clear to me where the problem was introduced. Can you hazard a guess at which changeset to try? The last time packages were updated was Jan 26th which corresponds to what's in master. So I'll need to build my own kernel - which is fine with me provided Kconfig is complete.
Hi, You can try master branch package. obsidian 62 (if you want to run hcc badly).
Closing this since the original issue should be occurring anymore.
@mattmacy Please try with a clean setup and reopen the issue if you face any problems.
I was able to do the tutorial on gpuopen.com but found that hipGetDeviceCount was only returning 1 so the examples would only run on my primary GPU a GTX 980Ti. I also have an R9 Nano and an R9 Fury. The kfd driver exports 3 nodes under topology so the runtime should let me talk to them. I'm running Ubuntu 15. I was hoping to instrument hip_hcc.cpp to see what it was doing right here:
But I can't even get it to compile: ~/devel/HIP2$ make ./bin/hipcc -I/opt/hcc/include -std=c++11 -I/opt/hsa/include src/hip_hcc.cpp -c -O3 -o src/hip_hcc.o src/hip_hcc.cpp:52:2: error: #error (USE_AM_TRACKER requries HCC version of 16074 or newer)
error (USE_AM_TRACKER requries HCC version of 16074 or newer)
^ Died at ./bin/hipcc line 208. Makefile:20: recipe for target 'src/hip_hcc.o' failed make: *\ [src/hip_hcc.o] Error 1
I made the following change to the Makefile in response to complaints. But it's still not doing anything. And it looks like it's trying to compile the code with nvcc: mmacy@pandemonium:~/devel/HIP2$ hipcc --version nvcc: NVIDIA (R) Cuda compiler driver Copyright (c) 2005-2015 NVIDIA Corporation Built on Tue_Aug_11_14:27:32_CDT_2015 Cuda compilation tools, release 7.5, V7.5.17