Closed 0charleschen0 closed 9 years ago
Just to make sure, which kfd version are you using?
I am using kfd-1.0. The runtime is 1.0, too.
ok, thanks. That version should support local memory, at least from the kernel side.
Tried to recreate your steps (with the code you pasted here). I got:
ogabbay@odedg-ubuntu:~/HSA-Runtime-AMD/sample$ ./vector_copy
Initializing the hsa runtime succeeded.
Calling hsa_iterate_agents succeeded.
Checking if the GPU device is non-zero succeeded.
Querying the device name succeeded.
The device name is Spectre.
Querying the device maximum queue size succeeded.
The maximum queue size is 131072.
Creating the queue succeeded.
Segmentation fault (core dumped)
Hi,
Can you paste the HSAIL text? You can use the -hsail option in cloc.
Thanks, Shreyas
On Thu, Dec 11, 2014 at 6:12 AM, Oded Gabbay notifications@github.com wrote:
Tried to recreate your steps (with the code you pasted here) I got:
ogabbay@odedg-ubuntu:~/HSA-Runtime-AMD/sample$ ./vector_copy
Initializing the hsa runtime succeeded.
Calling hsa_iterate_agents succeeded.
Checking if the GPU device is non-zero succeeded.
Querying the device name succeeded.
The device name is Spectre.
Querying the device maximum queue size succeeded.
The maximum queue size is 131072.
Creating the queue succeeded.
Segmentation fault (core dumped)
Hi Shreyas, here it is.
version 0:20140528:$full:$large;
extension "amd:gcn";
extension "IMAGE";
decl prog function &abort()();
prog kernel &__OpenCL_square_kernel(
kernarg_u64 %input,
kernarg_u64 %output,
kernarg_u64 %temp,
kernarg_u32 %count)
{
pragma "AMD RTI", "ARGSTART:__OpenCL_square_kernel";
pragma "AMD RTI", "version:3:1:104";
pragma "AMD RTI", "device:generic";
pragma "AMD RTI", "uniqueid:1024";
pragma "AMD RTI", "function:1:0";
pragma "AMD RTI", "memory:64bitABI";
pragma "AMD RTI", "uavid:8";
pragma "AMD RTI", "privateid:8";
pragma "AMD RTI", "ARGEND:__OpenCL_square_kernel";
@__OpenCL_square_kernel_entry:
// BB#0: // %entry
workitemabsid_u32 $s0, 0;
ld_kernarg_align(4)_width(all)_u32 $s1, [%count];
cmp_ge_b1_u32 $c0, $s0, $s1;
cbr_b1 $c0, @BB0_2;
// BB#1: // %if.then
ld_kernarg_align(8)_width(all)_u64 $d2, [%temp];
ld_kernarg_align(8)_width(all)_u64 $d0, [%output];
ld_kernarg_align(8)_width(all)_u64 $d1, [%input];
workitemid_u32 $s1, 0;
cvt_s64_s32 $d3, $s1;
shl_u64 $d3, $d3, 2;
add_u64 $d2, $d2, $d3;
cvt_s64_s32 $d3, $s0;
shl_u64 $d3, $d3, 2;
add_u64 $d1, $d1, $d3;
ld_global_f32 $s0, [$d1];
st_group_f32 $s0, [$d2];
add_u64 $d0, $d0, $d3;
mul_ftz_f32 $s0, $s0, $s0;
st_global_f32 $s0, [$d0];
@BB0_2:
// %if.end
ret;
};
gabbayo, I think you may have forgotten to compile the kernel code to a .brig file. My program runs successfully (but gives the wrong result).
Thanks for your help, gabbayo and Shreyas.
Hi Charles,
A couple of things.
The HSAIL code generated here assumes that all pointers to local memory are offsets with base 0 (this will be different if FLAT addressing is used). The offsets range from 0 to the size of local memory as defined in the device properties.
In your example, change your code to:
uint64_t size_of_local_temp = 1024 * 1024 * 4; // You probably need less: WORK_GROUP_SIZE * 4
uint64_t static_local_size = hsaCodeDescriptor->workgroup_group_segment_byte_size;
uint64_t temp_ptr = hsaCodeDescriptor->workgroup_group_segment_byte_size;
uint64_t dynamic_local_size = size_of_local_temp;
aql.group_segment_size = static_local_size + dynamic_local_size;
--Pass temp_ptr to the arguments--
This should work. Hope that helps.
Hi Shreyas,
It worked!!
Thank you for your help! I have struggled with it for several days.
By the way, could the HSA Foundation release an example that uses dynamic local memory? I think others may want to know this. ^ _ ^ Thank you, gabbayo.
Sincerely, Charles
Hi, I want to test a kernel that uses __local memory and run it with the HSA runtime. I modified the vector_copy example for testing. Below is what I modified. First, here is my kernel _vectorsquare.cl:
and I compiled it with cloc to a .brig file. Then, I changed the file_name from
to
kernel_name from
to
Then, I modified the dispatch information inspired by HSA-System-Runtime-Specification-Provisional-1.0 from
to
Finally, I added a variable for _local argument 'temp'_
and modified args structure like this :
and passed arguments like this :
Of course, I print out the result to check its validity.
But, unfortunately, this is the result: Did I miss something that I need to do?
Below is the source code.