Closed gFrancoCamilo closed 2 months ago
Interesting.
It looks like gpgpu-sim doesn't actually support shf at all. It is not found in ptx.l.
It needs to be implemented.
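For anyone picking this up: as I understand the PTX ISA, shf is a funnel shift over the 64-bit value formed from two 32-bit operands (a is the low word, b the high word), with a .clamp or .wrap mode for the shift amount. Below is a minimal C++ sketch of those semantics. It is not GPGPU-Sim code, and the names ShfMode, shf_amount, shf_left and shf_right are hypothetical, chosen only for illustration.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical names; this only models the documented shf semantics,
// it is not how GPGPU-Sim implements (or will implement) the instruction.
enum class ShfMode { Clamp, Wrap };

// Effective shift amount: .clamp saturates at 32, .wrap keeps the low 5 bits.
inline std::uint32_t shf_amount(std::uint32_t c, ShfMode mode) {
    return mode == ShfMode::Clamp ? std::min<std::uint32_t>(c, 32u) : (c & 31u);
}

// shf.l: shift the 64-bit value {b:a} left by n and keep the upper 32 bits.
inline std::uint32_t shf_left(std::uint32_t a, std::uint32_t b,
                              std::uint32_t c, ShfMode mode) {
    const std::uint64_t wide = (static_cast<std::uint64_t>(b) << 32) | a;
    const std::uint32_t n = shf_amount(c, mode);  // 0..32, safe for a 64-bit shift
    return static_cast<std::uint32_t>((wide << n) >> 32);
}

// shf.r: shift the 64-bit value {b:a} right by n and keep the lower 32 bits.
inline std::uint32_t shf_right(std::uint32_t a, std::uint32_t b,
                               std::uint32_t c, ShfMode mode) {
    const std::uint64_t wide = (static_cast<std::uint64_t>(b) << 32) | a;
    const std::uint32_t n = shf_amount(c, mode);
    return static_cast<std::uint32_t>(wide >> n);
}
```

Passing the same word as both a and b turns the funnel shift into a 32-bit rotate, which is presumably why compilers emit it for AES-style code: shf_left(x, x, 8, ShfMode::Wrap) rotates x left by 8 bits.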
I made the changes and that solved the shf instruction problem, but now I'm facing the following problem:
GPGPU-Sim PTX: 8 (potential) branch divergence @ PC=0x8918 (app.1.sm_52.ptx:5239) @%p7 bra $L__BB9_12;
GPGPU-Sim PTX: immediate post dominator @ PC=0x8a10 (app.1.sm_52.ptx:5317) ret;
GPGPU-Sim PTX: ... end of reconvergence points for _Z73counterWithOneTableExtendedSharedMemoryBytePermPartlyExtendedSBoxCihangirPjS_S_S_PyPh
GPGPU-Sim PTX: ... done pre-decoding instructions for '_Z73counterWithOneTableExtendedSharedMemoryBytePermPartlyExtendedSBoxCihangirPjS_S_S_PyPh'.
GPGPU-Sim PTX: pushing kernel '_Z73counterWithOneTableExtendedSharedMemoryBytePermPartlyExtendedSBoxCihangirPjS_S_S_PyPh' to stream 0, gridDim= (1024,1,1) blockDim = (1024,1,1)
GPGPU-Sim uArch: Shader 0 bind to kernel 1 '_Z73counterWithOneTableExtendedSharedMemoryBytePermPartlyExtendedSBoxCihangirPjS_S_S_PyPh'
GPGPU-Sim uArch: CTA/core = 2, limited by: threads shmem regs
GPGPU-Sim: Reconfigure L1 cache to 32KB
...
GPGPU-Sim uArch: Shader 64 bind to kernel 1 '_Z73counterWithOneTableExtendedSharedMemoryBytePermPartlyExtendedSBoxCihangirPjS_S_S_PyPh'
GPGPU-Sim uArch: Shader 65 bind to kernel 1 '_Z73counterWithOneTableExtendedSharedMemoryBytePermPartlyExtendedSBoxCihangirPjS_S_S_PyPh'
GPGPU-Sim uArch: Shader 66 bind to kernel 1 '_Z73counterWithOneTableExtendedSharedMemoryBytePermPartlyExtendedSBoxCihangirPjS_S_S_PyPh'
GPGPU-Sim uArch: Shader 67 bind to kernel 1 '_Z73counterWithOneTableExtendedSharedMemoryBytePermPartlyExtendedSBoxCihangirPjS_S_S_PyPh'
GPGPU-Sim uArch: Shader 68 bind to kernel 1 '_Z73counterWithOneTableExtendedSharedMemoryBytePermPartlyExtendedSBoxCihangirPjS_S_S_PyPh'
GPGPU-Sim uArch: Shader 69 bind to kernel 1 '_Z73counterWithOneTableExtendedSharedMemoryBytePermPartlyExtendedSBoxCihangirPjS_S_S_PyPh'
GPGPU-Sim uArch: Shader 70 bind to kernel 1 '_Z73counterWithOneTableExtendedSharedMemoryBytePermPartlyExtendedSBoxCihangirPjS_S_S_PyPh'
GPGPU-Sim uArch: Shader 71 bind to kernel 1 '_Z73counterWithOneTableExtendedSharedMemoryBytePermPartlyExtendedSBoxCihangirPjS_S_S_PyPh'
GPGPU-Sim uArch: Shader 72 bind to kernel 1 '_Z73counterWithOneTableExtendedSharedMemoryBytePermPartlyExtendedSBoxCihangirPjS_S_S_PyPh'
GPGPU-Sim uArch: Shader 73 bind to kernel 1 '_Z73counterWithOneTableExtendedSharedMemoryBytePermPartlyExtendedSBoxCihangirPjS_S_S_PyPh'
GPGPU-Sim uArch: Shader 74 bind to kernel 1 '_Z73counterWithOneTableExtendedSharedMemoryBytePermPartlyExtendedSBoxCihangirPjS_S_S_PyPh'
GPGPU-Sim uArch: Shader 75 bind to kernel 1 '_Z73counterWithOneTableExtendedSharedMemoryBytePermPartlyExtendedSBoxCihangirPjS_S_S_PyPh'
GPGPU-Sim uArch: Shader 76 bind to kernel 1 '_Z73counterWithOneTableExtendedSharedMemoryBytePermPartlyExtendedSBoxCihangirPjS_S_S_PyPh'
GPGPU-Sim uArch: Shader 77 bind to kernel 1 '_Z73counterWithOneTableExtendedSharedMemoryBytePermPartlyExtendedSBoxCihangirPjS_S_S_PyPh'
GPGPU-Sim uArch: Shader 78 bind to kernel 1 '_Z73counterWithOneTableExtendedSharedMemoryBytePermPartlyExtendedSBoxCihangirPjS_S_S_PyPh'
GPGPU-Sim uArch: Shader 79 bind to kernel 1 '_Z73counterWithOneTableExtendedSharedMemoryBytePermPartlyExtendedSBoxCihangirPjS_S_S_PyPh'
GPGPU-Sim uArch: Shader 7 bind to kernel 1 '_Z73counterWithOneTableExtendedSharedMemoryBytePermPartlyExtendedSBoxCihangirPjS_S_S_PyPh'
GPGPU-Sim uArch: Shader 3 bind to kernel 1 '_Z73counterWithOneTableExtendedSharedMemoryBytePermPartlyExtendedSBoxCihangirPjS_S_S_PyPh'
GPGPU-Sim uArch: Shader 11 bind to kernel 1 '_Z73counterWithOneTableExtendedSharedMemoryBytePermPartlyExtendedSBoxCihangirPjS_S_S_PyPh'
GPGPU-Sim uArch: Shader 15 bind to kernel 1 '_Z73counterWithOneTableExtendedSharedMemoryBytePermPartlyExtendedSBoxCihangirPjS_S_S_PyPh'
GPGPU-Sim uArch: Shader 19 bind to kernel 1 '_Z73counterWithOneTableExtendedSharedMemoryBytePermPartlyExtendedSBoxCihangirPjS_S_S_PyPh'
GPGPU-Sim uArch: Shader 23 bind to kernel 1 '_Z73counterWithOneTableExtendedSharedMemoryBytePermPartlyExtendedSBoxCihangirPjS_S_S_PyPh'
GPGPU-Sim uArch: Shader 27 bind to kernel 1 '_Z73counterWithOneTableExtendedSharedMemoryBytePermPartlyExtendedSBoxCihangirPjS_S_S_PyPh'
GPGPU-Sim uArch: Shader 31 bind to kernel 1 '_Z73counterWithOneTableExtendedSharedMemoryBytePermPartlyExtendedSBoxCihangirPjS_S_S_PyPh'
GPGPU-Sim uArch: Shader 35 bind to kernel 1 '_Z73counterWithOneTableExtendedSharedMemoryBytePermPartlyExtendedSBoxCihangirPjS_S_S_PyPh'
GPGPU-Sim uArch: Shader 39 bind to kernel 1 '_Z73counterWithOneTableExtendedSharedMemoryBytePermPartlyExtendedSBoxCihangirPjS_S_S_PyPh'
GPGPU-Sim uArch: Shader 43 bind to kernel 1 '_Z73counterWithOneTableExtendedSharedMemoryBytePermPartlyExtendedSBoxCihangirPjS_S_S_PyPh'
GPGPU-Sim uArch: Shader 47 bind to kernel 1 '_Z73counterWithOneTableExtendedSharedMemoryBytePermPartlyExtendedSBoxCihangirPjS_S_S_PyPh'
GPGPU-Sim uArch: Shader 51 bind to kernel 1 '_Z73counterWithOneTableExtendedSharedMemoryBytePermPartlyExtendedSBoxCihangirPjS_S_S_PyPh'
GPGPU-Sim uArch: Shader 68 bind to kernel 1 '_Z73counterWithOneTableExtendedSharedMemoryBytePermPartlyExtendedSBoxCihangirPjS_S_S_PyPh'
GPGPU-Sim uArch: Shader 55 bind to kernel 1 '_Z73counterWithOneTableExtendedSharedMemoryBytePermPartlyExtendedSBoxCihangirPjS_S_S_PyPh'
GPGPU-Sim uArch: Shader 72 bind to kernel 1 '_Z73counterWithOneTableExtendedSharedMemoryBytePermPartlyExtendedSBoxCihangirPjS_S_S_PyPh'
GPGPU-Sim uArch: Shader 76 bind to kernel 1 '_Z73counterWithOneTableExtendedSharedMemoryBytePermPartlyExtendedSBoxCihangirPjS_S_S_PyPh'
GPGPU-Sim uArch: Shader 59 bind to kernel 1 '_Z73counterWithOneTableExtendedSharedMemoryBytePermPartlyExtendedSBoxCihangirPjS_S_S_PyPh'
GPGPU-Sim uArch: Shader 63 bind to kernel 1 '_Z73counterWithOneTableExtendedSharedMemoryBytePermPartlyExtendedSBoxCihangirPjS_S_S_PyPh'
GPGPU-Sim uArch: Shader 79 bind to kernel 1 '_Z73counterWithOneTableExtendedSharedMemoryBytePermPartlyExtendedSBoxCihangirPjS_S_S_PyPh'
./justrun.sh: line 1: 17008 Aborted (core dumped) /root/accel-sim-framework/gpu-app-collection/src/..//bin/11.7/release/app
Do you have an idea of what might be causing this or how to solve it?
As I said, the code works when I use the libcudart.so in /usr/local/cuda/lib64.
When you changed cudaMallocManaged to cudaMalloc, did you also change every place where the CPU directly reads or writes through a device pointer?
I made a mistake in the original post. I replaced the cudaMallocManaged calls with cudaMallocHost, not cudaMalloc.
I used gdb to figure out what was causing the error from my previous comment. Apparently, the code was getting stuck in some printf calls and would abort.
GPGPU-Sim PTX: finding reconvergence points for 'vprintf'...
threadIndex : 1048575
GPGPU-Sim PTX: PDOM analysis already done for vprintf
Thread 2 "app" received signal SIGABRT, Aborted.
[Switching to Thread 0x7efd06fe7640 (LWP 34024)]
__pthread_kill_implementation (no_tid=0, signo=6, threadid=139625209165376) at ./nptl/pthread_kill.c:44
44 ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=139625209165376) at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=139625209165376) at ./nptl/pthread_kill.c:78
#2 __GI___pthread_kill (threadid=139625209165376, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3 0x00007efd07234476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4 0x00007efd0721a7f3 in __GI_abort () at ./stdlib/abort.c:79
#5 0x00007efd07836fa7 in my_cuda_printf (fmtstr=fmtstr@entry=0x7efce3943a50 "Plaintext : %08x %08x %08x %08x\n", arg_list=arg_list@entry=0x7efce91a5100 "\250\366C2\215\060Z\210") at cuda_device_printf.cc:56
#6 0x00007efd07837259 in gpgpusim_cuda_vprintf (pI=0x55968e84a790, thread=0x7efcfeaca9a0, target_func=<optimized out>) at cuda_device_printf.cc:115
#7 0x00007efd0783d076 in call_impl (pI=pI@entry=0x55968e84a790, thread=thread@entry=0x7efcfeaca9a0) at instructions.cc:2184
#8 0x00007efd07827bc7 in ptx_thread_info::ptx_exec_inst (this=0x7efcfeaca9a0, inst=..., lane_id=lane_id@entry=31) at /root/accel-sim-framework/gpu-simulator/gpgpu-sim/src/cuda-sim/opcodes.def:58
#9 0x00007efd0799d772 in core_t::execute_warp_inst_t (this=this@entry=0x55968005c5a0, inst=..., warpId=63, warpId@entry=4294967295) at abstract_hardware_model.cc:1192
#10 0x00007efd078cf59f in exec_shader_core_ctx::func_exec_inst (inst=..., this=0x55968005c5a0) at shader.cc:1023
#11 shader_core_ctx::issue_warp (this=0x55968005c5a0, pipe_reg_set=..., next_inst=0x55968e84a790, active_mask=std::bitset = {...}, warp_id=63, sch_id=<optimized out>) at shader.cc:1046
#12 0x00007efd078c9760 in scheduler_unit::cycle (this=0x5596800fba80) at shader.cc:1408
#13 0x00007efd078c9cf6 in shader_core_ctx::issue (this=0x55968005c5a0) at shader.cc:1118
#14 0x00007efd078d9f46 in shader_core_ctx::cycle (this=0x55968005c5a0) at shader.cc:3627
#15 0x00007efd078d9fc0 in simt_core_cluster::core_cycle (this=0x55968005c520) at shader.cc:4437
#16 0x00007efd0789bc23 in gpgpu_sim::cycle (this=0x55967dc8a280) at gpu-sim.cc:1957
#17 0x00007efd079a5615 in gpgpu_sim_thread_concurrent (ctx_ptr=0x55967dc71d70) at gpgpusim_entrypoint.cc:127
#18 0x00007efd07286ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#19 0x00007efd07317a04 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100
The backtrace points to an if statement in cuda_device_printf.cc (line 56, frame #5 above), which aborts while handling the line printf("Plaintext : %08x %08x %08x %08x\n", pt0Init, pt1Init, pt2Init, pt3Init);.
I commented out the printf statements, which resolved the warnings and fixed the problem.
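In case it helps someone else, here is a rough sketch of how the printf calls could be switched off for simulator builds instead of being commented out by hand. It is written as plain C++ so it compiles anywhere; I would expect the same macro to work around device-side printf in a .cu file, but I have not tried that under GPGPU-Sim. The flag SIM_NO_DEVICE_PRINTF, the macro DBG_PRINTF and the function mix_words are all hypothetical names, not part of the AES code or of GPGPU-Sim.

```cpp
#include <stdio.h>

// SIM_NO_DEVICE_PRINTF is a hypothetical build flag: pass
// -DSIM_NO_DEVICE_PRINTF when building for the simulator.
#ifdef SIM_NO_DEVICE_PRINTF
#define DBG_PRINTF(...) ((void)0)            // compiled out, no printf call emitted
#else
#define DBG_PRINTF(...) printf(__VA_ARGS__)  // normal debug output
#endif

// Hypothetical stand-in for kernel code that prints intermediate state.
inline unsigned mix_words(unsigned pt0, unsigned pt1) {
    DBG_PRINTF("Plaintext : %08x %08x\n", pt0, pt1);
    return pt0 ^ pt1;
}
```

The computed result is the same with or without the flag; only the debug output disappears.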
Thank you for your help!
Hello! I'm trying to run this AES implementation using the Accel-Sim framework in PTX mode. I have followed the instructions to add it as an app described here.
However, when I run the application, I get the following error.
I saw that a similar error was posted in #4, but I have verified that I am running the most recent GPGPU-Sim version (GPGPU-Sim Simulator Version 4.2.0 [build gpgpu-sim_git-commit-7dc99771_modified_0.0]). Do you have any suggestions for solving or circumventing this problem?
Some additional information:
I compiled the code with nvcc -o app AES_final.cu -lcudart -I. -I/usr/local/cuda/include. The environment variable LD_LIBRARY_PATH is set to /root/accel-sim-framework/gpu-simulator/gpgpu-sim/lib/gcc-/cuda-11070/release:. I also tried generating the executable using the options -gencode=arch=compute_50,code=compute_50 and -gencode=arch=compute_70,code=compute_70, but the result is the same.
I replaced cudaMallocManaged in the original code with cudaMalloc, since the libcudart.so included in gpgpusim does not support cudaMallocManaged (I checked this by running nm -D /root/accel-sim-framework/gpu-simulator/gpgpu-sim/lib/gcc-/cuda-11070/release/libcudart.so | grep cudaMalloc). When I run the code with this modification and LD_LIBRARY_PATH set to /usr/local/cuda/lib64, it works fine (directly in the CLI, outside the framework). Without this change, I was getting a lookup error for cudaMallocManaged.
Thank you!