Wen-Tian-Pineapple opened this issue 11 months ago
There is a segfault (SEGF). You can run this directly or under gdb to see which line caused the issue.
Thanks
Thanks, I'm currently trying to debug it with gdb. I also wanted to mention that I did the same thing as #238: I was running everything in the Docker setup provided in this repo and hit a similar segmentation fault when processing kernel-17.traceg.
Yeah, that's fine. If you can tell me the exact line that segfaults, I can provide some hints and help you narrow down the issue.
Thanks for the reply. Below is the problem.
The segfault happens at a DEPBAR instruction, specifically at the step "Check for the case that the LDGSTSs monitored have finished when encountering the DEPBAR instruction". Maybe the newly added LDGSTS support is buggy?
@Connie120
I encountered the same error in the dev version while running the parboil-sad workload.
Thank you for the info.
Unfortunately we have a major conference deadline approaching and we won't be able to work on that soon.
You may look into it if you want, we are happy to accept any fix.
I suggest you check out a commit from right before the LDGSTS merge and use that for now.
Thanks!
@Wen-Tian-Pineapple What workload is this?
You are using the TITANV config, which does not have the DEPBAR feature, so possibly we did not disable it correctly on older configs that lack the feature.
Please use https://github.com/accel-sim/gpgpu-sim_distribution/tree/53e99da4d21eacbf103ba55bcc9cb6e05219cb91 and https://github.com/accel-sim/accel-sim-framework/tree/241762826c193e6589ea9959bd074d94c826bc15 instead
@tyhiwzm Which config are you using?
Thanks!
Thanks for the reference, it's working fine now. BTW, I'm using the titanX/titanV config.
Thank you for your reply. I'm using the A100 config, just like https://github.com/accel-sim/accel-sim-framework/issues/138, and the machine I'm actually running on is also an A100. I figured out what's causing the problem: the DEPBAR instruction shows up in the trace I captured, but there's no LDGSTS instruction. As a result, m_warp[warp_id]->m_ldgdepbarbuf[i].size() in shader_core_ctx::issue_warp crashes, since m_ldgdepbarbuf is empty.
BTW, it would be great to know when this issue is fixed so I can download the newest version of the dev branch.
I ran into the above error when evaluating a Python program; the last line of the GPGPU-Sim output is just a thread block message. I used PyInstaller to compile the Python file and ran the compiled file from the "dist" directory. Could that be causing the problem? I'm also using CUDA 11.01 and gcc 7.3.1 on CentOS 7. Does anybody have any idea what the issue might be?