DestroyGPU closed this issue 1 month ago
This usually happens if you run it with GPGPUsim instead of in-hardware. Could you check if GPGPUsim env variable is set?
Yes, I unset the environment variable, and now it can run! Thank you!
Writing results to /workspace/accel-sim-framework/hw_run/traces/device-0/11.8/cutlass_perf_test_k1/__seed_2020___dist_0____m_2560___n_16___k_2560___kernels_sgemm____iterations_5___providers_cutlass/traces//kernel-1.trace
By the way, how long will it take to finish generating the trace for cutlass? It has been running for several minutes and is still going. So far it has generated one trace file, which is 1.76 GB and is no longer growing. I suspect it is stuck somewhere.
And how long will it take to run the generated trace on the GPU simulator?
Just judging by the size, a 1.76 GB trace should be on the quicker side, though I've not run cutlass lately. I've had traces of around half a terabyte, with simulation times of weeks (or even months).
Thanks!
This usually happens if you run it with GPGPUsim instead of in-hardware. Could you check if GPGPUsim env variable is set?
Would it make sense for the tracer to detect if it's running with GPGPUsim during nvbit_at_init and exit early with an error message if it is?
Something like:
...
void nvbit_at_init() {
    // detect if the user has sourced the gpgpu-sim environment, if they have then
    // we should exit early with an error message because the tool will not work
    // with gpgpu-sim
    if (std::getenv("GPGPUSIM_SETUP_ENVIRONMENT_WAS_RUN")) {
        std::cerr << "Error: gpgpu-sim environment detected, this tool is not "
                     "compatible with gpgpu-sim"
                  << std::endl;
        exit(1);
    }
    ...
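For anyone who wants to try the check outside the tracer, here is a minimal standalone sketch of the same idea. It assumes, as the snippet above does, that sourcing the gpgpu-sim setup script exports GPGPUSIM_SETUP_ENVIRONMENT_WAS_RUN. The helper names env_var_is_set and gpgpusim_conflict_message are made up for illustration and are not part of NVBit or the tracer.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: true when the named environment variable is set
// (to any value, including an empty string).
bool env_var_is_set(const char* name) {
    return std::getenv(name) != nullptr;
}

// Hypothetical helper: returns an error message when the gpgpu-sim
// environment appears to be sourced, or an empty string otherwise.
// Assumption: GPGPUSIM_SETUP_ENVIRONMENT_WAS_RUN is the marker variable,
// taken from the snippet above rather than verified independently.
std::string gpgpusim_conflict_message() {
    if (!env_var_is_set("GPGPUSIM_SETUP_ENVIRONMENT_WAS_RUN")) {
        return "";
    }
    return "Error: gpgpu-sim environment detected, this tool is not "
           "compatible with gpgpu-sim";
}
```

Inside nvbit_at_init the tracer could print the message to std::cerr and exit when it is non-empty. Returning the message instead of exiting directly keeps the check easy to test on its own.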
Got the above error after running
./util/tracer_nvbit/run_hw_trace.py -B cutlass -D 0
with CUDA version 11.8 on an A100. Could you please take a look at this?
Thanks in advance!