Cannot run the simulator with deep benchmarks using tensor core

accel-sim / accel-sim-framework

This is the top-level repository for the Accel-Sim framework.

https://accel-sim.github.io

Cannot run the simulator with deep benchmarks using tensor core #169

Closed LinlinCH closed 1 year ago

LinlinCH commented 1 year ago

I compiled cutlass-bench and ran the simulator using PTX mode.

When I ran cutlass_perf_test, I ran into:

cutlass_perf_test: cuda_api_object.h:82: void CUctx_st::add_ptxinfo(const char*, const gpgpu_ptx_siminfo&): Assertion `s != NULL' failed. Aborted

I also tested wmma using wmma_tests.h in https://github.com/gpgpu-sim/cutlass-gpgpu-sim.

In the gemm-test directory, only the wmma tests failed.

So I guess the simulator failed due to wmma-related kernels. I chose QV100 as my configuration.

How can I run deep benchmarks using tensor core?

JRPan commented 1 year ago

Have you tried this one? https://github.com/accel-sim/gpu-app-collection/tree/release/src/cuda/cutlass-bench

Not sure what the error is. You may want to see if anyone else went through this. I would suspect some flags were not set during compile. The assertion error indicates that ptx info was not registered properly. Please try the cutlass linked above.
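
To illustrate what "ptx info was not registered properly" can mean, here is a minimal Python sketch of the kind of check behind the assertion. It is a simplified model, not GPGPU-Sim source: it assumes add_ptxinfo looks the kernel name up in a symbol table built from the parsed PTX and fails when the name is absent. The class, the kernel names, and the register counts are all hypothetical.

```python
# Simplified model of a name lookup guarded by an assertion.
# Not GPGPU-Sim code: every name and value below is hypothetical.

class SymbolTable:
    """Maps kernel names seen while parsing embedded PTX to per-kernel data."""

    def __init__(self):
        self._symbols = {}

    def register(self, name):
        # Called once for each kernel found in the parsed PTX.
        self._symbols[name] = {"kernel_info": None}

    def lookup(self, name):
        # Returns None when the kernel was never parsed from PTX.
        return self._symbols.get(name)


def add_ptxinfo(table, kernel_name, info):
    s = table.lookup(kernel_name)
    if s is None:
        # Plays the role of the failing check `s != NULL`.
        raise AssertionError(f"no PTX symbol registered for {kernel_name!r}")
    s["kernel_info"] = info


table = SymbolTable()
table.register("sgemm_kernel")

# The kernel was registered, so attaching its info succeeds.
add_ptxinfo(table, "sgemm_kernel", {"regs": 32})

# This kernel was never registered (for example, its PTX was not embedded
# or not parsed), so the lookup returns None and the check fails.
try:
    add_ptxinfo(table, "wmma_gemm_kernel", {"regs": 64})
    missing_kernel_failed = False
except AssertionError:
    missing_kernel_failed = True
```

Under this model, the failure points at a kernel whose PTX never reached the symbol table, which is why the compile flags are the first thing to check.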

Or you can use trace-driven mode. We have QV100 cutlass traces posted. You can download the traces and run the simulation.

Thanks

LinlinCH commented 1 year ago

Thanks very much for your reply!

I have tried https://github.com/accel-sim/gpu-app-collection/tree/release/src/cuda/cutlass-bench.

I compiled cutlass-bench for use with GPGPU-Sim and got the executable cutlass_perf_test.

I used the cmake flags -DUSE_GPGPUSIM=1 -DCUTLASS_NVCC_ARCHS=70 and followed the README instructions for GPGPU-Sim usage.

However, when I ran the simulator, I got cutlass_perf_test: cuda_api_object.h:82: void CUctx_st::add_ptxinfo(const char*, const gpgpu_ptx_siminfo&): Assertion `s != NULL' failed.

JRPan commented 1 year ago

What are your CUDA and gcc versions? And what command are you running? Let me try to reproduce this error.

JRPan commented 1 year ago

Meanwhile, I would highly recommend trace mode. Accel-Sim was created to solve these problems. May I ask whether there is any specific reason you prefer PTX mode over trace mode?

LinlinCH commented 1 year ago

I have tried 2 versions.

CUDA 9.0 and gcc 6.5.
CUDA 11.0 and gcc 7.5.

Thanks for your suggestion. If I use trace mode, can I observe the simulator's behavior instruction by instruction?

JRPan commented 1 year ago

Yes. And it's more accurate. One difference from PTX mode is that trace mode uses SASS instead of PTX. For info on SASS, check https://docs.nvidia.com/cuda/cuda-binary-utilities/#volta-instruction-set

I'll try to reproduce the issue.

sxzhang1993 commented 1 year ago

Hello,

I was able to run cutlass with gpgpu-sim before. If you want to stick with PTX mode, these might be helpful:

https://github.com/accel-sim/gpu-app-collection/tree/release/src/cuda/cutlass-bench
https://github.com/sxzhang1993/Run-cutlass-with-gpgpu-sim

LinlinCH commented 1 year ago

Thanks a lot for your reply!

It works well with gpgpu-sim running cutlass-bench!

Oddly, it still fails when I use accel-sim in PTX mode.

LinlinCH commented 1 year ago

Dear JRPan,

I am now running deepbench in SASS (trace) mode.

When I use trace mode, can I see the values of the operands? Since the trace carries no input data or output results, is there any way to observe the register values for each instruction?

Thanks!

JRPan commented 1 year ago

Sorry for the late reply. No, you can't. There is no functional simulation in trace mode. So it's impossible to see the register values.
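
A rough Python sketch of the distinction, under simplifying assumptions: a trace record names the registers an instruction touches but carries no contents, while an execution-driven model keeps a register file and computes values. The field names, the toy ADD opcode, and the classes are hypothetical and do not reflect Accel-Sim's actual trace format or internals.

```python
from dataclasses import dataclass, field


@dataclass
class TraceRecord:
    """One traced instruction: register names only, no register contents."""
    pc: int
    opcode: str
    dest_regs: list
    src_regs: list


def replay(trace):
    """Trace-driven replay: walks the records for timing purposes only.

    Here it simply counts issued instructions; nothing is computed.
    """
    return len(trace)


@dataclass
class FunctionalCore:
    """Execution-driven model: holds register values and updates them."""
    regs: dict = field(default_factory=dict)

    def execute(self, opcode, dest, srcs):
        if opcode != "ADD":
            raise NotImplementedError(opcode)
        self.regs[dest] = sum(self.regs.get(r, 0) for r in srcs)
        return self.regs[dest]


# Trace mode: we know R2 is written from R0 and R1, but not what they hold.
trace = [TraceRecord(pc=0x10, opcode="ADD", dest_regs=["R2"], src_regs=["R0", "R1"])]
issued = replay(trace)

# Functional mode: register contents exist, so the result can be inspected.
core = FunctionalCore(regs={"R0": 3, "R1": 4})
result = core.execute("ADD", "R2", ["R0", "R1"])
```

In the sketch, only `FunctionalCore` ever holds a value for R2. The trace replay has nothing to inspect beyond which registers were named.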

masa-laboratory commented 1 year ago

@JRPan Does Accel-Sim support vISA-PTX execution-driven mode of PyTorch, which has functional simulation?

JRPan commented 1 year ago

Yes. That is the PTX mode.

William-An commented 1 year ago

Fixed in #134