accel-sim / accel-sim-framework

This is the top-level repository for the Accel-Sim framework.
https://accel-sim.github.io

undefined instruction F2FP #213

Closed (mahmoodn closed this issue 1 year ago)

mahmoodn commented 1 year ago

Hello, on an RTX 3080 machine I see the following instruction:

0300 ffffffff 1 R6 F2FP.PACK_AB 2 R255 R6 0

which results in the following simulation error:

ERROR:  undefined instruction : F2FP.PACK_AB Opcode: F2FP

This is more of an NVBit question, but I didn't get an answer there. The official manual mentions a UF2FP instruction. What do you think about that? Do you think they are the same instruction?

William-An commented 1 year ago

Hi mahmoodn,

UF2FP is an instruction executed by the uniform unit inside Ampere GPUs. The uniform unit is like a scalar CPU core: it executes one instruction on one data element at a time (unlike the SIMD units).

I believe the accel-sim tracer currently does not support uniform instructions at all (it skips them, as well as uniform registers), so the F2FP you are seeing is quite likely a variant of the F2F instruction rather than UF2FP.

William-An commented 1 year ago

Actually, there is already an issue on this: #39. I don't think there is a PR yet, though. I will create a fix later to be merged into the mainline.

mahmoodn commented 1 year ago

Thanks for the reply. If I understand correctly, the commits to ampere_opcode.h will only fix the undefined instruction error. They do not add a software implementation of that instruction. In that case, ignoring the instruction in the tracer would have a similar effect. Am I right?

William-An commented 1 year ago

Well, yes and no. Adding an entry to ampere_opcode.h does not give us a software implementation of the instruction (that is, we won't functionally simulate it), but that is the case for all instructions in accel-sim. Since accel-sim is trace-based, we don't care about the functional correctness of the kernel, only about its performance in the simulator. In other words, we won't get the kernel's computed results, but we do get its timing statistics.
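To make the trace-based point concrete, here is a minimal sketch of a timing-only replay. It is not accel-sim code: `TraceInst`, `replay_cycles`, and the per-instruction latencies are hypothetical, and the serialized in-order issue is a deliberate oversimplification. The record carries no operand values, so the replay can produce a cycle count but never a computed result.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical trace record: only what a timing model needs.
// No operand values are stored, so no kernel result can be computed.
struct TraceInst {
    std::string opcode;  // e.g. "F2FP.PACK_AB"
    unsigned latency;    // assumed latency in cycles (illustrative only)
};

// Replays a trace and returns the total cycle count, assuming each
// instruction issues only after the previous one has finished.
std::uint64_t replay_cycles(const std::vector<TraceInst>& trace) {
    std::uint64_t cycles = 0;
    for (const TraceInst& inst : trace) {
        cycles += inst.latency;  // timing is accumulated, values are not
    }
    return cycles;
}
```

A real timing model overlaps instructions and tracks dependencies, but the input/output shape is the same: instruction records in, statistics out.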

The commits to the ISA definition files (XXX_opcode.h) map F2FP the same way as F2F, so that it executes on the FP units inside the GPU. So the F2FP instruction is not being ignored. That said, ignoring it might give similar simulation results, depending on what fraction of the kernel's instructions it makes up.
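As a rough illustration of that mapping, the sketch below uses a small lookup table keyed by base opcode. The names (`UnitClass`, `opcode_table`, `base_opcode`, `classify`, `opcode_fraction`) and the table contents are assumptions made for this example, not the actual definitions in the accel-sim headers. It strips the modifier from `F2FP.PACK_AB` to get `F2FP`, matching the "Opcode: F2FP" part of the error message, and reports an opcode as undefined when the table has no entry for it. The last helper computes the fraction of a trace that a given opcode accounts for, which is the number that decides how much ignoring it would matter.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical execution-unit classes; not the real accel-sim enum.
enum class UnitClass { FP, INT, MEM };

// Hypothetical table in the spirit of an XXX_opcode.h file.
inline const std::unordered_map<std::string, UnitClass>& opcode_table() {
    static const std::unordered_map<std::string, UnitClass> table = {
        {"F2F", UnitClass::FP},
        {"F2FP", UnitClass::FP},  // added entry: same class as F2F
        {"IADD3", UnitClass::INT},
        {"LDG", UnitClass::MEM},
    };
    return table;
}

// "F2FP.PACK_AB" -> "F2FP": drop any modifiers after the first '.'.
std::string base_opcode(const std::string& full) {
    return full.substr(0, full.find('.'));
}

// Returns false for an opcode with no table entry, which is the
// situation that would be reported as an undefined instruction.
bool classify(const std::string& full, UnitClass& out) {
    const auto& table = opcode_table();
    const auto it = table.find(base_opcode(full));
    if (it == table.end()) return false;
    out = it->second;
    return true;
}

// Fraction of the opcodes in a trace whose base opcode equals `base`.
double opcode_fraction(const std::vector<std::string>& opcodes,
                       const std::string& base) {
    if (opcodes.empty()) return 0.0;
    std::size_t hits = 0;
    for (const auto& op : opcodes) {
        if (base_opcode(op) == base) ++hits;
    }
    return static_cast<double>(hits) / static_cast<double>(opcodes.size());
}
```

With the `F2FP` entry present, `classify("F2FP.PACK_AB", u)` succeeds and sets `u` to `UnitClass::FP`; without it, the lookup fails in the same way the original error describes.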

mahmoodn commented 1 year ago

OK Thanks.