NVlabs / NVBit


operand_t embeds a std::string field, but C++ does not guarantee a stable ABI, causing errors in record_reg_vals #58

Closed: sree314 closed this issue 2 years ago

sree314 commented 2 years ago

The record_reg_vals tool was not working for me with CUDA 10.2, driver 450, and an SM_52 card, with both the 1.5.3 and 1.5.2 releases. It crashed with "Illegal instruction" errors, and dmesg showed out-of-range register errors. It also failed with other driver versions and on a Pascal GPU.

It turns out that the value of op->reg.u.num was wildly incorrect. Since TOOL_VERBOSE=1 showed the correct number, the operand data itself was fine and only my tool was reading it wrong; I tracked this down to the presence of str in operand_t.

str is a std::string, and apparently the compiler used to build nvbit.a lays out std::string differently from the gcc 7.3.1 I'm using, so fields that follow str are read from the wrong offset. This is to be expected: C++ does not guarantee a stable ABI.

As a workaround, replacing std::string str with uint64_t str[4] in operand_t seems to work.

ovilla commented 2 years ago

Yes, we stumbled upon this issue in the past and tried to avoid std::string in public interfaces and structs as much as possible; however, we forgot to remove this particular instance. We will update the next version of NVBit to avoid std::string entirely in public interfaces and exposed structs. Thanks for pointing this out.