NVlabs / NVBit


Instruction 'vote' without '.sync' is not supported #7

Closed mahmoodn closed 4 years ago

mahmoodn commented 4 years ago

I changed the arch number to 70 in tools/*/Makefile, and here is the error I see:

ptxas warning : For profile sm_70 adjusting per thread register count of 16 to lower bound of 24
ptxas /tmp/tmpxft_00001c5e_00000000-5_opcode_hist.ptx, line 6401; error   : Instruction 'vote' without '.sync' is not supported on .target sm_70 and higher from PTX ISA version 6.4
ptxas /tmp/tmpxft_00001c5e_00000000-5_opcode_hist.ptx, line 6408; error   : Instruction 'vote' without '.sync' is not supported on .target sm_70 and higher from PTX ISA version 6.4
ovilla commented 4 years ago

There is no need to change the arch number to 70 in tools/*/Makefile, but if you want to do that, then change the lines in opcode_hist.cu that use __ballot to use __ballot_sync instead.

But again, there is no need to change the arch number to 70 in tools/*/Makefile to get the tool working on Volta: the tools are compiled with embedded PTX, and the driver will JIT-compile that PTX for Volta on the fly.

mahmoodn commented 4 years ago

Excuse me, these

    /* all the active threads will compute the active mask */
    const int active_mask = __ballot_sync(1);
    /* compute the predicate mask */
    const int predicate_mask = __ballot_sync(predicate);

have the following errors:

opcode_hist.cu(86): error: too few arguments in function call

opcode_hist.cu(88): error: too few arguments in function call

I accept that changing the sm number is not necessary. Maybe this should be considered for future releases. Thanks for the effort. It is a great tool.

ovilla commented 4 years ago

Future releases of NVBit will automatically use the correct __ballot or __ballot_sync depending on the version of nvcc used (the reason for this "issue" is that __ballot is going to be deprecated soon in favor of __ballot_sync).

In the meantime, you can use

__ballot_sync(__activemask(), 1)

and

__ballot_sync(__activemask(), predicate)

respectively.

The function __ballot_sync takes two arguments: a mask of the participating threads, followed by the predicate. For a nice explanation of them, see https://devblogs.nvidia.com/using-cuda-warp-level-primitives/
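
For anyone landing here later, the workaround above can be sketched as a small standalone program. This is only an illustration, not the actual NVBit code: the `WARP_BALLOT` macro, the `ballot_demo` kernel, and the even-lane predicate are hypothetical names and values made up for this example, and the `CUDART_VERSION >= 9000` check assumes the `*_sync` intrinsics and `__activemask()` arrived with CUDA 9.0.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

/* Hypothetical helper (not part of NVBit): select the ballot intrinsic
 * from the CUDA toolkit version. Assumes __ballot_sync and __activemask
 * are available from CUDA 9.0 onwards. */
#if defined(CUDART_VERSION) && CUDART_VERSION >= 9000
#define WARP_BALLOT(pred) __ballot_sync(__activemask(), (pred))
#else
#define WARP_BALLOT(pred) __ballot(pred)
#endif

__global__ void ballot_demo(unsigned int *out) {
    const int lane = threadIdx.x % 32;

    /* Example predicate (an assumption for this sketch): true on even lanes. */
    const int predicate = (lane % 2 == 0);

    /* all the active threads will compute the active mask */
    const unsigned int active_mask = WARP_BALLOT(1);
    /* compute the predicate mask */
    const unsigned int predicate_mask = WARP_BALLOT(predicate);

    /* Lane 0 writes both masks back so the host can print them. */
    if (lane == 0) {
        out[0] = active_mask;
        out[1] = predicate_mask;
    }
}

int main() {
    unsigned int *d_out = nullptr;
    unsigned int h_out[2] = {0, 0};

    if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    /* Launch a single warp of 32 threads. */
    ballot_demo<<<1, 32>>>(d_out);

    /* cudaMemcpy waits for the kernel to finish before copying. */
    cudaError_t err =
        cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("active mask:    0x%08x\n", h_out[0]);
    printf("predicate mask: 0x%08x\n", h_out[1]);
    return 0;
}
```

The masks are stored as `unsigned int` here because that is the return type of `__ballot_sync`; storing them in an `int` as in the original snippet also compiles, but the top bit then reads as a sign bit.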