[ROCm] skip warp update to 64 for gfx10 and gfx11

gfx10 and gfx11 Wavefront Size is 32 compared to gfx9, which is 64. It was causing the test_multi_kernel.py test to fail.

Fail case

> AMD_LOG_LEVEL=3 TORCH_LOGS="+dynamo" TORCHDYNAMO_VERBOSE=1 AOT_FX_GRAPHS=1 TORCH_COMPILE_DEBUG=1 PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_multi_kernel.py -k test_reduction_scratch_buffer_cpp_wrapper 
> 
> :3:hip_module.cpp           :74  : 1406367010027 us: [pid:6817  tid:0x7ffa3ce5a740]  hipModuleGetFunction ( 0x7ffc23ed3128, 0xb997850, triton_ )
> :3:hip_module.cpp           :88  : 1406367010031 us: [pid:6817  tid:0x7ffa3ce5a740] hipModuleGetFunction: Returned hipSuccess :
> :3:hip_module.cpp           :434 : 1406367010037 us: [pid:6817  tid:0x7ffa3ce5a740]  hipModuleLaunchKernel ( 0x0xc674930, 2, 1, 1, 128, 1, 1, 0, stream:<null>, 0x7ffc23ed31a0, char array:<null> )
> :1:hip_module.cpp           :246 : 1406367010041 us: [pid:6817  tid:0x7ffa3ce5a740] Launch params (128, 1, 1) are larger than launch bounds (64) for kernel triton_
> :3:hip_module.cpp           :454 : 1406367010043 us: [pid:6817  tid:0x7ffa3ce5a740] hipModuleLaunchKernel: Returned hipErrorLaunchFailure :

After the fix:

TORCH_LOGS="dynamo" TORCHDYNAMO_VERBOSE=1 AOT_FX_GRAPHS=1 TORCH_COMPILE_DEBUG=1 PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_multi_kernel.py -k tesion_scratch_buffer_cpp_wrapper

Running tests...
----------------------------------------------------------------------
I0514 21:50:26.443000 139777040070464 torch/_dynamo/logging.py:55] [0/0] Step 1: torchdynamo start tracing f /var/lib/jenkins/pytorch/test/inductor/test_multi_kernel.py:255
I0514 21:50:26.456000 139777040070464 torch/_dynamo/logging.py:55] [0/0] Step 1: torchdynamo done tracing f (RETURN_VALUE)
I0514 21:50:26.458000 139777040070464 torch/_dynamo/logging.py:55] [0/0] Step 2: calling compiler function inductor
Compiled module path: /tmp/tmp54kbo900/vu/cvud6jexoz5fudwpxyzeswy34ysbnmjvaiupqaqj6ybvenklcpmz.py
Compiled module path: /tmp/tmp54kbo900/nj/cnjxvy7nroopcld7a5okokbcak2onvobesljghb7ngbrnedvxwrt.py
I0514 21:50:36.654000 139777040070464 torch/fx/experimental/symbolic_shapes.py:2936] [0/0] produce_guards
W0514 21:50:36.655000 139777040070464 torch/_inductor/debug.py:414] [0/0] model__0_inference_0 debug trace: /var/lib/jenkins/pytorch/torch_compile_debug/run_2024_05_14_21_50_26_443606-pid_13006/torcr/model__0_inference_0.0
I0514 21:50:36.658000 139777040070464 torch/_dynamo/logging.py:55] [0/0] Step 2: done compiler function inductor
I0514 21:50:36.662000 139777040070464 torch/fx/experimental/symbolic_shapes.py:2936] [0/0] produce_guards
frames [('total', 1), ('ok', 1)]
inline_call []
stats [('calls_captured', 6), ('unique_graphs', 1)]
inductor [('fxgraph_cache_miss', 1)]
aot_autograd [('total', 1), ('ok', 1)]
.I0514 21:50:36.702000 139777040070464 torch/_dynamo/logging.py:55] [0/0] Step 1: torchdynamo start tracing f /var/lib/jenkins/pytorch/test/inductor/test_multi_kernel.py:255
I0514 21:50:36.706000 139777040070464 torch/_dynamo/logging.py:55] [0/0] Step 1: torchdynamo done tracing f (RETURN_VALUE)
I0514 21:50:36.707000 139777040070464 torch/_dynamo/logging.py:55] [0/0] Step 2: calling compiler function inductor
Compiled module path: /tmp/tmpdsuuxsj1/gn/cgnxnxnx56rckuyk6lonjippmjzmxzxy2grqfgwahn7gajnjcurs.py
Compiled module path: /tmp/tmpdsuuxsj1/ws/cwsdkp4bpbgai5n3whg7df4lgxtj5su4gsm2antsewse5owvav6l.py
I0514 21:50:45.301000 139777040070464 torch/fx/experimental/symbolic_shapes.py:2936] [0/0] produce_guards
W0514 21:50:45.302000 139777040070464 torch/_inductor/debug.py:414] [0/0] model__1_inference_1 debug trace: /var/lib/jenkins/pytorch/torch_compile_debug/run_2024_05_14_21_50_26_443606-pid_13006/torcr/model__1_inference_1.1
I0514 21:50:45.305000 139777040070464 torch/_dynamo/logging.py:55] [0/0] Step 2: done compiler function inductor
I0514 21:50:45.308000 139777040070464 torch/fx/experimental/symbolic_shapes.py:2936] [0/0] produce_guards
frames [('total', 1), ('ok', 1)]
inline_call []
stats [('calls_captured', 6), ('unique_graphs', 1)]
inductor [('fxgraph_cache_miss', 1)]
aot_autograd [('total', 1), ('ok', 1)]
.I0514 21:50:45.315000 139777040070464 torch/_dynamo/logging.py:55] [0/0] Step 1: torchdynamo start tracing f /var/lib/jenkins/pytorch/test/inductor/test_multi_kernel.py:255
I0514 21:50:45.321000 139777040070464 torch/_dynamo/logging.py:55] [0/0] Step 1: torchdynamo done tracing f (RETURN_VALUE)
I0514 21:50:45.322000 139777040070464 torch/_dynamo/logging.py:55] [0/0] Step 2: calling compiler function inductor
Compiled module path: /tmp/tmptbo1o6_j/lt/clt7zsa7bje3bl6lznq5smcogqrehzru66b3x2qkl2devls4fwgm.py
Compiled module path: /tmp/tmptbo1o6_j/nr/cnrybx7jlstvz3ucw6tfobv7ktbzytydwsi2p5j7js5mn5hhkesh.py
I0514 21:50:53.921000 139777040070464 torch/fx/experimental/symbolic_shapes.py:2936] [0/0] produce_guards
W0514 21:50:53.922000 139777040070464 torch/_inductor/debug.py:414] [0/0] model__2_inference_2 debug trace: /var/lib/jenkins/pytorch/torch_compile_debug/run_2024_05_14_21_50_26_443606-pid_13006/torcr/model__2_inference_2.2
I0514 21:50:53.924000 139777040070464 torch/_dynamo/logging.py:55] [0/0] Step 2: done compiler function inductor
I0514 21:50:53.926000 139777040070464 torch/fx/experimental/symbolic_shapes.py:2936] [0/0] produce_guards
frames [('total', 1), ('ok', 1)]
inline_call []
stats [('calls_captured', 6), ('unique_graphs', 1)]
inductor [('fxgraph_cache_miss', 1)]
aot_autograd [('total', 1), ('ok', 1)]
.
----------------------------------------------------------------------
Ran 3 tests in 27.892s

OK

ROCm / pytorch

[ROCm] skip warp update to 64 for gfx10 and gfx11 #1417