Closed by ramcherukuri 4 months ago
The wavefront size on gfx10 and gfx11 is 32, compared to 64 on gfx9. This mismatch was causing the test_multi_kernel.py test to fail.
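For context, the wavefront size can be derived from the gfx architecture string that HIP reports. A minimal sketch (not the actual patch; the helper name is hypothetical), assuming gfx10/gfx11 (RDNA) default to wave32 and gfx9 and earlier use wave64:

```python
# Illustrative only: map an AMD gfx architecture string to its default
# wavefront size. HIP reports names like "gfx90a:sramecc+:xnack-" or
# "gfx1100"; the feature flags after ":" are stripped before matching.

def wavefront_size(gcn_arch_name: str) -> int:
    """Return the default wavefront size for an AMD gfx arch string."""
    arch = gcn_arch_name.split(":")[0]
    if arch.startswith(("gfx10", "gfx11")):
        return 32  # RDNA (gfx10/gfx11) runs wave32 by default
    return 64      # gfx9 and earlier run wave64

print(wavefront_size("gfx90a:sramecc+:xnack-"))  # 64
print(wavefront_size("gfx1100"))                 # 32
```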
Fail case
```
AMD_LOG_LEVEL=3 TORCH_LOGS="+dynamo" TORCHDYNAMO_VERBOSE=1 AOT_FX_GRAPHS=1 TORCH_COMPILE_DEBUG=1 PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_multi_kernel.py -k test_reduction_scratch_buffer_cpp_wrapper

:3:hip_module.cpp   :74  : 1406367010027 us: [pid:6817 tid:0x7ffa3ce5a740] hipModuleGetFunction ( 0x7ffc23ed3128, 0xb997850, triton_ )
:3:hip_module.cpp   :88  : 1406367010031 us: [pid:6817 tid:0x7ffa3ce5a740] hipModuleGetFunction: Returned hipSuccess :
:3:hip_module.cpp   :434 : 1406367010037 us: [pid:6817 tid:0x7ffa3ce5a740] hipModuleLaunchKernel ( 0x0xc674930, 2, 1, 1, 128, 1, 1, 0, stream:<null>, 0x7ffc23ed31a0, char array:<null> )
:1:hip_module.cpp   :246 : 1406367010041 us: [pid:6817 tid:0x7ffa3ce5a740] Launch params (128, 1, 1) are larger than launch bounds (64) for kernel triton_
:3:hip_module.cpp   :454 : 1406367010043 us: [pid:6817 tid:0x7ffa3ce5a740] hipModuleLaunchKernel: Returned hipErrorLaunchFailure :
```
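The key line is the launch-bounds check: the kernel binary was built assuming a 64-wide wavefront (launch bounds 64), while the launch used a 128-thread block sized for wave32. A hedged sketch of that check (the function is hypothetical; only the numbers and error string mirror the log above):

```python
# Illustrative mirror of HIP's "Launch params (...) are larger than
# launch bounds (...)" check: a block larger than the kernel's compiled
# launch bounds is rejected before the kernel runs.

def validate_launch(block_threads: int, launch_bounds: int) -> str:
    if block_threads > launch_bounds:
        return "hipErrorLaunchFailure"
    return "hipSuccess"

print(validate_launch(128, 64))  # hipErrorLaunchFailure, as in the log
print(validate_launch(64, 64))   # hipSuccess
```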
After the fix:
```
TORCH_LOGS="dynamo" TORCHDYNAMO_VERBOSE=1 AOT_FX_GRAPHS=1 TORCH_COMPILE_DEBUG=1 PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_multi_kernel.py -k test_reduction_scratch_buffer_cpp_wrapper

Running tests...
----------------------------------------------------------------------
I0514 21:50:26.443000 139777040070464 torch/_dynamo/logging.py:55] [0/0] Step 1: torchdynamo start tracing f /var/lib/jenkins/pytorch/test/inductor/test_multi_kernel.py:255
I0514 21:50:26.456000 139777040070464 torch/_dynamo/logging.py:55] [0/0] Step 1: torchdynamo done tracing f (RETURN_VALUE)
I0514 21:50:26.458000 139777040070464 torch/_dynamo/logging.py:55] [0/0] Step 2: calling compiler function inductor
Compiled module path: /tmp/tmp54kbo900/vu/cvud6jexoz5fudwpxyzeswy34ysbnmjvaiupqaqj6ybvenklcpmz.py
Compiled module path: /tmp/tmp54kbo900/nj/cnjxvy7nroopcld7a5okokbcak2onvobesljghb7ngbrnedvxwrt.py
I0514 21:50:36.654000 139777040070464 torch/fx/experimental/symbolic_shapes.py:2936] [0/0] produce_guards
W0514 21:50:36.655000 139777040070464 torch/_inductor/debug.py:414] [0/0] model__0_inference_0 debug trace: /var/lib/jenkins/pytorch/torch_compile_debug/run_2024_05_14_21_50_26_443606-pid_13006/torcr/model__0_inference_0.0
I0514 21:50:36.658000 139777040070464 torch/_dynamo/logging.py:55] [0/0] Step 2: done compiler function inductor
I0514 21:50:36.662000 139777040070464 torch/fx/experimental/symbolic_shapes.py:2936] [0/0] produce_guards
frames [('total', 1), ('ok', 1)]
inline_call []
stats [('calls_captured', 6), ('unique_graphs', 1)]
inductor [('fxgraph_cache_miss', 1)]
aot_autograd [('total', 1), ('ok', 1)]
.I0514 21:50:36.702000 139777040070464 torch/_dynamo/logging.py:55] [0/0] Step 1: torchdynamo start tracing f /var/lib/jenkins/pytorch/test/inductor/test_multi_kernel.py:255
I0514 21:50:36.706000 139777040070464 torch/_dynamo/logging.py:55] [0/0] Step 1: torchdynamo done tracing f (RETURN_VALUE)
I0514 21:50:36.707000 139777040070464 torch/_dynamo/logging.py:55] [0/0] Step 2: calling compiler function inductor
Compiled module path: /tmp/tmpdsuuxsj1/gn/cgnxnxnx56rckuyk6lonjippmjzmxzxy2grqfgwahn7gajnjcurs.py
Compiled module path: /tmp/tmpdsuuxsj1/ws/cwsdkp4bpbgai5n3whg7df4lgxtj5su4gsm2antsewse5owvav6l.py
I0514 21:50:45.301000 139777040070464 torch/fx/experimental/symbolic_shapes.py:2936] [0/0] produce_guards
W0514 21:50:45.302000 139777040070464 torch/_inductor/debug.py:414] [0/0] model__1_inference_1 debug trace: /var/lib/jenkins/pytorch/torch_compile_debug/run_2024_05_14_21_50_26_443606-pid_13006/torcr/model__1_inference_1.1
I0514 21:50:45.305000 139777040070464 torch/_dynamo/logging.py:55] [0/0] Step 2: done compiler function inductor
I0514 21:50:45.308000 139777040070464 torch/fx/experimental/symbolic_shapes.py:2936] [0/0] produce_guards
frames [('total', 1), ('ok', 1)]
inline_call []
stats [('calls_captured', 6), ('unique_graphs', 1)]
inductor [('fxgraph_cache_miss', 1)]
aot_autograd [('total', 1), ('ok', 1)]
.I0514 21:50:45.315000 139777040070464 torch/_dynamo/logging.py:55] [0/0] Step 1: torchdynamo start tracing f /var/lib/jenkins/pytorch/test/inductor/test_multi_kernel.py:255
I0514 21:50:45.321000 139777040070464 torch/_dynamo/logging.py:55] [0/0] Step 1: torchdynamo done tracing f (RETURN_VALUE)
I0514 21:50:45.322000 139777040070464 torch/_dynamo/logging.py:55] [0/0] Step 2: calling compiler function inductor
Compiled module path: /tmp/tmptbo1o6_j/lt/clt7zsa7bje3bl6lznq5smcogqrehzru66b3x2qkl2devls4fwgm.py
Compiled module path: /tmp/tmptbo1o6_j/nr/cnrybx7jlstvz3ucw6tfobv7ktbzytydwsi2p5j7js5mn5hhkesh.py
I0514 21:50:53.921000 139777040070464 torch/fx/experimental/symbolic_shapes.py:2936] [0/0] produce_guards
W0514 21:50:53.922000 139777040070464 torch/_inductor/debug.py:414] [0/0] model__2_inference_2 debug trace: /var/lib/jenkins/pytorch/torch_compile_debug/run_2024_05_14_21_50_26_443606-pid_13006/torcr/model__2_inference_2.2
I0514 21:50:53.924000 139777040070464 torch/_dynamo/logging.py:55] [0/0] Step 2: done compiler function inductor
I0514 21:50:53.926000 139777040070464 torch/fx/experimental/symbolic_shapes.py:2936] [0/0] produce_guards
frames [('total', 1), ('ok', 1)]
inline_call []
stats [('calls_captured', 6), ('unique_graphs', 1)]
inductor [('fxgraph_cache_miss', 1)]
aot_autograd [('total', 1), ('ok', 1)]
.
----------------------------------------------------------------------
Ran 3 tests in 27.892s

OK
```
Are we going to upstream this as well?