In this submission, we used code generation to handle `scatter` and `gather` with different ranks. Performance testing results show that this change brings the performance of these two operators up to a level comparable to Torch :)
Some of the performance test results are as follows:
[Image: scatter_perf]
[Image: gather_perf]
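To illustrate what generating code per rank can look like, here is a minimal pure-Python sketch. It is not the generator from this branch: the names `generate_gather_source` and `build_gather` are hypothetical, and it assumes flat row-major buffers with a contiguous `index`. The point is only that the loop nest and the offset expression are unrolled for a fixed rank and `dim` at generation time.

```python
def generate_gather_source(rank: int, dim: int) -> str:
    """Return Python source for a gather specialised to one rank and dim.

    Hypothetical illustration only; the real generator is not shown here.
    """
    if not 0 <= dim < rank:
        raise ValueError("dim must satisfy 0 <= dim < rank")
    name = f"gather_rank{rank}_dim{dim}"
    lines = [f"def {name}(inp, inp_strides, index, index_shape):"]
    lines.append("    out = []")
    indent = "    "
    # One loop per dimension, emitted at generation time.
    for d in range(rank):
        lines.append(f"{indent}for i{d} in range(index_shape[{d}]):")
        indent += "    "
    # index is assumed contiguous, so its linear position equals len(out).
    lines.append(f"{indent}idx = index[len(out)]")
    # Along `dim` the coordinate comes from index, elsewhere from the loops.
    terms = [
        f"{'idx' if d == dim else f'i{d}'} * inp_strides[{d}]"
        for d in range(rank)
    ]
    lines.append(f"{indent}out.append(inp[{' + '.join(terms)}])")
    lines.append("    return out")
    return "\n".join(lines)


def build_gather(rank: int, dim: int):
    """Compile the generated source and return the specialised function."""
    namespace = {}
    exec(generate_gather_source(rank, dim), namespace)
    return namespace[f"gather_rank{rank}_dim{dim}"]


# Usage: a 2x2 input stored flat, gathered along dim 1.
gather_2d_dim1 = build_gather(rank=2, dim=1)
result = gather_2d_dim1([1, 2, 3, 4], (2, 1), [0, 0, 1, 0], (2, 2))
```

Specialising per rank keeps the index arithmetic free of runtime loops over dimensions, which is the usual motivation for this approach; the actual speedups are the ones reported in the figures, not anything measured on this sketch.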
If this branch can be merged, we can later use `scatter` to implement `select_scatter` and `slice_scatter`, and `gather` to implement `index_select` and `nll_loss`. These implementations have already been validated on a private development branch, where their performance approaches that of Torch, in contrast to the previously poor results.
Any advice is welcome.
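For reviewers who want the reductions spelled out, the following pure-Python sketch shows how these operators can be expressed through `gather` and `scatter`. It is an assumption-laden illustration on 2-D nested lists, not the code from the private branch: all helper names are hypothetical, only `dim=0` is covered for the derived operators, and the `nll_loss` variant assumes mean reduction with no class weights and no `ignore_index`.

```python
from copy import deepcopy


def gather_2d(inp, dim, index):
    """out[i][j] = inp[index[i][j]][j] for dim 0, inp[i][index[i][j]] for dim 1."""
    return [
        [inp[k][j] if dim == 0 else inp[i][k] for j, k in enumerate(row)]
        for i, row in enumerate(index)
    ]


def scatter_2d(inp, dim, index, src):
    """Copy inp, then write src[i][j] at the position selected by index[i][j]."""
    out = deepcopy(inp)
    for i, row in enumerate(index):
        for j, k in enumerate(row):
            if dim == 0:
                out[k][j] = src[i][j]
            else:
                out[i][k] = src[i][j]
    return out


def index_select_dim0(inp, index):
    """index_select along dim 0: broadcast the 1-D index across columns."""
    ncols = len(inp[0])
    return gather_2d(inp, 0, [[k] * ncols for k in index])


def select_scatter_dim0(inp, src_row, row):
    """select_scatter along dim 0: overwrite one row with src_row."""
    return scatter_2d(inp, 0, [[row] * len(src_row)], [src_row])


def slice_scatter_dim0(inp, src, start, step=1):
    """slice_scatter along dim 0: write src into rows start, start+step, ..."""
    ncols = len(inp[0])
    index = [[start + i * step] * ncols for i in range(len(src))]
    return scatter_2d(inp, 0, index, src)


def nll_loss_mean(log_probs, target):
    """Mean negative log-likelihood: gather one log-probability per row."""
    picked = gather_2d(log_probs, 1, [[t] for t in target])
    return -sum(p[0] for p in picked) / len(picked)


x = [[1, 2], [3, 4], [5, 6]]
selected = index_select_dim0(x, [2, 0])
replaced = select_scatter_dim0(x, [0, 0], 1)
strided = slice_scatter_dim0(x, [[7, 8], [9, 10]], start=0, step=2)
loss = nll_loss_mean([[-1.0, -2.0], [-3.0, -0.5]], [1, 1])
```

In each case the derived operator only has to build an index (and, for the scatter variants, reshape `src`), so any speedup in `scatter` and `gather` carries over to them.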