Closed jpli02 closed 3 weeks ago
Atomic operation with vector type (quoted from the PTX ISA spec):

```
atom{.sem}{.scope}{.global}.add{.level::cache_hint}.vec_32_bit.f32                  d, [a], b{, cache-policy};
atom{.sem}{.scope}{.global}.op.noftz{.level::cache_hint}.vec_16_bit.half_word_type  d, [a], b{, cache-policy};
atom{.sem}{.scope}{.global}.op.noftz{.level::cache_hint}.vec_32_bit.packed_type     d, [a], b{, cache-policy};

.sem =               { .relaxed, .acquire, .release, .acq_rel };
.scope =             { .cta, .cluster, .gpu, .sys };
.op =                { .add, .min, .max };
.half_word_type =    { .f16, .bf16 };
.packed_type =       { .f16x2, .bf16x2 };
.vec_16_bit =        { .v2, .v4, .v8 }
.vec_32_bit =        { .v2, .v4 };
.level::cache_hint = { .L2::cache_hint }
```
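For reference, the half-word and packed forms above are what CUDA C++'s `atomicAdd` overloads for `__half` and `__half2` lower to. A minimal sketch (the kernel name and indexing are illustrative, not from the issue):

```cuda
// atomicAdd(__half2*, __half2) lowers to a packed f16x2 atomic (sm_60+).
// atomicAdd(__half*,  __half)  lowers to a half-word f16 atomic   (sm_70+).
#include <cuda_fp16.h>

__global__ void accumulate_f16x2(__half2* out, const __half2* in, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Many threads may target the same slot; the atomic serializes them.
        atomicAdd(&out[i % 8], in[i]);
    }
}
```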
Hi,
I am trying to use a CuTe tensor to store the sum of different row blocks of a tensor. Different threads may write to the same location when performing the accumulated sum.
Given CuTe tensors A and B (both of type cute::half_t), I want to accumulate the sum into A. As the following code shows, I want to sum up A; how do I avoid the race condition?
Is there an atomicAdd supported for cute::half_t? I cannot use atomicAdd directly since it doesn't accept half_t.
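For context, here is a sketch of the kind of workaround I have in mind, assuming cute::half_t is layout-compatible with __half (in CUTLASS it wraps the same 16-bit storage); the tensor names tA/tB are hypothetical stand-ins for the thread-local tiles:

```cuda
// Hedged sketch: reinterpret cute::half_t storage as __half so the
// sm_70+ atomicAdd(__half*, __half) overload can be used.
#include <cuda_fp16.h>
#include <cute/tensor.hpp>

template <class TensorA, class TensorB>
__device__ void atomic_accumulate(TensorA tA, TensorB tB)
{
    for (int i = 0; i < cute::size(tA); ++i) {
        // Assumes cute::half_t and __half have identical bit layout.
        __half b = reinterpret_cast<__half const&>(tB(i));
        atomicAdd(reinterpret_cast<__half*>(&tA(i)), b);
    }
}
```

This serializes conflicting updates per element; whether that is fast enough depends on how often threads collide on the same location.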
Thank you so much!