FabianSchuetze closed this issue 10 months ago.
Thanks for this wonderful repo.
I have a question about the async copies:
uint32_t A_smem_lane_addr = __cvta_generic_to_shared(&smem[A_smem_idx][0]) + (lane_id % CHUNK_COPY_LINE_LANES) * THREAD_COPY_BYTES;
CP_ASYNC_CG(A_smem_lane_addr, A_lane_ptr, THREAD_COPY_BYTES);
Does this mean that every lane (thread) has a different pointer to the shared memory and a different pointer to the global memory?
As I understand async copies, the src and dst pointers must be the same for every thread in the thread block. See the docs.
You should refer to the docs.
Thanks.