Closed: gdb closed this issue 3 months ago.
cc @crcrpar: are the following limits out of date?
https://github.com/NVIDIA/apex/blob/b3bd26a8004007e6d2d098934e063b952cab86f1/csrc/multi_tensor_apply.cuh#L15-L17
I see the same limits in PyTorch, where you already updated to use `int64_t` in https://github.com/pytorch/pytorch/pull/101760. Otherwise, I would expect that changing to `int64_t` increases the `TensorListMetadata` struct size and hence the kernel argument size. (Though it seems that CUDA 12.1 on Volta+ increased the kernel argument size limit from 4 KB to 32 KB.)
> I would expect that changing to use `int64_t` increases the `TensorListMetadata` struct size and hence the kernel arg size.
Yes, but apex does not have a multi-tensor-apply variant that takes a list of scalars, so we might be able to dodge a tweak of `depth_to_max_tensors` and `depth_to_max_blocks`.
Currently, `multi_tensor_apply` causes an illegal memory access due to an overflow in the `sizes` field of `TensorListMetadata`. This can be reproduced using the following standalone script: