IvanaGyro opened 1 month ago
> This function can be used in cases like `myunitensor.to(Device.cuda).other().yetanother()`. I think it's ok to keep this?
Instead, we can use:

```python
my_unitensor = my_unitensor.to(Device.cuda).other_().yetanother_()
```
In the current implementation, `to_` must allocate memory when moving to a different device, so the code above is no slower. Providing an in-place `to_` may mislead users into thinking that `to_` doesn't allocate new memory.
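As a minimal sketch of this argument (the `UniTensor` class below, its `bytearray` storage, and the placeholder `other_`/`yetanother_` methods are toy stand-ins, not the library's actual implementation): crossing devices forces a fresh allocation whether the method is spelled `to` or `to_`, and reassigning the result of `to` drops the old buffer just as an in-place method would.

```python
class UniTensor:  # toy stand-in for the real class
    def __init__(self, storage=None, device="cpu"):
        # a bytearray plays the role of the underlying storage buffer
        self.storage = storage if storage is not None else bytearray(4096)
        self.device = device

    def to(self, device):
        """Return a tensor on `device`; a copy only if the device differs."""
        if device == self.device:
            return self
        # crossing devices always requires a fresh buffer on the target
        return UniTensor(bytearray(self.storage), device)

    def to_(self, device):
        """Hypothetical in-place transfer: it still allocates."""
        if device != self.device:
            # "in-place" only swaps which buffer we point at; the
            # allocation and the copy happen exactly as in to()
            self.storage = bytearray(self.storage)
            self.device = device
        return self

    def other_(self):  # placeholder in-place operations, returning
        return self    # self so that chaining keeps working

    def yetanother_(self):
        return self


t = UniTensor()
# Reassignment keeps the chained style; the buffer that held the CPU
# data is freed as soon as nothing references it, so this is no slower
# than an in-place to_().
t = t.to("cuda").other_().yetanother_()
```

Both paths allocate once and release the old buffer; the only difference is whether the handle is rebound by the caller or mutated inside the method, which is exactly why the name `to_` over-promises.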
Moving data between devices always allocates new memory, so the in-place version `to_(device)` doesn't prevent the allocation. Python and C++ share the same syntax here: reassigning the result of the `to(device)` method, as above, releases the memory used before the transfer.

Besides, managed memory is allocated when the device is a GPU. Both the CPU and GPUs can access managed memory, so switching devices may not need to allocate any memory at all if every buffer lives in managed memory. In that case, the remaining `to(device)` can effectively become an "in-place" method.
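The managed-memory scenario can be illustrated with a short sketch. This uses Numba's CUDA bindings purely as an illustration (Numba is not part of the library under discussion, and the snippet assumes a CUDA-capable GPU): a single managed allocation is addressable from both the CPU and the GPU, so neither side needs a copy, and a `to(device)` built on top of it could simply retag the tensor.

```python
import numpy as np
from numba import cuda

@cuda.jit
def scale(data, s):
    # the GPU reads and writes the managed buffer through the same pointer
    i = cuda.grid(1)
    if i < data.size:
        data[i] *= s

# one managed allocation, visible to both the CPU and the GPU
data = cuda.managed_array(1024, dtype=np.float32)
data[:] = 1.0                         # the CPU writes directly, no transfer
scale[4, 256](data, np.float32(2.0))  # the GPU updates the same memory
cuda.synchronize()
print(data[0])                        # the CPU reads the result, again no copy
```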