Oneflow-Inc / oneflow

OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
http://www.oneflow.org
Apache License 2.0
5.9k stars 666 forks

[Feature Request] tensor.data_ptr() #6843

Open hiyyg opened 2 years ago

hiyyg commented 2 years ago

Like the one in torch, so that we can access the CUDA memory address of a tensor from Python.

MARD1NO commented 2 years ago

Can you give us an example of how this feature is used in PyTorch? I wonder in which situations you need tensor.data_ptr().

hiyyg commented 2 years ago

@MARD1NO For example:

https://github.com/NVlabs/DeepIM-PyTorch/blob/b46ccd2465ce69ac575bf454d4171a5dcb9c6908/ycb_render/ycb_renderer.py#L506
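The pattern in the linked code is to hand a tensor's raw address to code outside Python. A minimal sketch of that pattern, using only the standard library so it runs without PyTorch or a GPU: the ctypes array is a stand-in for tensor storage, `ctypes.addressof` plays the role of tensor.data_ptr(), and `scale_in_place` is a hypothetical consumer, not part of any real renderer.

```python
import ctypes

# Stand-in for a tensor's storage: four contiguous 32-bit floats.
storage = (ctypes.c_float * 4)(1.0, 2.0, 3.0, 4.0)

# Plays the role of tensor.data_ptr(): the buffer's address as a plain int.
ptr = ctypes.addressof(storage)


def scale_in_place(address, count, factor):
    """Hypothetical consumer that receives only an address and a length."""
    # Re-interpret the raw address as a float array without copying.
    view = (ctypes.c_float * count).from_address(address)
    for i in range(count):
        view[i] *= factor


scale_in_place(ptr, 4, 2.0)

# The original buffer was modified through the pointer alone.
result = list(storage)
print(result)
```

The consumer never sees the Python object, only the integer address, which is why external libraries ask for something like data_ptr().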

lixinqi commented 2 years ago

It seems PyTorch has no API called map_tensor. Was it implemented by your team?

hiyyg commented 2 years ago

@lixinqi map_tensor is not related to PyTorch. It just needs a pointer like the one tensor.data_ptr() returns in PyTorch. If OneFlow could also give users such a pointer, the code could be switched to OneFlow.

lixinqi commented 2 years ago

@hiyyg OneFlow doesn't provide a tensor.data_ptr() API right now, for two reasons:

  1. The fundamental design is very different from PyTorch's. Memory for a PyTorch tensor is allocated synchronously in the same thread that runs the Python code, while memory for a OneFlow tensor is allocated asynchronously in a separate worker thread. It is therefore illegal to access a tensor's memory directly from the main thread.
  2. A OneFlow tensor can represent a tensor distributed across devices and nodes. For example, with oneflow imported as flow and t = flow.ones((32, 32), placement=flow.placement("cuda", {0: [0, 1]}), sbp=flow.sbp.split(0)), t is a distributed tensor (also called a consistent tensor): the data of t[0:16, :] is located on device cuda 0, and the data of t[16:, :] is located on device cuda 1. Hence the behavior of t.data_ptr() cannot be defined.
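Both reasons can be illustrated with a toy model. This is only a sketch using the Python standard library, not OneFlow's actual internals: `allocate` is a hypothetical stand-in for device memory allocation, the thread pool stands in for the worker thread, and the "cuda:0"/"cuda:1" keys are just labels on host buffers.

```python
import ctypes
from concurrent.futures import ThreadPoolExecutor


def allocate(count):
    """Hypothetical allocator: a zeroed float buffer standing in for device memory."""
    return (ctypes.c_float * count)()


# Reason 1: allocation runs on a worker thread, so the caller holds only
# a handle (a Future) until it explicitly waits for the result.
with ThreadPoolExecutor(max_workers=1) as worker:
    pending = worker.submit(allocate, 32 * 32)
    # Here the buffer may not exist yet, so there is no address to read.
    local_buffer = pending.result()  # explicit synchronization point
local_address = ctypes.addressof(local_buffer)

# Reason 2: a 32x32 tensor split along dim 0 over two devices is stored
# as two independent 16x32 buffers, one per device.
shards = {"cuda:0": allocate(16 * 32), "cuda:1": allocate(16 * 32)}
shard_addresses = {dev: ctypes.addressof(buf) for dev, buf in shards.items()}

# Each shard has its own address; no single pointer covers the whole tensor.
for dev, address in shard_addresses.items():
    print(dev, hex(address))
```

In this model an address becomes meaningful only after synchronization, and only per shard, which is why a single t.data_ptr() has no obvious definition for a distributed tensor.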

We will discuss this feature request in the near future.

hiyyg commented 2 years ago

OK. I really hope that one day OneFlow can TRULY replace PyTorch.