DeepLink-org / deeplink.framework

BSD 3-Clause "New" or "Revised" License

fix is_scalar_on_cpu bug #866

Closed: zhaoguochun1995 closed this 3 months ago

zhaoguochun1995 commented 4 months ago

Avoid treating a tensor that lives on the device as a CPU scalar tensor: tensor.item() calls syncStream internally, which is costly. If a tensor on the CPU has only one element and its shape is not empty, such as [1], [1,1], [1,1,...], it should be treated as a Scalar and the corresponding scalar kernel function should be called. There are also places in PyTorch core where a scalar is wrapped into a tensor but not marked as wrapped; see https://github.com/pytorch/pytorch/blob/8f70bf7a943799b5cd870952d39f36361de4b87f/torch/csrc/lazy/core/tensor.cpp#L386
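The check described above can be sketched as follows. This is a minimal illustration, not the framework's actual implementation: `FakeTensor` is a hypothetical stand-in carrying only a shape and a device string, and `is_scalar_on_cpu` mirrors the rule that only a one-element CPU tensor may take the scalar fast path.

```python
from dataclasses import dataclass
from functools import reduce
import operator

@dataclass
class FakeTensor:
    """Hypothetical stand-in for a real tensor (shape + device only)."""
    shape: tuple
    device: str  # 'cpu' or 'cuda'

    def numel(self) -> int:
        # Product of the shape entries; the empty product is 1,
        # so a 0-dim tensor also reports one element.
        return reduce(operator.mul, self.shape, 1)

def is_scalar_on_cpu(t: FakeTensor) -> bool:
    # Only a CPU tensor may take the scalar fast path. A device tensor
    # must never be unwrapped with .item(), because item() triggers a
    # stream synchronization, which is costly.
    return t.device == 'cpu' and t.numel() == 1
```

Under this rule, shapes [1] and [1,1] on the CPU qualify as scalars, while any tensor on the device is excluded regardless of its element count.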

zhaoguochun1995 commented 4 months ago

The numel = 1 criterion is not aligned with pt (PyTorch), is it?

Regarding "If the tensor on the CPU has only one element and its shape is not empty, such as [1], [1,1], [1,1,...], it should be treated as a Scalar and the corresponding kernel function should be called": the main goal here is to call the scalar version of the kernel function. A tensor(shape=[1], device='cpu') should in this case be handled as a scalar (e.g. via diopiAddScalar) rather than as a device tensor.

Moreover, there is no misalignment: in pt, numel likewise returns 1 rather than 0.
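The two points above, that numel of shape [1] is 1 and that a one-element CPU operand should be routed to the scalar kernel, can be sketched together. This is a hypothetical dispatcher for illustration only; `pick_add_kernel` and its string return values are assumptions, with only the kernel names diopiAddScalar and diopiAdd taken from the discussion.

```python
from functools import reduce
import operator

def numel(shape):
    # numel is the product of the shape entries; the empty product is 1,
    # so shape () (a 0-dim tensor) and shape (1,) both give numel == 1,
    # matching PyTorch's convention (numel returns 1 here, not 0).
    return reduce(operator.mul, shape, 1)

def pick_add_kernel(other_shape, other_device):
    # Hypothetical dispatcher: a one-element CPU operand is routed to the
    # scalar kernel (diopiAddScalar); everything else stays on the
    # tensor-tensor path (diopiAdd).
    if other_device == 'cpu' and numel(other_shape) == 1:
        return 'diopiAddScalar'
    return 'diopiAdd'
```

With this routing, tensor(shape=[1], device='cpu') reaches diopiAddScalar, while the same shape on the device stays on the tensor path and avoids the implicit .item() synchronization.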