Closed: jermainewang closed this issue 5 years ago
I just fixed FromDLPack (https://github.com/VoVAllen/tf-dlpack/blob/master/src/from_dlpack_kernel_copy.cc#L67).
The problem is that the output tensor has to be created with context->allocate_output.
There's no problem inheriting TF's allocator, and it works well. But you cannot set a tensor created with new Tensor(allocator, ...) as the output.
All context functions can be found here. set_output will copy the tensor (created with new Tensor), and set_output_ref is not supported in TF 2.0.
Also, I found that PyTorch will copy the from_dlpack tensor when doing further computation. You can verify this with cupy's DLPack support: allocate a big enough tensor and send it to torch via DLPack. You will find that no copy is invoked by torch.utils.dlpack.from_dlpack, but when you print it or do other operations on it (such as b = a + 1), it will copy the tensor. (You can see the memory double with nvidia-smi.)
So far I have tested this with https://github.com/VoVAllen/tf-dlpack/blob/master/test/test_to_dlpack.py#L5.
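A framework-agnostic way to check the zero-copy claim above is to compare base addresses before and after the handoff (with torch one would compare tensor.data_ptr() against the cupy pointer). This stdlib-only sketch uses memoryview and bytearray as stand-ins for the GPU buffers:

```python
import ctypes

def base_address(buf):
    # Address of the first byte of a writable buffer object.
    return ctypes.addressof(ctypes.c_char.from_buffer(buf))

storage = bytearray(1024)      # stand-in for the original allocation
shared = memoryview(storage)   # zero-copy view: the from_dlpack analogue
copied = bytearray(storage)    # an actual copy

print(base_address(shared) == base_address(storage))  # True: same memory
print(base_address(copied) == base_address(storage))  # False: new memory
```

The same address comparison is what the memory readings from nvidia-smi reflect indirectly: if the address (or total allocation) doesn't change, no copy happened.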
Shouldn't the memory be doubled, since b is a new tensor?
No. If you do b = a + 1, the memory triples.
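A hedged back-of-envelope accounting of why the total triples rather than doubles (the 4 GiB figure is an assumption for illustration): the original cupy buffer stays alive, torch materializes a copy of the imported tensor before computing on it, and b = a + 1 then allocates a fresh result tensor.

```python
# Rough accounting of GPU memory as nvidia-smi would report it (GiB).
buf = 4.0       # original cupy allocation
a = 0.0         # from_dlpack aliases buf: zero extra memory
a_copy = 4.0    # torch copies the imported tensor before computing
b = 4.0         # b = a + 1 allocates a fresh result tensor
total = buf + a + a_copy + b
print(total)    # 12.0 -> three times the original 4 GiB
```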
That might be a problem. DGL uses DLPack a lot. Could you try to locate the code to double-check?
Update: I checked your implementation; it seems zero-copy based on GPU memory usage. And I just changed the file on my side. Test file: https://github.com/VoVAllen/tf-dlpack/blob/master/test/test_zero_copy.py
Also, I found a way to avoid cudaMemcpy in to_dlpack. Therefore we don't need the CUDA runtime anymore; the CUDA dependency is removed.
But calling the deleter in DeallocateRaw still leads to a core dump. I'm trying to figure out why.
The C++ code is almost done. Could you help check whether the libpath in setup.py works?
I'm working on tests, lint, and CI now.
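One quick way to sanity-check a library path like the one in setup.py is to verify the shared object both exists and loads. This is a stdlib sketch, not the repo's actual logic; the path and function name are hypothetical:

```python
import ctypes
import os

def can_load(libpath):
    """Return True if the shared library at libpath exists and can be loaded."""
    if not os.path.exists(libpath):
        return False
    try:
        ctypes.CDLL(libpath)  # dlopen the library; raises OSError on failure
        return True
    except OSError:
        return False

# Example with a hypothetical path:
print(can_load("/nonexistent/libtfdlpack.so"))  # False
```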
But calling the deleter in DeallocateRaw still leads to a core dump. I'm trying to figure out why.
I forgot to set the capsule's destructor. It should be fixed now.
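For reference, the capsule destructor is the hook CPython invokes when the capsule is garbage-collected, which is where a DLPack producer would free the DLManagedTensor. A minimal CPython-specific sketch via ctypes (the dummy pointer and callback are illustrative, not the repo's code):

```python
import ctypes

calls = []

# PyCapsule_Destructor has C signature: void (*)(PyObject *capsule)
DESTRUCTOR = ctypes.CFUNCTYPE(None, ctypes.c_void_p)

@DESTRUCTOR
def on_delete(capsule_ptr):
    # A real DLPack deleter would free the DLManagedTensor here.
    calls.append("deleter ran")

PyCapsule_New = ctypes.pythonapi.PyCapsule_New
PyCapsule_New.restype = ctypes.py_object
PyCapsule_New.argtypes = [ctypes.c_void_p, ctypes.c_char_p, DESTRUCTOR]

# 0xDEAD is a dummy pointer; "dltensor" is the conventional DLPack capsule name.
cap = PyCapsule_New(ctypes.c_void_p(0xDEAD), b"dltensor", on_delete)
del cap   # refcount hits zero, so the destructor fires here
print(calls)
```

Forgetting to pass the destructor (the bug above) means the capsule is collected without ever running the deleter, so the underlying buffer leaks or, if freed elsewhere, double-frees.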
Tests and lint are both done; only CI is left.
The license file is missing. Does TF use Apache 2.0?
Yes. I added the Apache license to our repo.
@VoVAllen Also, what's the problem with inheriting TF's allocator?