VoVAllen / tf-dlpack

DLPack for Tensorflow
Apache License 2.0

Several TODO tasks #1

Closed jermainewang closed 4 years ago

jermainewang commented 4 years ago

@VoVAllen

Also, what's the problem of inheriting TF's allocator?

VoVAllen commented 4 years ago

I just fixed FromDLPack (https://github.com/VoVAllen/tf-dlpack/blob/master/src/from_dlpack_kernel_copy.cc#L67)

The problem is that the output tensor has to be created with context->allocate_output here. There's no problem inheriting TF's allocator, and it works well. But you cannot set a tensor created with new Tensor(allocator...) as the output. All context functions can be found here. set_output will copy a tensor created with new Tensor, and set_output_ref is not supported in TF 2.0.

VoVAllen commented 4 years ago

I also found that PyTorch will copy the from_dlpack tensor when doing further computation. You can verify this with CuPy's DLPack support: allocate a big enough tensor and send it to torch via DLPack. You will find that no copy happens in torch.utils.dlpack.from_dlpack, but when you print the tensor or do another operation on it (such as b = a + 1), it copies the tensor. (You can see memory double with nvidia-smi.)
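The zero-copy-on-import, copy-on-compute behavior described above is not GPU-specific; a minimal CPU-only sketch using NumPy's DLPack support (np.from_dlpack, available since NumPy 1.23) illustrates the same point without CuPy or torch:

```python
import numpy as np

# Producer: any array exposing the __dlpack__ protocol.
a = np.arange(1024, dtype=np.float32)

# Importing through DLPack is zero-copy: b aliases a's buffer.
b = np.from_dlpack(a)
assert np.shares_memory(a, b)

# Any computation allocates a fresh output buffer, which is why
# memory usage grows once you actually operate on the tensor.
c = b + 1
assert not np.shares_memory(b, c)
```
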

VoVAllen commented 4 years ago

Currently I test scoping with https://github.com/VoVAllen/tf-dlpack/blob/master/test/test_to_dlpack.py#L5.

jermainewang commented 4 years ago

> I also found that PyTorch will copy the from_dlpack tensor when doing further computation. You can verify this with CuPy's DLPack support: allocate a big enough tensor and send it to torch via DLPack. You will find that no copy happens in torch.utils.dlpack.from_dlpack, but when you print the tensor or do another operation on it (such as b = a + 1), it copies the tensor. (You can see memory double with nvidia-smi.)

Shouldn't the memory be doubled since b is a new tensor?

VoVAllen commented 4 years ago

No. If you do b = a + 1, the memory triples.

jermainewang commented 4 years ago

That might be a problem. DGL uses DLPack a lot. Could you try to locate the code to double-check?

VoVAllen commented 4 years ago

Update: I checked your implementation; it seems to be zero-copy, based on GPU memory usage. I just changed the file on my side. Test file: https://github.com/VoVAllen/tf-dlpack/blob/master/test/test_zero_copy.py

I also found a way to avoid cudaMemcpy in to_dlpack, so we don't need the CUDA runtime anymore. The CUDA dependency is removed.

But calling the deleter in DeallocateRaw still leads to a core dump. I'm trying to figure out why.

C++ code is almost done. Could you help check whether the libpath in setup.py works?

VoVAllen commented 4 years ago

I'm working on test, lint and CI now.

VoVAllen commented 4 years ago

> But calling the deleter in DeallocateRaw still leads to a core dump. I'm trying to figure out why.

I forgot to set the capsule's destructor. It should be fixed now.
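For reference, the capsule-destructor mechanism can be sketched in pure Python with ctypes. This is an illustrative analogue, not the repo's actual C++ code: the capsule name "dltensor" follows the DLPack convention, and the recorded side effect stands in for calling the DLManagedTensor's deleter:

```python
import ctypes

freed = []

# PyCapsule destructor signature: void (*)(PyObject *capsule)
DESTRUCTOR = ctypes.CFUNCTYPE(None, ctypes.c_void_p)

pyapi = ctypes.pythonapi
pyapi.PyCapsule_New.restype = ctypes.py_object
pyapi.PyCapsule_New.argtypes = (ctypes.c_void_p, ctypes.c_char_p, DESTRUCTOR)

@DESTRUCTOR
def _destructor(capsule_ptr):
    # In a real DLPack capsule this would invoke the
    # DLManagedTensor's deleter; here we just record the call.
    freed.append(True)

payload = ctypes.c_int(42)  # stand-in for a DLManagedTensor*
capsule = pyapi.PyCapsule_New(ctypes.addressof(payload), b"dltensor",
                              _destructor)

# Dropping the last reference triggers the destructor; without one
# (the bug above), the consumer's memory is never released.
del capsule
assert freed == [True]
```

If the destructor is never set (a NULL third argument), the capsule is silently collected without releasing the wrapped tensor, which matches the leak/crash behavior being debugged here.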

VoVAllen commented 4 years ago

Tests and lint are both done; only CI is left.

jermainewang commented 4 years ago

Missing license file. Does TF use Apache 2.0?

VoVAllen commented 4 years ago

Yes. I added the Apache 2.0 license to our repo.