apache / tvm

Open deep learning compiler stack for cpu, gpu and specialized accelerators
https://tvm.apache.org/
Apache License 2.0

[RUNTIME] Memory Safety Issue on NDArray::FromExternalDLTensor #13678

Closed tqchen closed 1 month ago

tqchen commented 1 year ago

In https://github.com/apache/tvm/pull/11003 we introduced NDArray::FromExternalDLTensor, which allows us to create an NDArray from a DLTensor struct.

This method, however, is not safe in some cases: a follow-up method can store the NDArray into longer-lived data structures, and there is no guarantee that the external DLTensor won't be freed before those data structures are.

Array<NDArray> Test(DLTensor* dltensor, Array<NDArray> x) {
     auto nd = NDArray::FromExternalDLTensor(*dltensor);
     x.push_back(nd);
     return x;
}

In this case we do not know how long the returned Array will be retained, and it can outlive the dltensor.

We already have a well-defined way to exchange external memory with zero copy safely: through DLPack.

Array<NDArray> Test(DLManagedTensor* dlpack, Array<NDArray> x) {
     auto nd = NDArray::FromDLPack(dlpack);
     x.push_back(nd);
     return x;
}

If we want to exchange external memory, we should do it through DLPack (which uses DLManagedTensor*). DLPack comes with a proper deleter, which makes it possible to track such dependencies and retain the memory in these cases.
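To make the deleter mechanism concrete, here is a minimal sketch of how an owner can hand out a DLPack-style managed view whose deleter keeps the underlying buffer alive until the consumer is done. The struct names (`FakeDLTensor`, `FakeDLManagedTensor`) and `MakeManagedView` are illustrative stand-ins, not TVM or dlpack.h APIs; real code would include `<dlpack/dlpack.h>` and use the full `DLTensor` layout.

```cpp
#include <cstdint>
#include <memory>
#include <vector>

// Minimal stand-ins for the dlpack.h definitions (illustrative only).
struct FakeDLTensor {
  void* data = nullptr;
};

struct FakeDLManagedTensor {
  FakeDLTensor dl_tensor;
  void* manager_ctx = nullptr;
  void (*deleter)(FakeDLManagedTensor* self) = nullptr;
};

// Wrap a shared buffer into a managed tensor. manager_ctx holds an extra
// shared_ptr reference, so the buffer stays alive until the consumer
// (e.g. something like NDArray::FromDLPack) invokes the deleter.
FakeDLManagedTensor* MakeManagedView(std::shared_ptr<std::vector<float>> buf) {
  auto* managed = new FakeDLManagedTensor();
  managed->dl_tensor.data = buf->data();
  managed->manager_ctx =
      new std::shared_ptr<std::vector<float>>(std::move(buf));
  managed->deleter = [](FakeDLManagedTensor* self) {
    // Drop the extra reference, then free the wrapper itself.
    delete static_cast<std::shared_ptr<std::vector<float>>*>(self->manager_ctx);
    delete self;
  };
  return managed;
}
```

The key point is that the consumer never needs to know who owns the memory; calling `deleter` is the whole contract, which is exactly what `FromExternalDLTensor` lacks.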

When we chain multiple models, in most cases the chaining can be done through natively allocated NDArrays. If that is not possible, having a way to wrap such external memory into a DLManagedTensor (and then pass it into SetInput) would be the safer approach.

Leaving a note here to see whether we want to revisit this API's behavior; it might make sense to encourage zero copy based on DLPack and normal NDArrays.

cc @areusch

tqchen commented 1 year ago

cc @vvchernov @AndrewZhaoLuo @jwfromm

vvchernov commented 1 year ago

Hello @tqchen! You are correct in your description, and yes, for TVM it is better to follow the DLPack scenario to prevent memory safety issues. Nevertheless, the method that wraps an NDArray around external memory (for zero copy) is convenient in some cases. In particular, TVM was recently embedded into ONNX Runtime (ORT), and this method was implemented to avoid excess copying of input/output tensors between ORT and TVM. Of course, it is an advanced tool and assumes the client takes responsibility for memory management.

Another thing I have thought about is performance measurement for TVM excluding input/output copying time. I understand these parts can be hidden by a pipelining mechanism, but that is not well developed on the TVM Python front end. Recently I implemented a mechanism to let the VM use externally provided output memory instead of memory allocated inside the VM. It gives the client the possibility to place output tensors wherever it wants and keep them for as long as needed, even after the TVM session is released. I think this case clearly shows why a method like NDArray::FromExternalDLTensor is needed.

P.S. Possibly the description of the method should be extended to warn the client about the responsibility involved in using it.

tqchen commented 1 year ago

Thanks @vvchernov for the explanation. First of all, I am not questioning the need for zero copy, just how we implement it carefully.

The main concern about the API itself is possible misuse. For example, while the zero-copy behavior in the VM works fine because ORT retains the TVM session, that assumption is not always correct in other uses of the VM.

Having the developer directly (and intentionally) create a DLManagedTensor, convert it to an NDArray, and then pass it into SetInput might be a better approach here, so the intention is clear.

Thanks for pointing out the ORT integration need; can you point me to the related threads? One possibility is to implement an ORTTensorToDLPack in that integration point, which specifically sets up a DLManagedTensor. If there is no ref-counting mechanism there, in the worst case we can follow the same behavior as FromExternalDLTensor, but the surface area is minimized so there is less room for misuse.
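A rough sketch of the "worst case" version of this idea: a conversion point whose deleter frees only the wrapper, so lifetime of the underlying buffer remains the caller's responsibility, just like FromExternalDLTensor, but confined to one clearly named function. `ORTTensorToDLPack` is the name proposed above; the stand-in structs (`OrtTensorStandIn`, `DLManagedTensorStandIn`) are hypothetical simplifications, not the real ORT or dlpack.h types.

```cpp
// Hypothetical stand-ins; real code would use the ORT C/C++ API types
// and the DLManagedTensor definition from <dlpack/dlpack.h>.
struct DLTensorStandIn {
  void* data = nullptr;
};

struct DLManagedTensorStandIn {
  DLTensorStandIn dl_tensor;
  void* manager_ctx = nullptr;
  void (*deleter)(DLManagedTensorStandIn* self) = nullptr;
};

struct OrtTensorStandIn {
  void* data = nullptr;
};

// Sketch of the ORTTensorToDLPack idea with no ref counting available on
// the ORT side: the deleter frees only the wrapper, never the buffer, so
// the caller must keep the ORT tensor alive while the view is in use.
DLManagedTensorStandIn* ORTTensorToDLPack(OrtTensorStandIn* ort) {
  auto* managed = new DLManagedTensorStandIn();
  managed->dl_tensor.data = ort->data;
  managed->manager_ctx = nullptr;
  managed->deleter = [](DLManagedTensorStandIn* self) {
    delete self;  // wrapper only; buffer ownership stays with ORT
  };
  return managed;
}
```

Even without ref counting, this keeps the unsafe lifetime assumption in one place instead of spreading it across every caller of FromExternalDLTensor.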

vvchernov commented 1 year ago

Hello @tqchen! Sorry for the late response; I have been quite busy. I looked at it once more, agree with your arguments, and have the following suggestion:

1. Replace all uses of NDArray::FromExternalDLTensor with NDArray::FromDLPack on the TVM side. It is only used in a few places.

2a. Mark NDArray::FromExternalDLTensor as deprecated and remove it in a future TVM release, once the ONNX Runtime code has been updated to use DLPack as well.

2b. Alternatively, add a warning to the method's description and print it to the console to notify the client of the high level of responsibility related to memory safety.

In my view the best combination is 1 + 2a. Potentially I could fix it on the TVM side and after that on the ORT side, but it will not be done quickly.

cc @AndrewZhaoLuo, @areusch

tqchen commented 1 year ago

Thank you. I think we do not need to do FromDLPack on the VM side; we can simply use the original NDArray and run FromDLPack from the users of the VM.