apache / tvm

Open deep learning compiler stack for cpu, gpu and specialized accelerators
https://tvm.apache.org/
Apache License 2.0

[RFC] Use Object Protocol to Support runtime::Module and runtime::NDArray #4286

Closed tqchen closed 4 years ago

tqchen commented 4 years ago

The unified object protocol is now used by almost all of our runtime objects. However, there are still two exceptions: Module and NDArray. This RFC discusses whether and how to bring them under the object protocol.

The advantage of unification is clear: afterwards, Module and NDArray can benefit from the other features of the object system. For example, we could use ADT to store NDArrays directly and remove vm::Tensor, which was created to work around this limitation.

However, the change can also cause problems, depending on how we do it. This RFC discusses the design choices.

Considerations

The current PackedFunc C API treats NDArray and Module differently from Objects. We call NDArrayAlloc to allocate an Array and call NDArrayFree to release it. When we pass an NDArray to a PackedFunc, we set the type code to be kArrayHandle.

This explicit runtime API is simple to implement: we do not have to follow the object protocol when implementing a minimum runtime. We can implement NDArray in a way that does not have to deal with reference counting or deleter setup. Moreover, the explicit type_code makes the RPC implementation easy, as we do not have to deal with second-level dispatching that reads the type index.
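To make the dispatching point concrete, here is a minimal sketch of the two styles. All names and values below (MiniTypeCode, MiniTypeIndex, MiniObject and both functions) are hypothetical stand-ins, not the actual TVM definitions; only the identifiers kArrayHandle, kModuleHandle and kObjectHandle are taken from the text of this RFC.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical type codes: names follow the RFC text, values are made up.
enum MiniTypeCode : int { kArrayHandle = 1, kModuleHandle = 2, kObjectHandle = 3 };

// Hypothetical type indices stored in an object header (assumption).
enum MiniTypeIndex : std::uint32_t { kIndexNDArray = 1, kIndexModule = 2, kIndexOther = 100 };

// Stand-in for an object header that carries a runtime type index.
struct MiniObject {
  std::uint32_t type_index;
};

// Explicit type codes: a single switch on the type code is enough,
// the receiver never has to look inside the handle.
inline std::string DispatchByTypeCode(int type_code) {
  switch (type_code) {
    case kArrayHandle: return "ndarray";
    case kModuleHandle: return "module";
    default: return "object";
  }
}

// Unified type code: every object arrives as kObjectHandle, so the
// receiver needs a second dispatch that reads the type index.
inline std::string DispatchByTypeIndex(int type_code, const void* handle) {
  if (type_code != kObjectHandle) return "not an object";
  assert(handle != nullptr);
  const MiniObject* obj = static_cast<const MiniObject*>(handle);
  switch (obj->type_index) {
    case kIndexNDArray: return "ndarray";
    case kIndexModule: return "module";
    default: return "object";
  }
}
```

In the second function the handle has to be dereferenced before the receiver knows what it holds, which is the extra step a minimum runtime or an RPC layer would need to implement.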

Another key property of the current NDArray::Container is that its pointer is compatible with DLTensor: we can directly take an NDArrayHandle allocated from the C API and treat it as a DLTensor*. This property may no longer hold if we make NDArray::Container a sub-class of Object.

Finally, we need to think about how to handle sub-classes of NDArray. Right now we use a second tag field in NDArray::Container; we need to decide whether to reuse Object's type hierarchy or keep the old approach.
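The layout concern can be illustrated with a small sketch. MiniDLTensor, MiniContainer, MiniHeader and MiniObjectContainer are simplified, hypothetical types (the real DLTensor and NDArray::Container have more fields); the sketch only assumes that the tensor descriptor is the first member today and that an object header would be placed in front of it after the change.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified stand-in for DLPack's DLTensor (the real struct has more fields).
struct MiniDLTensor {
  void* data;
  int ndim;
};

// Hypothetical container in today's style: the tensor descriptor is the
// first member, so the container and the descriptor share one address.
struct MiniContainer {
  MiniDLTensor dl_tensor;  // must stay first for the cast below to be valid
  int ref_count;
};

// Hypothetical object header, and a container that derives from it.
struct MiniHeader {
  std::uint32_t type_index;
  int ref_count;
};
struct MiniObjectContainer : MiniHeader {
  MiniDLTensor dl_tensor;  // now sits behind the header
};

// What a consumer of the handle does today: treat it as a tensor pointer.
inline MiniDLTensor* AsDLTensor(void* handle) {
  assert(handle != nullptr);
  return static_cast<MiniDLTensor*>(handle);
}

// Byte distance between the start of a container and its dl_tensor member.
template <typename T>
std::ptrdiff_t TensorOffset(const T& c) {
  return reinterpret_cast<const char*>(&c.dl_tensor) -
         reinterpret_cast<const char*>(&c);
}
```

With MiniContainer the offset is zero, so the handle can be used as a tensor pointer directly. With MiniObjectContainer the offset is non-zero, so the same cast would read the header instead of the tensor.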

We discuss two options below in terms of calling convention.

Option1: Move to Object Calling Convention

The first option is to simply force NDArray and Module to use the new Object calling convention. That means that we no longer pass kNDArrayHandle in the type code and instead will pass kObjectHandle. We can also directly use ObjectFree to de-allocate the NDArray.

This option will force us to change the ABI of the PackedFunc calling convention and requires major updates to all frontend runtimes. It will also complicate the implementation of RPC a bit, as we cannot exchange everything through RPC and need to handle Module and NDArray specially. Finally, it will also break the compatibility of NDArrayHandle with DLTensor*.

Option2: Use the New Object Protocol but Keep the Original Calling Convention

In this case, when we assign an ObjectRef to TVMArgs, we specially check whether the reference points to an NDArray or a Module and, if so, set the type_code to kArrayHandle or kModuleHandle respectively.
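A minimal sketch of such a check, assuming a hypothetical object header with a type_index field. MiniObject, MiniArg, PackObject and all enum values are made up for illustration and are not the TVM API; the sketch also ignores the DLTensor address adjustment described for NDArray and simply passes the object pointer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical type codes: names follow the RFC text, values are made up.
enum MiniTypeCode : int { kArrayHandle = 1, kModuleHandle = 2, kObjectHandle = 3 };

// Hypothetical type indices stored in the object header (assumption).
enum MiniTypeIndex : std::uint32_t { kIndexNDArray = 1, kIndexModule = 2, kIndexOther = 100 };

struct MiniObject {
  std::uint32_t type_index;
};

// One packed argument: a raw handle plus the type code describing it.
struct MiniArg {
  void* handle;
  int type_code;
};

// Setter used when an object reference is assigned to an argument slot.
// It inspects the type index once and picks the legacy type code for
// NDArray and Module, so the callee sees the original calling convention.
inline MiniArg PackObject(MiniObject* obj) {
  assert(obj != nullptr);
  switch (obj->type_index) {
    case kIndexNDArray: return {obj, kArrayHandle};
    case kIndexModule: return {obj, kModuleHandle};
    default: return {obj, kObjectHandle};
  }
}
```

The cost is one integer comparison per object argument, and a setter that already knows the static type of its argument could skip the switch entirely.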

We further use a special argument passing convention for NDArray: we pass the address of the DLTensor and recover the container address (by pointer arithmetic) when converting back to an NDArray. This allows us to keep the backward compatibility between NDArrayHandle and DLTensor*, although it does mean that we can no longer use ObjectFree to de-allocate NDArrays.

One potential drawback of this approach is the additional checks in a PackedFunc call when an ObjectRef is involved. These checks are very cheap, though, and we can reduce the cost further by avoiding them when we have more static type information. This approach keeps the ABI backward compatible, and no changes are needed in the frontends.
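The address recovery could look roughly like the sketch below. The types MiniDLTensor, MiniObject and MiniNDArrayContainer and the helpers TensorOffset, ToHandle and FromHandle are hypothetical and only model the idea; the real container layout and helper names may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified stand-in for DLPack's DLTensor.
struct MiniDLTensor {
  void* data;
  int ndim;
};

// Hypothetical object header.
struct MiniObject {
  std::uint32_t type_index;
  int ref_count;
};

// Hypothetical container: object header first, tensor descriptor after it.
struct MiniNDArrayContainer : MiniObject {
  MiniDLTensor dl_tensor;
};

// Byte distance from the container start to its dl_tensor member,
// measured on a probe instance instead of using offsetof.
inline std::ptrdiff_t TensorOffset() {
  static const MiniNDArrayContainer probe{};
  return reinterpret_cast<const char*>(&probe.dl_tensor) -
         reinterpret_cast<const char*>(&probe);
}

// Caller side: the handle is the address of the embedded tensor, so
// existing code can keep treating the handle as a tensor pointer.
inline void* ToHandle(MiniNDArrayContainer* c) {
  assert(c != nullptr);
  return &c->dl_tensor;
}

// Callee side: step back by the member offset to recover the container.
inline MiniNDArrayContainer* FromHandle(void* handle) {
  assert(handle != nullptr);
  char* base = static_cast<char*>(handle) - TensorOffset();
  return reinterpret_cast<MiniNDArrayContainer*>(base);
}
```

Because the handle points into the middle of the object rather than at its header, a generic free function that expects an object pointer cannot be applied to it directly.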

Current Proposal

We will go with option2, as it keeps everything backward compatible and is a prerequisite of option1 anyway. We could pursue option1 afterwards if we decide that it would be helpful to unify all the logic.

tqchen commented 4 years ago

previous discussions here https://discuss.tvm.ai/t/discuss-use-object-protocol-for-module-ndarray/4583

junrushao commented 4 years ago

Is this RFC good to close since #4289 is done?

tqchen commented 4 years ago

https://github.com/apache/incubator-tvm/pull/4581 for the NDArray part