[RFC] Managed Tensor Data Structure

tqchen commented 6 years ago

Following https://github.com/dmlc/dlpack/pull/17

DLPack so far does not contain memory management for tensors and only asks users to pass non-managed DLTensor around, which serves our purposes so far. For each framework like PyTorch to ATen, and MXNet, there is a need for managing these tensors. This can be done in several ways:

1. Introducing destructor to a managed, as in https://github.com/dmlc/dlpack/pull/17
  - This is good for pure C library like Torch, do not have cross library C++ ABI issue.
  - Do not have runtime issue(allocator is implicit in destructor)
1. Make use C++'s shared_ptr management (one example in https://github.com/dmlc/dlpack/blob/master/contrib/dlpack/dlpackcpp.h#L21) MXNet uses this mechanism
  - Good for c++ libraries, if only shared_ptr is used
  - Do not have runtime issue (deleters can be encapsulated in shared_ptr)

The major question is as follows:

Is there motivation to share a managed tensor structure, or can we do it at per framework level
Since data management is usually coupled with runtime, is there any motivation to share some level of runtime(allocator, etc)

tqchen commented 6 years ago

@soumith @piiswrong @Yangqing @prigoyal

zdevito commented 6 years ago

Thanks for opening the RFC! We are excited about making it easier to mix and match different frameworks.

Let me provide some context for #17 and why there is motivation to share a managed tensor. We want to be able to share tensor resources between two frameworks (in this case Caffe2 and PyTorch, but with the option of supporting more). Sometimes to avoid extra copies a framework needs to hold onto a Tensor resource for an unknown amount of time. (e.g. in PyTorch Tensors are just ref-counted and can really end up anywhere in the Python state, so bounding their lifetime is not always feasible. Similar examples happen in Caffe2 as well). If we want to do this cooperation, then we would need something like a DLManagedTensor. We can do it pair-wise between frameworks, but this defeats the purpose of having a standard cross-framework in memory format.

The #17 PR tries to provide the minimum modification necessary to make this sharing possible and makes this management opt-in so that frameworks which cannot provide this functionality can indicate it via their type signature (DLTensor vs DLManagedTensor). I think this is favorable than more aggressive sharing of things like the allocator, because it allows us to prototype some amount of sharing and see if it actually gets used in practice. By doing this incrementally, we can see if there is continued demand for additional sharing of runtimes and make another incremental change if so.

We've also found that solutions like using shared_ptr management (ii) are tricky to coordinate: everyone has to agree on how it is done, otherwise we end up with incompatible C++ solutions, and sometimes custom deletion mechanisms are necessary. For instance in PyTorch, we need to release a handle to our at::Tensor type when the DLManagedTensor is released. The DLManagedTensor can easily encapsulate a std::shared_ptr in its void * ctx, and deleter functions if that is how a particular framework wants to manage its resources.

tqchen commented 6 years ago

std::shared_ptr does support custom deleter natively which encapsulates the deleter, thus have equivalent effect as the DLManagedTensor.

The advantage of doing it in terms of C ABI is that it is possibly cleaner if one end of the framework does not want to involve C++.

Pros of DLManagedTensor

The pointer is compatible with DLTensor pointer, so passing DLManagedTensor as DLTensor point could just work if the caller does not decide to retain ownership.

Drawbacks I can possibly see vs shared_ptr or ref-counting mechanism

It has to be owned by a single owner(who will call the destructor, see also following comments)
- Shared ownership is allowed from the caller's perspective as long as it manages deleter correctly(e.g. deleter decreases ref counter).
- The user who obtains the DLManagedTensor cannot further share it with others, unless it wraps it in its own ref counting mechanism

PENGUINLIONG commented 6 years ago

Inter-process memory sharing is always tricky. These fields added currently can only ensure transference of tensor if and only if the allocator framework stays alive. Sharing of tensors need reference counters, or other inter-process communication methods to ensure memory is deallocated properly. Actually, it cannot do even transference right, since the memory is allocated in address space of one process. When that process terminates, the memory is recycled by the OS. Following R/W to the memory will cause segfault.

Sharing means 'the joint use of a resource or space.' Say, if PyTorch cannot make sure the chunk of memory is alive during sharing, it means that PyTorch doesn't necessarily need the tensor; PyTorch actually need to safely transfer its ownership to another framework.

zdevito commented 6 years ago

This interface is not meant to transfer a tensor from one framework to another such that the original framework can forget about it. It is only meant to allow one framework to borrow a Tensor for an undetermined amount of time, and to indicate when it is done with the borrowing by calling the deleter. In this case reference counts are not needed because the original framework is responsible for keeping track of all still-live DLManagedTensors that refer to tensors inside the framework.

PENGUINLIONG commented 6 years ago

So, in that case. 'borrow' in Rust requires an instance to be either mutable for a single subject or immutable for multiple subjects. That rule can be applied to this also, so that race condition can be prevented. Some more description should be added.

Anyway. The name destructor is misleading, I suppose. Maybe dereference?

tqchen commented 6 years ago

I general I think the proposal is favorable because of the following properties, after explanation by @zdevito

The major advantages we see here are

The data structure itself is pointer compatible with DLTensor
- DLManagedTensor can be directly cast to DLTensor and used in the function call.
It is minimum change that only relies on C ABI

I have other minor comments in terms of naming:

typedef struct DLManagedTensor {
   //  just to be consistent with current Google C style
   DLTensor dl_tensor;
   // I feel it is better to distinguish it from DLTensor->ctx
   // Assuming we might have they on the same structure pointer even eventually.
   void * manager_ctx;
   // either destructor, deleter, finalizer are good name
   // destructor might be confused with C++ destructor, deleter seems to be 
   // a more netural name that is consistent with c++ shared_ptr deleter
   void (*deleter)(DLManagedTensor * self);
} DLManagedTensor;

I think we should also have a detailed comment on what is the intended use case of this data structure.

PENGUINLIONG commented 6 years ago

I think .NET programs can be benefited (because of its Dispose Pattern) in that way. :D

That's how I think it should be working like. BTW, should it be allowed to be written by any process during sharing?

default

zdevito commented 6 years ago

@tqchen: That naming seems reasonable to me. I also agree about adding a comment to explain the intended use. @prigoyal do you want to update the PR to incorporate this?

zdevito commented 6 years ago

@PENGUINLIONG Yes, that this a accurate view of what happens.

dmlc / dlpack

[RFC] Managed Tensor Data Structure #18