IvanaGyro opened 5 days ago
ianmccul commented:

Looks good to me -- some minor points:

`StorageImplementation` doesn't need to store the dtype, since that is just the index of `T` in the enumeration, and it certainly shouldn't be stored as an `int` that could get accidentally changed. In my sketch in #496 I have a constexpr templated variable, so `StorageImplementation<T, Allocator>::dtype()` could be implemented simply as `{ return dtype_index<T>; }`.
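A minimal sketch of that idea, assuming a hypothetical `Dtype` enumeration and a `dtype_of` helper (the actual names and ordering in #496 may differ):

```cpp
#include <complex>

// Hypothetical enumeration of supported element types; the real one lives
// in the library and may use different names and ordering.
enum class Dtype { Double, Float, ComplexDouble };

// Primary template left undefined: unsupported element types fail to compile.
template <typename T> struct dtype_of;
template <> struct dtype_of<double> { static constexpr Dtype value = Dtype::Double; };
template <> struct dtype_of<float> { static constexpr Dtype value = Dtype::Float; };
template <> struct dtype_of<std::complex<double>> {
  static constexpr Dtype value = Dtype::ComplexDouble;
};

// The constexpr templated variable mentioned above.
template <typename T> inline constexpr Dtype dtype_index = dtype_of<T>::value;

template <typename T, typename Allocator>
class StorageImplementation {
 public:
  // No dtype member variable: the value is a property of the type itself.
  static constexpr Dtype dtype() { return dtype_index<T>; }
};
```

Because `dtype()` is `constexpr`, the dtype can never be "accidentally changed" at runtime, and it can even be checked with `static_assert` at compile time.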
Where the data is stored (CPU, GPU, etc.) is effectively part of the type of `StorageImplementation` via the `Allocator` parameter, but only indirectly. It might be better to make this more explicit and extensible, with the idea that other storage devices might be added in the future; e.g., the data might be 'stored' on another node of an MPI calculation, or on disk.
`Storage_base` should definitely be a template, even if nothing else is. Replace the `void*` with a `T*`, or better still, a templated type. Something to consider: is it possible to add a storage type that isn't directly accessible via a pointer? The Thrust library uses `device_ptr<T>`, which is what ought to be used to denote the underlying storage. You really shouldn't `reinterpret_cast` between things like a `device_ptr<T>` and `void*` (do they even have the same size?!).
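One way to avoid the `void*` / `reinterpret_cast` problem is to let the pointer type come from the allocator, so a device allocator can supply a fancy pointer such as `thrust::device_ptr<T>`. A sketch under that assumption (names are illustrative):

```cpp
#include <cstddef>
#include <memory>

// The stored pointer type follows the allocator: a plain T* for
// std::allocator<T>, or a fancy pointer (e.g. thrust::device_ptr<T>)
// for a device allocator that defines its own `pointer` type.
template <typename T, typename Allocator = std::allocator<T>>
class Storage_base {
 public:
  using pointer = typename std::allocator_traits<Allocator>::pointer;

  pointer data() const { return data_; }
  std::size_t size() const { return size_; }

 private:
  pointer data_{};      // never a void*, so no reinterpret_cast is needed
  std::size_t size_ = 0;
};
```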
I don't understand the `Storage` class in Option 2: it contains a `std::unique_ptr<Storage>`, i.e. a pointer to its own type? How does that work? Is the intent that `Storage` contains a `std::unique_ptr<StorageBase>`? In that case, `Storage` itself is effectively a wrapper around the pointer to `StorageBase`, and shouldn't inherit from `StorageBase` itself.
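If that reading is right, the wrapper shape would look roughly like this (a sketch, not the proposal's actual code; the members are placeholders):

```cpp
#include <cstddef>
#include <memory>
#include <utility>

class StorageBase {
 public:
  virtual ~StorageBase() = default;
  virtual std::size_t size() const = 0;
};

// Storage is a value-semantic handle: it owns the implementation through a
// unique_ptr and forwards calls, but does not inherit from StorageBase.
class Storage {
 public:
  explicit Storage(std::unique_ptr<StorageBase> impl) : impl_(std::move(impl)) {}
  std::size_t size() const { return impl_->size(); }

 private:
  std::unique_ptr<StorageBase> impl_;
};

// A trivial implementation, just to show the wrapper in use.
class FixedStorage final : public StorageBase {
 public:
  explicit FixedStorage(std::size_t n) : n_(n) {}
  std::size_t size() const override { return n_; }

 private:
  std::size_t n_;
};
```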
Similarly, `Storage_base` contains `intrusive_ptr<Storage_base> _impl;`, which I think cannot be intended.
Thank you @ianmccul for commenting on this proposal.

Because we use `std::vector` to store data, the type of the allocator must be decided at compile time. Currently, the memory accessed by the GPU is allocated with the Unified Memory Access (UMA) feature, and CUDA handles the transfer between CPU and GPU, so we only have to assign an allocator designed for CUDA at compile time to support both CPU and GPU computation. If we want to decide the type of the allocator at runtime, the implementation will be more complex; that can be a feature in the future.

`Storage_base` is the current implementation. It will be dropped.
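The compile-time nature of the allocator choice can be seen directly from `std::vector`'s type: two vectors with different allocators are unrelated types. A sketch with a hypothetical stand-in allocator (a real one would call `cudaMallocManaged`; here it delegates to `std::allocator` so the example compiles without CUDA):

```cpp
#include <cstddef>
#include <memory>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a CUDA unified-memory allocator.
template <typename T>
struct ManagedAllocator {
  using value_type = T;
  ManagedAllocator() = default;
  template <typename U> ManagedAllocator(const ManagedAllocator<U>&) {}

  T* allocate(std::size_t n) { return std::allocator<T>{}.allocate(n); }
  void deallocate(T* p, std::size_t n) { std::allocator<T>{}.deallocate(p, n); }
};

template <typename T, typename U>
bool operator==(const ManagedAllocator<T>&, const ManagedAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const ManagedAllocator<T>&, const ManagedAllocator<U>&) { return false; }

using CpuVector = std::vector<double>;                           // host memory
using UmaVector = std::vector<double, ManagedAllocator<double>>; // "UMA" memory

// The allocator is baked into the container's type at compile time.
static_assert(!std::is_same_v<CpuVector, UmaVector>);
```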
# `class StorageBase` and `class TensorBase`

## Introduction
Per off-line (not on GitHub) discussions on 2024.10.03 and 2024.11.06, some chapters of this proposal were removed from the original unpublished draft. Only the chapters that reached consensus remain.
This proposal outlines a plan for refactoring the `Storage` and `Tensor` classes, along with their respective implementation classes. The goal of this refactoring is to enhance the efficiency of the program and minimize human errors. Please note that this proposal is not finalized and does not constitute a definitive list of items to be implemented. We encourage discussions regarding any details of the proposed refactoring plan.

Importantly, this proposal DOES NOT alter the APIs for Python.
## Advantages

Refactoring takes time; there is no need to refactor the code if there is no advantage. Here are some advantages we will gain if we apply the plan described in the following sections.

- Removing the dependency on `boost::intrusive_ptr`: Currently, we utilize two components from Boost: `intrusive_ptr` and `unordered_map`. By replacing `boost::unordered_map` with `std::unordered_map`, we can eliminate our dependency on Boost.
- By encapsulating the `Storage` class, we can lower the likelihood of contributors inadvertently accessing memory incorrectly.
- The current design couples the `UniTensor` class to PyTorch's API. This proposal aims to decouple them.

## Class Diagrams
Below are class diagrams illustrating the current and proposed designs. Both diagrams omit specifiers such as `const`, `virtual`, etc., for clarity.

And here is the class diagram of the new design. The diagram only shows what has changed; the levels above `class UniTensor` are not affected. Every interface only shows the methods to be implemented for bridging other backends.

## Proposed Changes
### Templatization of Implementations of Storage and Tensor
The current design repeatedly defines implementations for `Storage`, resulting in numerous duplicated functions. Additionally, `Tensor_impl` lacks type information, preventing compile-time type checking and leading to runtime checks that can degrade performance.

The new design templatizes the implementations of Storage and Tensor to enable type checking at compile time. The compiler can also help to determine whether one type can be converted to another. The new design also makes the containers for the CPU and the containers for the GPU different types. By separating them, we can easily use an STL container, like `std::vector`, to maintain the memory, and we open the door to using `thrust::vector` to maintain the memory on both CPU and GPU.

### Encapsulation of All Classes
Using `class Storage_base` as an example: making member variables public increases the risk of illegally accessing the memory. Besides, both the Google C++ Style Guide and the C++ Core Guidelines prefer private member variables.

Here is a change for `class Storage_base`. The access control of the member variables that are changed or removed in the class diagram of the new design will be updated.
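A before/after sketch of the kind of change intended (the member names are hypothetical, not the actual `Storage_base` fields):

```cpp
#include <cstddef>

// Before: public members let any caller reseat or corrupt the buffer.
struct StorageBaseBefore {
  void* Mem = nullptr;            // anyone can overwrite this pointer
  unsigned long long len = 0;     // or desynchronize this length
};

// After: the buffer is private and only reachable through member functions,
// so every access goes through code the class itself controls.
class StorageBaseAfter {
 public:
  const void* data() const { return mem_; }
  std::size_t size() const { return len_; }

 private:
  void* mem_ = nullptr;
  std::size_t len_ = 0;
};
```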
### Creation of Interfaces: `class StorageBase` and `class TensorBase`

The introduction of these interfaces allows for a cleaner separation between interface definitions and their implementations, enhancing flexibility for future integrations with other libraries. There are two options for implementation.
Added on 2024.11.08: This proposal asks developers to implement the type-less `StorageBase` and `TensorBase` when integrating with other libraries. However, following #496, it may also be possible to ask developers to implement the typed `StorageBase<typename>` and `TensorBase<typename>`.

#### Option 1 (Not Selected)
This option proposed making the existing classes base classes for implementations from other libraries, but it presented several issues, including forward-declaration requirements and mixed responsibilities. There are three problems with this option; for instance, `class CustomUint64Storage` contains the unused variable `implementation_`, which wastes memory.

#### Option 2 (Selected)
The selected option introduces abstract interfaces (`class StorageBase` and `class TensorBase`) that must be implemented by any concrete storage or tensor class. Except for the public member functions that this proposal calls for removing, the public member functions of `class StorageBase` and `class TensorBase` will be copied from the current `class Storage` and `class Tensor`, respectively. Below are the pure virtual functions of `class StorageBase` and `class TensorBase`; these pure virtual functions must be overridden by the implementation.

There is no function for getting a `Storage` from `TensorBase`, because we don't expect other libraries to have a component equivalent to `Storage`. With this assumption, we have to change the arguments of the functions consuming `class Storage` to iterators and raw pointers.

The reason for adding `HasContiguousIterator()` is to support tensor implementations that store the data in a multi-dimensional data structure. We may consider providing alternative, less efficient algorithms, like `Exp()`, for that kind of tensor implementation. If we only want to support tensor implementations that store data in a one-dimensional array, we should consider replacing the whole `Storage` component with `vector<T>`, because the functionality of `Storage` is almost equal to the functionality of `vector`.
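As a rough illustration of the Option 2 interfaces (the actual pure virtual functions will be copied from the current `Storage` and `Tensor`, so the members below are placeholders):

```cpp
#include <cstddef>
#include <memory>

class StorageBase {
 public:
  virtual ~StorageBase() = default;
  virtual std::size_t size() const = 0;
  virtual std::unique_ptr<StorageBase> clone() const = 0;
};

class TensorBase {
 public:
  virtual ~TensorBase() = default;
  virtual std::unique_ptr<TensorBase> clone() const = 0;
  // Implementations backed by multi-dimensional structures can return false
  // and be served by slower, element-wise fallback algorithms.
  virtual bool HasContiguousIterator() const = 0;
};
```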
### Discussion on Removal of Types
To facilitate type elimination in the C++ API, it is crucial to retain the `clone()` functions, as well as `class StorageBase`, `class TensorBase`, `class Storage`, and `class Tensor`. If we instead only remove types during the Python binding process, these components will no longer be necessary, resulting in a cleaner and more straightforward codebase.
The usage of `clone()` will be further discussed in the next section.

### Blind Storage from Shapes
Another area for improvement is how `Storage` interacts with shapes. The `Storage` class currently relies on shape information that is passed to its member functions by the caller, which can lead to implicit coupling between storage management and shape handling. To foster better separation of concerns, we propose moving the functions below into `Tensor` from `Storage`.

With the new design, we can consider eliminating the entire `Storage` component, as it essentially functions as a one-dimensional array. This functionality can be effectively replaced by using `std::vector` or `thrust::vector` as a member variable within the `Tensor` component. By making this change, we can streamline the implementation and reduce complexity in the codebase.

## Steps
Following these steps will enable the compiler to detect more mistakes made during the refactoring process.
1. Turn `class Storage_base` into an abstract class.
2. Replace `USIInit`, `Storage::Init()`, `Storage_base::Init()`, and `Storage_base::_Init_byptr()` with constructors and factory classes.
3. Remove `utils_internal::Movemem_cpu_x`, `utils_internal::Fill_gpu`, and their CUDA counterparts.
4. Remove `class Storage_base`.
5. Remove `class Tensor_Impl`.
6. … `Storage`.
7. Implement `class StorageBase` and `class TensorBase`.
8. Change the functions consuming `class Storage` or `class Tensor` to instead consume `class StorageBase`, `class TensorBase`, or raw pointers.
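To make step 2 concrete, here is a hypothetical sketch of replacing the two-phase `Init()` functions with constructors plus a small factory; all names are illustrative, not the actual cytnx API:

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <vector>

enum class Dtype { Double, Float };

class StorageBase {
 public:
  virtual ~StorageBase() = default;
  virtual std::size_t size() const = 0;
};

template <typename T>
class StorageImplementation : public StorageBase {
 public:
  // The constructor fully initializes the object, replacing Init(): there is
  // no window in which a constructed storage is in a half-initialized state.
  explicit StorageImplementation(std::size_t n) : data_(n) {}
  std::size_t size() const override { return data_.size(); }

 private:
  std::vector<T> data_;
};

// The factory is the single place where a runtime dtype is mapped to a
// concrete template instantiation.
inline std::unique_ptr<StorageBase> MakeStorage(Dtype dtype, std::size_t n) {
  switch (dtype) {
    case Dtype::Double: return std::make_unique<StorageImplementation<double>>(n);
    case Dtype::Float:  return std::make_unique<StorageImplementation<float>>(n);
  }
  throw std::invalid_argument("unsupported dtype");
}
```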