Dando18 opened 5 years ago
A new branch has been created to overhaul MagmaDNN's memory system.
New Branch: memory-fixes
The goal of this branch is to replace most pointer use in MagmaDNN with references or smart pointers, in line with a modern C++ approach.
Here is an outline of how MagmaDNN's memory management will be refactored into a more modern C++ style.
In MagmaDNN v1.0, just about everything works through pointers. All of the ::magmadnn::math functions take Tensor<T>* pointers rather than plain tensors. This was necessary due to the lack of copy/move/assign semantics for the ::magmadnn::Tensor<T> class. Operators and other tensor containers also allocate, instantiate, and manage tensor pointers.
This is a C-style, unsafe programming approach. Modern C++ favors references over pointers, and where resource management is necessary (::magmadnn::MemoryManager) it is preferable to use smart pointers rather than raw pointers.
First we shall change Tensors and the MemoryManager class to use smart pointers. Tensors will use reference counting to avoid copying of data pointers. Proper copy/move/assign semantics are then required for the Tensor class. This should be simple if std::shared_ptr is used for the MemoryManager; then default copying should suffice.
Finally, the ::magmadnn::math library needs to be rewritten to take tensor references rather than pointers. This is a more idiomatic C++ interface, and it also removes the possibility of nullptr values.
An example function:
void add(const Tensor& a, const Tensor& b, Tensor& out) { ... }
Operations will still need pointers to their children and parents; this is inherent to the graph data structure. However, rather than ::magmadnn::op::add( ... ) returning a raw allocated pointer, it should now add a std::unique_ptr to a central Graph class and return a non-owning raw pointer. The raw pointer can then be used in an algorithmic sense rather than for resource management.
A significant portion of MagmaDNN routines are not const correct. We will be enforcing const correctness on member functions and ::magmadnn::math function parameters. This is for type safety, to prevent bugs, and to provide a cleaner, more complete interface.
Currently, Operations and math functions are restricted by type. We will remove the type template argument from most of these. For example, ::magmadnn::Tensor<T> will become just ::magmadnn::Tensor. The data type will be signified by some enum DataType, which specifies the supported values.
We will run various MagmaDNN routines under valgrind and cuda-memcheck to verify their memory behaviour and performance.
Just to comment on the above memory strategy proposed for MagmaDNN: it will break a lot of existing code (pretty much all of it). This is unfortunate, but the memory update is necessary for growing the framework, and there are also not that many users currently.
Describe the bug Passing stack-allocated tensor objects to some MagmaDNN functions causes stack smashing errors. The bug is somewhat unpredictable; it does not happen on every run.
To Reproduce
Expected behavior No stack smashing.
Environment:
Additional context Due to magmadnn::Tensor<T>'s copy constructor and MemoryManager copying.