Need a task_t::copy(fold, begin, end, tensor4d_t& buffer) function that performs the normalization directly in the given buffer. The buffer should be stored in model_t or accumulator_t. Then there are no memory allocations per sample and the thread-splitting is handled directly in the model.
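A minimal compilable sketch of what this interface could look like. This is not the actual task_t API: fold_t, the flattened tensor4d_t stand-in, the raw-sample storage and the mean/stdev normalization below are all simplified placeholders.

```cpp
#include <cstddef>
#include <vector>

struct fold_t { std::size_t index = 0; };   // hypothetical fold handle

// Simplified stand-in for tensor4d_t: samples x (dim1 * dim2 * dim3) planes.
class tensor4d_t
{
public:
    void resize(std::size_t samples, std::size_t d1, std::size_t d2, std::size_t d3)
    {
        m_plane = d1 * d2 * d3;
        m_data.resize(samples * m_plane);
    }
    float* plane(std::size_t sample) { return m_data.data() + sample * m_plane; }
    std::size_t plane_size() const { return m_plane; }

private:
    std::size_t m_plane = 0;
    std::vector<float> m_data;
};

class task_t
{
public:
    // Copy (and normalize) samples [begin, end) of the given fold directly
    // into `buffer`; the buffer is resized once per batch, so there are no
    // per-sample allocations in the hot path.
    void copy(const fold_t&, std::size_t begin, std::size_t end, tensor4d_t& buffer) const
    {
        const std::size_t plane = m_d1 * m_d2 * m_d3;
        buffer.resize(end - begin, m_d1, m_d2, m_d3);
        for (std::size_t i = begin; i < end; ++i)
        {
            const float* src = m_samples.data() + i * plane;
            float* dst = buffer.plane(i - begin);
            for (std::size_t k = 0; k < plane; ++k)
            {
                dst[k] = (src[k] - m_mean) / m_stdev;   // placeholder normalization
            }
        }
    }

private:
    std::size_t m_d1 = 1, m_d2 = 1, m_d3 = 1;
    float m_mean = 0.0f, m_stdev = 1.0f;                // placeholder statistics
    std::vector<float> m_samples;                       // flattened raw samples
};
```

With this shape, each worker thread can own one such buffer inside accumulator_t and reuse it across batches, which is what keeps the thread-splitting inside the model.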
May split this task into several steps:
May modify layer_t to store and manipulate the accumulated gradients.
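One hedged way this could look, assuming layer_t keeps a gradient buffer of the same size as its parameters; accumulate/zero_grad/gparam are hypothetical names, not the existing layer_t interface.

```cpp
#include <cstddef>
#include <vector>

class layer_t_sketch
{
public:
    explicit layer_t_sketch(std::size_t psize)
        : m_param(psize, 0.0f), m_gparam(psize, 0.0f) {}

    // Accumulate one batch's gradient into the stored buffer, instead of
    // allocating a fresh gradient tensor per sample.
    void accumulate(const std::vector<float>& gradient)
    {
        for (std::size_t k = 0; k < m_gparam.size(); ++k)
        {
            m_gparam[k] += gradient[k];
        }
    }

    void zero_grad() { m_gparam.assign(m_gparam.size(), 0.0f); }
    const std::vector<float>& gparam() const { return m_gparam; }

private:
    std::vector<float> m_param;     // parameters
    std::vector<float> m_gparam;    // accumulated gradients, same size as m_param
};
```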
Having 4D tensors as inputs (i.e. processing multiple samples at the same time) should greatly improve the speed of MLPs. To investigate and benchmark.
This may require refactoring forward_model_t to allocate the inputs/outputs once per layer; currently each layer keeps a copy of both its input and output tensors.
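A compilable sketch combining the two ideas, using simplified stand-in types (layer, affine_layer, forward_model and the flattened tensor4d below are illustrative, not the existing forward_model_t interface): each layer processes the whole 4D batch in one forward() call, and the model allocates every layer-boundary buffer once up front, so layers no longer keep private copies of their input/output tensors.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

using tensor4d = std::vector<float>;    // flattened [samples x features] stand-in

struct layer
{
    virtual ~layer() = default;
    virtual std::size_t osize(std::size_t isize) const = 0;
    // Process all samples of the batch in one call.
    virtual void forward(const tensor4d& idata, tensor4d& odata,
                         std::size_t samples) const = 0;
};

struct affine_layer final : layer
{
    affine_layer(std::size_t isize, std::size_t osize)
        : m_isize(isize), m_osize(osize), m_weights(isize * osize, 0.01f) {}

    std::size_t osize(std::size_t) const override { return m_osize; }

    void forward(const tensor4d& idata, tensor4d& odata, std::size_t samples) const override
    {
        // Batched: one call covers the whole batch, so the inner loops can be
        // mapped to a single GEMM-like operation instead of per-sample calls.
        for (std::size_t s = 0; s < samples; ++s)
            for (std::size_t o = 0; o < m_osize; ++o)
            {
                float sum = 0;
                for (std::size_t i = 0; i < m_isize; ++i)
                    sum += m_weights[o * m_isize + i] * idata[s * m_isize + i];
                odata[s * m_osize + o] = sum;
            }
    }

    std::size_t m_isize, m_osize;
    std::vector<float> m_weights;
};

class forward_model
{
public:
    // Allocate each layer's output buffer once, up front, for a fixed batch size.
    void setup(std::size_t samples, std::size_t isize)
    {
        m_buffers.clear();
        m_buffers.emplace_back(samples * isize);            // model input
        for (const auto& l : m_layers)
        {
            isize = l->osize(isize);
            m_buffers.emplace_back(samples * isize);        // this layer's output
        }
        m_samples = samples;
    }

    tensor4d& input() { return m_buffers.front(); }

    const tensor4d& forward()
    {
        // Each layer reads from one preallocated buffer and writes into the
        // next; no tensors are copied or allocated during the forward pass.
        for (std::size_t k = 0; k < m_layers.size(); ++k)
            m_layers[k]->forward(m_buffers[k], m_buffers[k + 1], m_samples);
        return m_buffers.back();
    }

    std::vector<std::unique_ptr<layer>> m_layers;

private:
    std::vector<tensor4d> m_buffers;                        // one buffer per layer boundary
    std::size_t m_samples = 0;
};
```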