accosmin / nano

C++ library [machine learning & numerical optimization] - superseded by libnano
MIT License

Process 4D tensors #123

Closed accosmin closed 7 years ago

accosmin commented 8 years ago

Having 4D tensors as inputs (aka multiple samples to process at the same time) should greatly improve the speed of MLPs. To investigate and benchmark.

This may require a refactoring of forward_model_t to allocate the input/outputs once per layer. Currently each layer keeps a copy of both input and output tensors.
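A minimal sketch of the idea, not the library's actual code: `tensor4d` and `affine_layer` below are hypothetical stand-ins (not the real `tensor4d_t`, `layer_t` or `forward_model_t`). Each buffer holds a whole batch of samples contiguously, is allocated once by the caller, and is reused by the layer instead of being copied per sample.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 4D tensor: (samples, planes, rows, cols), stored contiguously.
struct tensor4d
{
    std::size_t count = 0, planes = 0, rows = 0, cols = 0;
    std::vector<float> data;

    void resize(std::size_t n, std::size_t p, std::size_t r, std::size_t c)
    {
        count = n; planes = p; rows = r; cols = c;
        data.assign(n * p * r * c, 0.0f);
    }

    std::size_t sample_size() const { return planes * rows * cols; }
    float* sample(std::size_t i) { return data.data() + i * sample_size(); }
    const float* sample(std::size_t i) const { return data.data() + i * sample_size(); }
};

// Hypothetical fully connected layer: y = W * x + b for every sample in the batch.
struct affine_layer
{
    std::size_t isize, osize;
    std::vector<float> W;   // osize x isize, row-major
    std::vector<float> b;   // osize

    // Processes all samples in one call; the caller owns both buffers,
    // so nothing is allocated or copied inside the layer.
    void forward(const tensor4d& input, tensor4d& output) const
    {
        assert(input.sample_size() == isize);
        assert(output.count == input.count);
        assert(output.sample_size() == osize);

        for (std::size_t s = 0; s < input.count; ++s)
        {
            const float* x = input.sample(s);
            float* y = output.sample(s);
            for (std::size_t o = 0; o < osize; ++o)
            {
                float sum = b[o];
                for (std::size_t i = 0; i < isize; ++i)
                {
                    sum += W[o * isize + i] * x[i];
                }
                y[o] = sum;
            }
        }
    }
};
```

The sketch only shows the data layout. Any speed-up would come from replacing the inner loops with a single matrix-matrix product over the whole batch, which still needs to be benchmarked.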

accosmin commented 8 years ago

Need a task_t::copy(fold, begin, end, tensor4d_t& buffer) function to do the normalization directly in the given buffer. The buffer should be stored in model_t or accumulator_t. Then there will be no memory allocations per sample, and the thread splitting is handled directly in the model.
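One possible shape for such a function, as a sketch under assumptions: `buffer4d` and `toy_task` are hypothetical types, the samples are assumed to be 8-bit values stored back-to-back, the normalization is assumed to be a simple scaling to [0, 1], and the fold is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical batch buffer: (samples, planes, rows, cols), stored contiguously.
struct buffer4d
{
    std::size_t count = 0, planes = 0, rows = 0, cols = 0;
    std::vector<float> data;

    // std::vector::resize reallocates only when the capacity has to grow,
    // so reusing the buffer for batches of the same or smaller size is allocation-free.
    void resize(std::size_t n, std::size_t p, std::size_t r, std::size_t c)
    {
        count = n; planes = p; rows = r; cols = c;
        data.resize(n * p * r * c);
    }
};

// Hypothetical task holding raw 8-bit samples back-to-back.
struct toy_task
{
    std::size_t planes, rows, cols;
    std::vector<unsigned char> pixels;

    std::size_t sample_size() const { return planes * rows * cols; }

    // Copies samples [begin, end) into the buffer, scaling them to [0, 1] on the fly.
    // The fold argument is ignored in this sketch.
    void copy(std::size_t fold, std::size_t begin, std::size_t end, buffer4d& buffer) const
    {
        (void)fold;
        assert(begin <= end);
        assert(end * sample_size() <= pixels.size());

        buffer.resize(end - begin, planes, rows, cols);
        const std::size_t size = sample_size();
        for (std::size_t i = begin; i < end; ++i)
        {
            for (std::size_t j = 0; j < size; ++j)
            {
                buffer.data[(i - begin) * size + j] = pixels[i * size + j] / 255.0f;
            }
        }
    }
};
```

With one such buffer per thread, each thread would call `copy` on its own `[begin, end)` range and feed the buffer straight to the model.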

accosmin commented 8 years ago

May split this task into several steps:

accosmin commented 7 years ago

May modify layer_t to store and to manipulate the cumulated gradients: