cazala / synaptic

architecture-free neural network library for node.js and the browser
http://caza.la/synaptic
Other
6.92k stars 665 forks source link

Memory leaks & loop performance #307

Open reicek opened 6 years ago

reicek commented 6 years ago

While working on optimizations for AI-World -based on the creatures demo in the synaptic homepage-, noticed that there are many potential optimizations that can be easily applied to the library.

Garbage collector and JS Heap

Problem

JSheap 10 seconds 60 fps run from AI-World using 10 independent Perceptrons(4,6,3). Garbage collector can be seen cleaning all extra memory at the end of every frame.

Neural networks are usually run in big loops. Allocating data in temporal references generates garbage on each cycle, the garbage collector continuously fixes this, but this costs performance and uses way more memory than needed.

Proposition

Replace all var, const & let for permanent memory references to the parent object properties. This way the objects will have a more consistent size and memory positions can be reused instead of always creating new ones.

Ideally all memory allocation should be started in the constructor or (less ideally) added to the main object on first use. Private variables (this._privateVariable) memory allocation can be recycled by different methods inside a class instance, like using a single memory reference for all this._i inside loops.

E.g.


// Before - Generates a new holder for i every time this loop starts
for (var i = 0; i < this.layers.hidden.length; i++)
  this.layers.hidden[i].activate();

// After - i is allocated only once, and can even be recycled by other loops in the same object
for (this._i = 0; this._i < this.layers.hidden.length; this._i++)
  this.layers.hidden[this._i].activate();

Loops replacement to reduce run-time and memory use

Problem

Since neural networks rely heavily on loops, and different loop patterns have different performance, using faster loop patterns can dramatically increase performance in some scenarios.

Proposition

Different loops have different results, so it's always best to test on JSPerf. Object.map() and Object.forEach() seems to give poor performance in many cases.

// Further optimizing previous example
// Before
for (this._i = 0; this._i < this.layers.hidden.length; this._i++)
  this.layers.hidden[this._i].activate();

// After
for (this._i = this.layers.hidden.length -1; this._i >= 0 ; this._i--)
   this.layers.hidden[this._i].activate();

// Also
this.i = this.layers.hidden.length;
while (this._i -- ) 
    this.layers.hidden[this._i].activate();

I'm considering applying those changes on a fork, to validate if there is an actual performance increase.

Thoughts?

JeremyPouyet commented 6 years ago

First proposition

I don't think it's a good idea for many reason:

Second proposition

I totally agree with you ! look at https://medium.com/@AamuLumi/voyage-to-the-most-efficient-loop-in-nodejs-and-a-bit-js-5961d4524c2e it is a benchmark of many different loop in node.js and the speed gain is huge from one solution to another.

reicek commented 6 years ago

Hi @JeremyPouyet, thanks for the feedback!

Headsup: My plan is to make a fork of Synaptics, apply some optimizations for my project's context, test them thoroughly and open a PR with the results.

I'm using this thread to gather all feedback and ideas.

Context

I'm working on this project (Demo) which uses the NN as brains for virtual creatures, and there is usually 12 - 25 of them right now (tested up to 200). The loop (attempts) to run at 60 fps, so this cycle exposes all memory leaks and performance problems.

I've patched most memory leaks and improved execution time using the patterns described on my previous post. From looking at the Synaptics code I can assume that is the source of the leaks.

Here each spike is a cycle, in theory the difference between the bottom and top are memory leaks.

JSheap

Itemized response:

The code becomes less understandable

Agree, in general aggressively improving performance for production usually involves sacrificing a lot of readability, so it's only useful if your app really needs that trade-off.

The i variable can now be changed from the outside of the code and break a loop

This may be true in parallel execution scenarios, but in normal sequential execution it cannot happen.

I don't think it will save a lot of memory (you should benchmark it)

Mostly this pattern makes the size fixed somewhere between the top and bottom of the JS heap. So we reduce the maximum memory footprint and reduce the work of the garbage collector by minimizing the

You remove the possibility of asynchronous calls (2 loop modifying i at the same time)

It is not possible to run 2 loops simultaneously with JavaScript in the browser. Even if both are called using async or a promise, they will run sequentially. Even if you run the NN in the Web Worker mode, each one is a single threaded app, so the loops are safe too.

Loops

For me the problem is that each loop pattern seems to run better depending on many factors, mostly the browser version making it hard to target a clear winner, right now I settled for while (this._i --) {} since I feel it has a nice balance between performance and readability.

That being said it seems that for (this._i = TOTAL; this._i >= 0; this._i --) usually gives the best performance, for example you could try some jsPerf tests like these.

dan-ryan commented 6 years ago

I hear that Neataptic has better performance (https://github.com/wagenaartje/neataptic), so it is another alternative.

kfern commented 6 years ago

@dan-ryan Neataptic is no longer maintained

https://github.com/wagenaartje/neataptic/issues/112

reicek commented 6 years ago

Thanks for sharing @dan-ryan ! Shame it's not actively maintained but on a quick look it seems to be pretty nice, I'll take a look and see what I can learn from that project's approach.

dan-ryan commented 6 years ago

I'm hoping someone will pick it up the project again as it seems to be the best JS one for the Neat algorithm. The codebase was originally built from Synaptic's codebase I believe.