evolvio4all / evolvio4JavaScript

MIT License
10 stars 2 forks source link

Huge performance sink I was unaware of #24

Closed Air1N closed 5 years ago

Air1N commented 5 years ago

JSPerf

Air1N commented 5 years ago

This is a website called jsPerf, this is the array that I'm using. array

The website tests how fast each one operates and as you can see, the slowest method is the one getting used the most (since there are about 15,000 - 25,000 tiles). This is a huge problem for performance and probably why you had to reduce map size in the first place.

They all perform the same basic operation of looping through the array.

winnie334 commented 5 years ago

Interesting discovery, I did not know about this either.

But for maximum performance improvement, something should change fundamentally about the feedForward function in brain.js. It is a quadruple-nested for loop and is, according to chrome, responsible for 70% of the time spent processing.

Of course, this is not my expertise. I have no idea how the brain exactly works. There might not even be a single improvement possible in this function (I strongly doubt this though). But, it would have a huge impact on performance if even the smallest optimization was found in the most inner loop.

I'm happy you're looking into this though, better performance means faster calculating which results in longer evolution :)

winnie334 commented 5 years ago

this is probably the most expensive thing in the code

I just tested it and in the short timeframe I recorded, the code you showed took 13 milliseconds in total to run (0.8% of cpu usage). Compare that to other expensive functions such as feedForward, who took 2536 milliseconds, or 68% of the cpu time. Of course, it is slightly more complex, but I think it is a wrong assumption to blame the tile update for the performance sink.

Here is an image of another test, which it shows better (I put the season code in a separate function)

Air1N commented 5 years ago

I am well aware the feedForward function is the single most expensive function in terms of processing time. It did contain the of problem but in the most recent commit I have fixed it. I have tried to optimize it as much as I can, but there isn't much more I can do without complicating things a relatively large amount — or, I'm pretty sure that's the case.

winnie334 commented 5 years ago

Alright, I was just mentioning it so you don't focus too much on that small snippet of code :)

It would be great if we could improve some of those fundamental functions though.

Air1N commented 5 years ago

I do think we should split the update function into more functions so we can profile it better

winnie334 commented 5 years ago

Yep, exactly my thoughts. Personally I have split it up in doSeasonStuff() and doCreateStuff(), but it isn't really worth it making a pull request for that. Perhaps you can also split up the creature stuff even more, since it is quite long on itself.

Air1N commented 5 years ago

It is already pretty clear that feedForward is the most expensive function by far, maybe there is something we could do to make it faster?

winnie334 commented 5 years ago

Well, you know what's happening there! But honestly, calculating each node with every weight for every LSTM for every creature, every single frame, sounds pretty costly on itself. Of course there might be some simple things you can do (decrease the number of inputs, give a lower score to larger networks do discourage thousands of nodes, etc) but I wouldn't know what fundamental design changes you could make to improve the performance.

Air1N commented 5 years ago

Alright, so the only factor of brain size even a little out of my control is determined by the number of eyes. However, I can even limit this by limiting the number of eyes. If you'd look at the current version you'll see I just implemented a method to double creature.feedForward's execution speed!

Air1N commented 5 years ago

Uhhh, could you look and check and see what the autoMode calculates for you? Mine is about 30x - 35x at 50 population

Air1N commented 5 years ago

Since this specific issue has been fixed, I'm closing this.

ghost commented 5 years ago

Found a potential improvements that might work. Please note I've not even tried them so don't hold your breath lol

The Creature.feedForward method uses the Math.tanh() function and someone made a fast approximation method that might be worth looking at. I tried looking for a fast approximation for Math.exp() but couldn't find any apart from some really heavy mathy stuff to do with Taylor series expansion.

There was also some interesting performance stuff about typed arrays but that was less clear.

ghost commented 5 years ago

There's a jsperf test here that loops a classic array vs a new Array(n) that suggests they have basically the same performance within a few percent

ghost commented 5 years ago

Another idea to improve performance might be to only render a strip of the tiles on each tick. The colour doesn't change a great deal so it won't be noticeable unless a creature dies on a tile the user is looking at. The only awkward thing would be that any tiles with a creature even partially over it would still need rendering. I bet it could reduce the time significantly if say only a quarter of the tiles were drawn per tick.

Air1N commented 5 years ago

The main problem isn't with rendering (as it only happens once each loop, no matter the timescale). The main issue is the update function (which isn't even that bad) because it gets updated timescale times, so, if it were 100x timescale it'd run update 100 times and render would still run once. My goal is to make it so the update can run as many times as possible in 50ms. Right now I get about 30x-40x in 50ms (without fast tanh) but I'm sure we can do a little better at least.

Air1N commented 5 years ago

Compare: https://jsperf.com/loop-array-classic-vs-object/1 with: https://jsperf.com/loop-array-classic-vs-object/2

Classic is about 2% to 8% slower if the values aren't initialized. However, object notation is slower if the values are initialized as 0.