Mindwerks / worldengine

World generator using simulation of plates, rain shadow, erosion, etc.
MIT License

A bit of NumPy #132

Closed · tcld closed this 8 years ago

tcld commented 8 years ago

I played around with a profiler, picked out one of the slowest functions and took a look at what could be improved - I found a couple of smaller things, but nothing major. I then proceeded to see what numpy could do - but it actually doesn't do much here: the speed improvements are within the margin of error, and the memory improvements weren't measurable because I don't know how to properly use memory_profiler.

The generated maps are 100% the same as before, and the test results haven't changed either.

Here is some more information, not very useful and quite boring: http://pastebin.com/8p1vwaV1
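
For anyone stuck on the same point, here is a minimal memory_profiler sketch. The demo script and function are hypothetical, but the @profile decorator and the python -m memory_profiler invocation are the library's standard entry points:

    # demo.py - hypothetical example; run with:  python -m memory_profiler demo.py
    # memory_profiler prints per-line memory increments for the decorated function.
    import numpy
    from memory_profiler import profile

    @profile
    def build_arrays():
        plain = [[0.0] * 1000 for _ in range(1000)]  # nested Python lists
        compact = numpy.zeros((1000, 1000))          # one contiguous numpy buffer
        return plain, compact

    if __name__ == '__main__':
        build_arrays()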

tcld commented 8 years ago

Sh**, I accidentally added the World_pb2.py I got from @psi29a. Could somebody tell me how I can undo that?

EDIT: Ok, at the cost of some extra commits that problem seems to be solved.

tcld commented 8 years ago

Oh, the build failed. When I return that numpy array for the irrigation, it causes problems whenever something like "world1 == world2" is called, since that array differs from a standard array more than expected: == compares numpy arrays element-wise and returns an array, so it is not as clearly defined as before.

I will think about it.
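
To illustrate the pitfall (a self-contained sketch, not worldengine's actual comparison code):

    import numpy

    a = numpy.array([1, 2, 3])
    b = numpy.array([1, 2, 3])

    print(a == b)                   # [ True  True  True] - an array, not a bool
    print(numpy.array_equal(a, b))  # True: unambiguous whole-array comparison

    try:
        if a == b:                  # using the element-wise result as a bool...
            pass
    except ValueError as e:
        print(e)                    # "...is ambiguous. Use a.any() or a.all()"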

tcld commented 8 years ago

I pushed another commit; now it hopefully works. I tried to test the addition to the protobuf code (in world.py), but I am not sure what triggers it. "worldengine --protocol-buffer" ran fine.

I also did some testing regarding the memory consumption of a numpy array vs. a standard array. The outputs are read off the Ubuntu Task Manager, so they are imprecise for small arrays, and for some reason there are two values: private and (virtual). The number on the left is the size of the array (e.g. 1000*1000).

old:

     1000^2: 5 (40) MB ->   14 (  48) MB
     4000^2: 5 (40) MB ->  135 ( 169) MB
    10000^2: 5 (40) MB ->  808 ( 876) MB
    20000^2: 5 (40) MB -> 3134 (3466) MB

new:

     1000^2: 5 (40) MB -> 17 (  93) MB
     4000^2: 5 (40) MB -> 17 ( 207) MB
    10000^2: 5 (40) MB -> 17 ( 848) MB
    20000^2: 5 (40) MB -> 17 (3137) MB

Creation and destruction are a lot faster for the numpy arrays, and memory consumption is slightly better than for standard arrays when the arrays are large. As far as I understand, it's the iteration that isn't much faster - but there are some optimized iteration possibilities; maybe they prove useful in other places of worldengine. It was certainly worth a try. Depending on what you guys think about this, and if the tests start going through at some point, I might start looking at some of the other code files and add in some numpy code - hints on where it might be most profitable are welcome. (I don't have to go all out as with this commit, I just wanted to give it a fair shot.)
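
As a generic illustration of those optimized iteration possibilities (toy code, not a worldengine function): a per-cell Python loop pays an interpreter round-trip per element, while the vectorized form runs the same loop inside numpy's C code:

    import numpy

    height = numpy.random.random((1000, 1000))

    # Per-cell Python loop: one interpreter round-trip per element.
    total = 0.0
    for y in range(height.shape[0]):
        for x in range(height.shape[1]):
            if height[y, x] > 0.5:
                total += height[y, x]

    # Vectorized equivalent: selection and summation happen in C.
    total_fast = height[height > 0.5].sum()

    assert numpy.isclose(total, total_fast)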

tcld commented 8 years ago

Yippie, the tests passed! :) (Except for the lottery test...)

psi29a commented 8 years ago

Nice work! We've had this on our todo list for a while. :)

I think there are more places in the code where the 2d arrays should be numpy.

tcld commented 8 years ago

Yes, there are. It would be great if all of them were numpy; that might eliminate some overhead from converting between the two types, give a very clean memory footprint, and bring small speed improvements.

When I started editing the function I just wanted to exchange the array for a numpy array, due to the TODO remark. When I then dug into the documentation I noticed that there is a lot more that could be done with numpy - sadly the improvements are pretty minor; I was hoping for at least a factor of two in speed.

EDIT: Although I have to add that I had never used numpy before, so it is very likely that more improvements can be made.

tcld commented 8 years ago

I'm still polishing this, so please nobody merge it. numpy is kind of insane. :)

psi29a commented 8 years ago

I'm a little worried about your revert of World_pb2.py, since what was in master was compatible with Python 2 and Python 3.

Did it not work with your version of protobuf?

tcld commented 8 years ago

Oh, no! It worked perfectly. I guess I used "commit -a" when I should have used "commit". I'm sorry. :( I will remove it. Again.

psi29a commented 8 years ago

No worries, nothing that can't be undone. I was just reading through your change-set and noticed it. :)

When you are ready for us to really look at it (and merge), just say so. We'll likely have you do a:

git rebase master -i

which will allow you to squash all your commits into one. This makes your change-set super clean as a result. :)

tcld commented 8 years ago

Thanks for the hint. :)

The code is ready as it is. But numpy seems to be very powerful, and I am currently trying to find a way to get rid of most of the iterations in the modified file. It would be nice to have a really polished example of numpy usage in the code; maybe somebody else decides to put more numpy into worldengine and needs a reference. I, for one, had no idea that numpy would prove to be such a deep pond.

(It's a shame I will have to throw away my coverage-lottery ticket with the next commit, though.^^)

tcld commented 8 years ago

I might be done. Offline tests ran fine, the result of the generation is exactly the same as with the old algorithm, and the code now looks quite beautiful to me.^^ I desperately have to run some performance tests on this, because the new version needs two orders of magnitude fewer iterations, and the things that are done should be doable very efficiently. The old "new" method saved about 10% of the time; hopefully the current one is at least as good.
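
The general pattern behind such a drop in iteration count (a sketch of the technique, not the actual irrigation code) is to replace per-cell neighborhood loops with whole-array shifts - (2r+1)^2 array additions instead of width*height*(2r+1)^2 scalar operations:

    import numpy

    def neighborhood_sum(values, radius):
        # Sum over all cells within `radius` (Chebyshev distance) of each
        # cell. Instead of looping cells x neighborhood in Python, add one
        # shifted copy of the whole (padded) array per offset.
        padded = numpy.pad(values, radius, mode='constant')
        result = numpy.zeros_like(values)
        size = 2 * radius + 1
        for dy in range(size):
            for dx in range(size):
                result += padded[dy:dy + values.shape[0],
                                 dx:dx + values.shape[1]]
        return result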

tcld commented 8 years ago

Yippie, cProfile reports about 98% less time spent in irrigation.py - which is about 25% less time overall! :dancers:

(I also won the lottery again, this patch seems to be ripe for a merge. :) )

EDIT: Here are some numbers: http://pastebin.com/gdQviHa4 I won't run line_profiler after this, and the memory footprint of numpy in general was already covered in one of my earlier messages here. I declare testing done. :)

PS: This was done using a compiled master build versus an uncompiled (?) test build. I'm not sure when Python decides to compile code, but there could be even more of a speed-up hidden somewhere.

EDIT2: I noticed that for the small test map (256x256) this saves 15 million (!) function calls (about 1/3 of the total).

psi29a commented 8 years ago

Python, unlike C/C++ (compiled languages), is an interpreted language. It writes bytecode out to *.pyc files when a file is first run. Python is smart enough to invalidate .pyc files when the .py files are changed, but it will only actually run the .pyc files. The Python interpreter thus takes the .pyc files and does what is required of it... typical of a scripting language like Lua, PHP or worse... JavaScript. ;)
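
For illustration, that compilation step can also be triggered explicitly (the path here is just an example):

    import py_compile

    # CPython does this implicitly on first import and caches the bytecode,
    # so later runs skip the parse/compile step.
    # (python -m compileall .  does the same for a whole source tree.)
    py_compile.compile('worldengine/world.py')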

So I'm not really sure what you mean by compiled master-build and uncompiled test-build.

If you mean the one-file builds found on our GitHub releases page, that is a whole other ball of yarn having to do with pyinstaller black magic. It assembles all the different libraries and Python bytecode into one file that resembles a self-extracting zip file.

tcld commented 8 years ago

I meant: I properly installed the master build, then uninstalled it, removed all traces of .pyc files (I don't trust Python because of some mishap in the past) and then reran the script.

But it might not be important whether any more time can be shaved off of the 0.2 seconds that are left. :P

psi29a commented 8 years ago

For the sake of nailing down definitions... let us just name things based on the branch. How about just master vs. perf_opt - or, if you are talking about a release, use the tag name, like 0.18 vs. perf_opt. It is safe to say your branch or tag vs. another branch or tag, because they are unique. :)

tcld commented 8 years ago

That seems clearer, I will remember it.^^

Meanwhile I am looking through the profiler output to find some more time sinks. There are some that might be worth looking into, but nothing really prominent.

This one is interesting, though:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1000    8.378    0.008    8.378    0.008 {built-in method step}

Is that what is done when iterating through a range? I have never seen a step() method - and a quick look into the Python documentation didn't help either.

EDIT: The following take more than half of the total runtime:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1000    8.378    0.008    8.378    0.008 {built-in method step}
   289987    3.789    0.000    5.212    0.000 worldengine/world.py:376(tiles_around)
      462    2.237    0.005    2.237    0.005 worldengine/simulations/basic.py:54(count)
   655360    1.742    0.000    1.742    0.000 worldengine/common.py:97(anti_alias_point)
 12389829    1.503    0.000    1.503    0.000 worldengine/world.py:340(is_land)

Everything else is really minor in comparison. Maybe if world.ocean could be made a numpy array, the last entry could be diminished (I don't know if accessing a numpy array is quicker than accessing a standard array) - see the sketch below.

anti_alias_point(), count() and tiles_around() are called very often, so maybe slight improvements there would be useful. That doesn't seem to have anything to do with numpy, though - it looks like overall there is not that much room left for improving on speed. Memory might be good to look into, but I don't know how to properly use memory_profiler.
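
A sketch of what a numpy-based world.ocean could enable (illustrative code - the names mirror this thread, not the final implementation): with a boolean array, per-tile predicates like is_land() can be answered for the whole map at once:

    import numpy

    # Hypothetical stand-in for world.ocean: True = ocean, False = land.
    ocean = numpy.random.random((256, 256)) < 0.5

    land = ~ocean                           # whole-map land mask, one operation
    land_count = numpy.count_nonzero(land)  # replaces millions of is_land() calls

    # A single-tile lookup still works, without a Python-level method call:
    y, x = 10, 20
    is_land_here = not ocean[y, x]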

psi29a commented 8 years ago

Looks great, merging.