Mindwerks / worldengine

World generator using simulation of plates, rain shadow, erosion, etc.
MIT License
992 stars 127 forks source link

Reorganize World #207

Closed ftomassetti closed 8 years ago

ftomassetti commented 8 years ago

This is an initial step to reorganize the World class. Other refactoring will follow but I thought it would be better to start discussing and merging this one.

Now most of data is grouped either under generation_params or in layers.

A Layer represent a piece of information available for each cell. Each layer has a matrix of data, some of them have also thresholds while others have quantiles.

The idea is to treat layers in the more homogeneous way possible.

tcld commented 8 years ago

Oh. My. God. :D I would vote for merging everything else first so I don't have to fix all of my PRs.^^ Aside from that this is certainly a good idea; the tests were cancelled for some reason, though.

To make sure that there are no bugs left: How about you add another test that basically does what data_generator.py does (i.e. generates a world exactly like the one in there)? Then you could compare the "new" world with the "old" one, and if everything checks out, the new code should be fine. Right?

ftomassetti commented 8 years ago

Given this one would have quite an impact it would nice to have the opinion of @psi29a before merging it

tcld commented 8 years ago

I looked over it once, comments above.

I guess this should be merged as soon as possible, since it will break every piece of code that isn't already merged. (I would prefer if #188 and #203 could be merged first, so I don't have to invest more work... but overall it might even be easier if they were merged after this PR.)

I also want to mention all the used "thresholds" again - it would be really nice if they could just be stored via for-loop. It would make it slightly easier to modify/add thresholds.

ftomassetti commented 8 years ago

About:

To make sure that there are no bugs left: How about you add another test that basically does what data_generator.py does (i.e. generates a world exactly like the one in there)? 

I think that would not work because platec could give different results on different platforms (for example because of rounding or different optimization with floating numbers)

tcld commented 8 years ago

Maybe if you started from a plate-only world?

psi29a commented 8 years ago

Great googly moogly... I'll give this an eye-ball around lunch today.

Can you give a brief overview of what you are trying accomplish with this PR? That way I have a clue as to what I'm looking at. :) Not that what you've posted above isn't good enough, this is just a large PR. Message me on gtalk.

ftomassetti commented 8 years ago

GitHub did not told me you edited the comment... I will reach you on GTalk. The main idea is to put some order in the World class:

Each layer is an instance of Layer, LayerWithThresholds or LayerWithQuantiles. All of these classes have a field named data which is a numpy array. Then they could have also thresholds or quantiles :)

The idea is to make layers to be more homogeneous: before world.elevation['data'] and world.ocean were numpy arrays. Now you would access both as world.layers['elevation'].data and world.layers['ocean'].data.

When we add a new piece of information (e.g., wind) we just add a new layer.

Aside from that I tried to use encapsulation when possible, so using method such as world.is_ocean(pos) instead of accessing directly variables (e.g., world.ocean[y, x] or world.elevation['data'][y, x]). This should make the code more explicit and more robust to changes. I left direct access to internal variables where it made sense for performance reasons.

tcld commented 8 years ago

Since you mention performance, I wrote a little script yesterday. It accesses an array 50 million times via different means, that's around the order of accesses WorldEngine performs for a medium-sized world. Accessing an array via a separate function is not that bad compared to the overall generation time (it might amount to ~10%).

System: linux2
Python: v2.7.6
 Numpy: v1.8.2

                   Empty function (50.0M calls):  5.869s
         Direct access via [y, x] (50.0M calls):  8.574s
                  Function access (50.0M calls): 15.836s
Function access w/ unpacked tuple (50.0M calls): 14.737s
         Direct access via [y][x] (50.0M calls): 15.560s

 Python for-loop (10 iterations):  0.885s
  Numpy for-loop (10 iterations):  5.684s
Numpy while-loop (10 iterations):  4.593s
System: linux
Python: v3.4.3
 Numpy: v1.9.2

                   Empty function (50.0M calls):  5.736s
         Direct access via [y, x] (50.0M calls):  6.701s
                  Function access (50.0M calls): 11.592s
Function access w/ unpacked tuple (50.0M calls): 11.420s
         Direct access via [y][x] (50.0M calls): 12.282s

 Python for-loop (10 iterations):  0.995s
  Numpy for-loop (10 iterations):  4.619s
Numpy while-loop (10 iterations):  3.939s

An additional suggestion, to keep things clear and clean, would be to add new functions to World to request a view on the whole array at once, similar to one of these:

def temperature(self, x1=None,y1=None,x2=None,y2=None):
    return self.layers['temperature'].data[y1:y2+1, x1:x2+1]

def temperature(self):
    return self.layers['temperature'].data

def layer(self, layer):
    return self.layers[layer].data

So when somebody wants to do fancy stuff with numpy, e.g. output = w.layers['temperature'] / w.layers['humidity'], you can still hide away the actual access of the array and keep control within ẁorld.py 100% of the time: output = w.temperature() / w.humidity()

EDIT: I see you already had the same idea. I guess, this is the time and place to do this kind of stuff.^^

ftomassetti commented 8 years ago

We need to merge this one :)

psi29a commented 8 years ago

No real comments, and looks good to me.

ftomassetti commented 8 years ago

Great, thanks!