AlexeyBond / godot-constraint-solving

WFC (Wave Function Collapse) and generic constraint satisfaction problem solver implementation for Godot 4
https://godotengine.org/asset-library/asset/1951
MIT License
336 stars 14 forks source link

First attempt at making multi-layer solving work #4

Closed Supamiu closed 10 months ago

Supamiu commented 10 months ago

related to #2

This is an implementation that adds a new WFCLayeredTileMapMapper2D class that will store cells as Array[Vector4i] instead of Vector4i to handle multiple layers.

It's working but it has many downsides:

This PR is mostly to open discussions on the implementation, it's clearly not ready for a release

Performance

Perf wise, the profiler shows 70ms avg per frame on my machine for the layered mapper vs 20ms for the flat one, this is way too much to be considered as a viable solution imho.

It appears that the # of entries in the sample is affecting performances quite hard, presumably because it creates more rules, but this shows a possible performance issue with cases that would be too complex.

AlexeyBond commented 10 months ago

Not having weights makes the map go crazy quite easily (like say you have one dirt tile alone and one dirt + tree tile, you have 50% to get a tree, making it quite high)

That's expected behavior until tile probabilities (which are totaly independent issue) are implemented.

Having to store these as Array[Vector4i] means that we're duplicating coords instead of just storing coords and then cell+alt in a better structure, that being said I'm not sure it'd make things much faster

Nothing is duplicated there. Coordinares received from map.get_cell_atlas_coords(layer, coords) are coordinates of a tile in texture atlas. map.get_cell_source_id(layer, coords) is index of a texture. So if there is only one texture and no alternative tiles in tile set, map.get_cell_atlas_coords(layer, coords) will be the only value that is different between a tree and dirt.

It appears that the # of entries in the sample is affecting performances quite hard

I guess, a dictionary cannot work as a hash map when keys are mutable values (arrays) and thus it has to sequentially compare new value to all available keys. If that's the problem, I can suggest a possible solution

suggestion I suggest to use a tree of dictionaries instead of single dictionary. E.g. instead of ``` { [Vector4i(tile_from_layer_0), Vector4i(tile_from_layer1), ...]: 1 } ``` store ``` { Vector4i(tile_from_layer0): { Vector4i(tile_from_layer1): { ...: 1 } } } ``` That will make dictionary access functions more complicated: - instead of `_read_cell_attrs` there will be two functions - one for `learn_from` that creates nested dictionaries when necessary and one for `read_cell` that returns `-1` when cannot find mapping on any nesting level. - additional counter will be necessary to generate integer indices for tile types - `_ensure_reverse_mapping` will have to generate arrays from keys in dictionary tree But dictionary access time will become `O(average_tile_types_per_layer * layers)` (or even `O(layers)` if the dictionary will start working as a hash map) instead of `O(total_tile_combinations * layers)`.

However, I'm not sure if it's really a significant issue - learn_from/read_cell calls are not really frequent. What's more worth looking at is write_cell, which is called a lot if render_intermediate_results is enabled in generator. In fact, even write_cell isn't called that frequently when render_intermediate_results is disabled. And it is most likely to be disabled everywhere excapt this addon's examples. So, maybe performance of a mapper doesn't really matter that much.

Perf wise, the profiler shows 70ms avg per frame on my machine for the layered mapper vs 20ms for the flat one, this is way too much to be considered as a viable solution imho.

Did you measure it with the same single-layer map, same sample (and same rules)? If the multi-layer mapper did actually work with more than one layer or rules were different then these measurements are not representative.

Supamiu commented 10 months ago

Yeah I realized that nothing was duplicated and forgot to remove that part of my message 😄

So I made another performance test by recording start time and logging milliseconds duration when done signal emits, here are the results:

WFCLayeredTileMapMapper2D:

WFCTileMapMapper2D

Tests made using the exact same setup, just swapped the mapper in the generator

So turns out ms per frame isn't really representative here, I guess measuring like that is better 😄 . It shows that the most expensive part is not write_cell neither, I checked the learning phase and it's <1ms too so my guess is that the solver is the slow part here (which totally makes sense, we're introducing more possibilities and thus more rules).

I think your suggestion regarding using a 2 dimensions dictionary can help but I don't think I can implement that for now, my knowledge is too limited on the topic to make sure I'm not doing something bad.

The possible improvements I see are:

To elaborate more on the second one, we're generating a 12168 map here, which is 8228 tiles in total, while the typical screen of a tilemap-based game would contain way more in its viewport (a rough check on my game, I'm around 20 12) which we could render as a priority part and then render the rest either while the player is moving or just in background as they start exploring the area.

AlexeyBond commented 10 months ago

I just tested both multi-layer and single-layer mappers on a single-layer tilemap (from original tilemap demo). This way all the other variables (including different rules) are excluded. And, suddenly multi-layer mapper works even a tiny bit faster than single-layer - generation takes 22.26 seconds instead of 22.38 seconds (average of 4 samples in each case).

So, I guess, it's ok to drop a single-layer mapper in favour of new multi-layer one.

I do have couple of comments for this PR and after resolving them it can be merged.

Have the solver be more performant, if that's even possible, I didn't check it at all

I have spent some time optimizing it, so I guess it's current performance is about 80-90% of what is possible to acheive without dramatically changing the algorithm and/or technology.

Have a way to lazy generate a map (like, render a given rect and trigger generator to generate more on demand instead of generating the whole map at once)

I think this can be done using some sort of higher-level node that spawns generators when player approaches ungenerated area. But it's not trivial. The generators should generate some extra area to ensure that there may be no contradictions in still unpopulated regions. Just like with multithreading. Also, it all just won't work for some rule sets and there is yet no correct way implemented to recognize such rules automatically.

One approach to this problem is to fill the whole map with some sort of "ground" tiles and then erase tiles in blocks before generating them, as described here. But it's likely to leave visible block borders in some rule sets.

Supamiu commented 10 months ago

I feel like you should probably squash this when merging to have a better commit history than what's in the PR ^^"