Zylann / godot_heightmap_plugin

HeightMap terrain for Godot implemented in GDScript

FYI: profiling procedural generation #268

Open sarrowsmith opened 3 years ago

sarrowsmith commented 3 years ago

I've done some crude profiling of procedural terrain generation, following the recipe in the documentation, i.e.

    # Step 1:
    # Generate heightmap, normalmap and splatmap

    # Step 2:
    var modified_region = Rect2(Vector2(), heightmap.get_size())
    terrain_data.notify_region_change(modified_region, HTerrainData.CHANNEL_HEIGHT)
    terrain_data.notify_region_change(modified_region, HTerrainData.CHANNEL_NORMAL)
    terrain_data.notify_region_change(modified_region, HTerrainData.CHANNEL_SPLAT)

    # Step 3:
    terrain.set_data(terrain_data)

    # Step 4:
    terrain.update_collider()

(I'm running this multithreaded to generate a 4k-radius landscape composed of 1024x1024 HTerrain blocks.)

For step 1 I have my own C# code, which takes approximately half the total generation time. I'm considering a GDNative rewrite (I have already rewritten it from GDScript, which made it twice as fast).

The other half of the generation time is, on average, split evenly between steps 2, 3 and 4, i.e. various parts of the HTerrain code. That doesn't single out any one area where a further GDNative implementation would boost performance, but it does suggest that porting any of these steps wouldn't be a pointless exercise.
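For anyone wanting to repeat the measurement, the per-step timing for steps 2 to 4 can be sketched roughly as below. This is a minimal sketch, not the exact code I used: `profile_block` and the dictionary keys are hypothetical names, it assumes Godot 3.x (where `OS.get_ticks_usec()` is available), and it assumes `HTerrainData` is in scope as in the recipe above.

```gdscript
# Hypothetical helper (not part of the plugin): times steps 2 to 4 for one block.
# Assumes Godot 3.x and that HTerrainData is in scope, as in the recipe above.
func profile_block(terrain, terrain_data, heightmap: Image) -> Dictionary:
    var timings_usec := {}

    # Step 2: notify the terrain data of the modified region on each channel.
    var start := OS.get_ticks_usec()
    var modified_region := Rect2(Vector2(), heightmap.get_size())
    terrain_data.notify_region_change(modified_region, HTerrainData.CHANNEL_HEIGHT)
    terrain_data.notify_region_change(modified_region, HTerrainData.CHANNEL_NORMAL)
    terrain_data.notify_region_change(modified_region, HTerrainData.CHANNEL_SPLAT)
    timings_usec["notify_region_change"] = OS.get_ticks_usec() - start

    # Step 3: hand the data to the terrain node.
    start = OS.get_ticks_usec()
    terrain.set_data(terrain_data)
    timings_usec["set_data"] = OS.get_ticks_usec() - start

    # Step 4: rebuild the collider.
    start = OS.get_ticks_usec()
    terrain.update_collider()
    timings_usec["update_collider"] = OS.get_ticks_usec() - start

    return timings_usec
```

This only captures wall-clock time around each call, so any work the engine defers until later won't show up in the numbers.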

sarrowsmith commented 3 years ago

Update: I've also implemented heightmap and normalmap generation by driving the shader-based generator from a script, which required reducing the landscape to a 2k radius. I couldn't measure any difference between the two methods. Past experience with similar work in Unity suggests that having a lot of chunks/blocks/sectors can put shader-based generation at a disadvantage: the GPU can process all the points in a chunk in parallel, but the chunks themselves are restricted to serial execution, whereas CPU threads can generate multiple chunks in parallel. That said, I kitted my machine out to be much more biased towards CPU than GPU, so this experience probably isn't representative.
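As a rough way to picture the serial-versus-parallel trade-off, here is a toy model. It is an assumption-laden sketch rather than anything measured: the function and parameter names are hypothetical, and it ignores scheduling, data transfer and synchronisation overheads.

```gdscript
# Toy model only: ignores scheduling, transfer and synchronisation overheads.
# All function and parameter names are hypothetical.

# Chunks processed one after another (the shader-based case described above).
func estimate_serial_usec(chunk_count: int, usec_per_chunk: int) -> int:
    return chunk_count * usec_per_chunk

# Chunks spread across CPU threads, processed in batches of thread_count.
func estimate_threaded_usec(chunk_count: int, usec_per_chunk: int, thread_count: int) -> int:
    var batches := int(ceil(float(chunk_count) / thread_count))
    return batches * usec_per_chunk
```

Under this model the shader route only comes out ahead when its per-chunk time is smaller than the CPU per-chunk time divided by roughly the number of threads, which would fit with not seeing a difference on a CPU-heavy machine.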