blazium-engine / blazium

Blazium Engine – Multi-platform 2D and 3D game engine
https://blazium.app/
MIT License
149 stars 19 forks source link

Optimized Node.find_children: #59

Closed MisterPuma80 closed 3 days ago

MisterPuma80 commented 4 days ago

This should be tested thoroughly before merging.

This changes Node.find_children to be more efficient by changing 2 things:

  1. Instead of using recursion to have find_children call itself, it uses its own stack (the array to_search). This saves it from having to re-validate all the parameters and having to push everything onto the callstack every time the function is called.
  2. We can look up the search pattern and type information at the start and save it. Before it was literally getting this data from a hash table for each iteration of the loop.

I tested this in an open world jrpg I'm making. It runs about 40% to 60% faster.

Norrox commented 4 days ago

When i increase so it iterate through more children each time your solution seems to be faster. but are there less children each time it seems to be slower, am i understanding this correctly 50 000 children, 10 000 itherations image image image

100 000 children, 10 000 iterations image

Norrox commented 4 days ago

image image

Norrox commented 4 days ago

Here we can see the difference

extends Node

var root_node: Node
const TOTAL_NODES = 600
const MAX_CHILDREN_PER_NODE = 5
const NUM_ITERATIONS = 100
const SEARCH_PATTERNS = ["Child*", "Grandchild*", "GreatGrandchild*"]

func _ready():
    # Setup: Create a branching tree of nodes
    root_node = Node.new()
    root_node.name = "Root"
    add_child(root_node)  # Add to scene tree to enable find_children
    create_tree(root_node, 0)
    print("Tree created with %d nodes" % count_nodes(root_node))

    # Start the continuous benchmark
    run_benchmark()

func create_tree(parent: Node, depth: int):
    if count_nodes(root_node) >= TOTAL_NODES:
        return

    var num_children = randi() % (MAX_CHILDREN_PER_NODE + 1)
    for i in range(num_children):
        var child = Node.new()
        if depth == 0:
            child.name = "Child%d" % i
        elif depth == 1:
            child.name = "Grandchild%d" % i
        else:
            child.name = "GreatGrandchild%d" % i
        parent.add_child(child)
        create_tree(child, depth + 1)

func count_nodes(node: Node) -> int:
    var count = 1  # Count the node itself
    for child in node.get_children():
        count += count_nodes(child)
    return count

func run_benchmark():
    while true:
        var start_time = Time.get_ticks_usec()
        var total_found = 0

        for i in range(NUM_ITERATIONS):
            # Randomly select a search pattern
            var pattern = SEARCH_PATTERNS[randi() % SEARCH_PATTERNS.size()]

            # Test find_children performance
            var children = root_node.find_children(pattern, "Node", true, false)
            total_found += children.size()

        var end_time = Time.get_ticks_usec()
        var total_time = (end_time - start_time) / 1000000.0  # Convert to seconds

        # Print results
        print("\nBenchmark Results:")
        print("Timestamp: %s" % Time.get_datetime_string_from_system())
        print("Total time: %.4f seconds" % total_time)
        print("Average time per iteration: %.4f milliseconds" % (total_time * 1000 / NUM_ITERATIONS))
        print("Average children found per iteration: %.2f" % (float(total_found) / NUM_ITERATIONS))

        # Wait for 2 seconds before the next run
        await get_tree().create_timer(2.0).timeout

func _exit_tree():
    # Clean up
    if is_instance_valid(root_node):
        root_node.queue_free()

image

sshiiden commented 4 days ago

I've noticed that the returned arrays are empty. Demostrated by this line image

Norrox commented 4 days ago

i ran this code on my main project with these results

func find_and_print_children(child_type: String):
    var start_time = Time.get_ticks_msec()

    # Find all children of the specified type
    var children = find_children("*", child_type, true, false)

    var end_time = Time.get_ticks_msec()
    var duration = end_time - start_time

    # Print results
    print("Found ", children.size(), " children of type ", child_type)
    print("Time taken: ", duration, " milliseconds")

    # Optionally, print each child's name
    for child in children:
        print("- ", child.name)

image image

Norrox commented 4 days ago

I've noticed that the returned arrays are empty. Demostrated by this line image

fixed in the above example :D

sshiiden commented 4 days ago

Blazium | Godot

MAX_CHILDREN_PER_NODE = 200 image MAX_CHILDREN_PER_NODE = 10 image

Test script

extends Node

var root_node: Node
const TOTAL_NODES = 50
const MAX_CHILDREN_PER_NODE = 200
const NUM_ITERATIONS = 100
const SEARCH_PATTERNS = ["Child*", "Grandchild*", "GreatGrandchild*"]

func _ready():
    seed(42)
    # Setup: Create a branching tree of nodes
    root_node = Node.new()
    root_node.name = "Root"
    add_child(root_node)  # Add to scene tree to enable find_children
    create_tree(root_node, 0)
    print("Tree created with %d nodes" % count_nodes(root_node))

    # Start the continuous benchmark
    run_benchmark()

func create_tree(parent: Node, depth: int):
    if count_nodes(root_node) >= TOTAL_NODES:
        return

    var num_children = randi() % (MAX_CHILDREN_PER_NODE + 1)
    for i in range(num_children):
        var child = Node.new()
        if depth == 0:
            child.name = "Child%d" % i
        elif depth == 1:
            child.name = "Grandchild%d" % i
        else:
            child.name = "GreatGrandchild%d" % i
        parent.add_child(child)
        create_tree(child, depth + 1)

func count_nodes(node: Node) -> int:
    var count = 1  # Count the node itself
    for child in node.get_children():
        count += count_nodes(child)
    return count

func run_benchmark():
    while true:
        var start_time = Time.get_ticks_usec()
        var total_found = 0

        for i in range(NUM_ITERATIONS):
            # Randomly select a search pattern
            var pattern = SEARCH_PATTERNS[randi() % SEARCH_PATTERNS.size()]

            # Test find_children performance
            var children = root_node.find_children(pattern, "Node", true, false)
            total_found += children.size()

        var end_time = Time.get_ticks_usec()
        var total_time = (end_time - start_time) / 1000000.0  # Convert to seconds

        # Print results
        print("\nBenchmark Results:")
        print("Timestamp: %s" % Time.get_datetime_string_from_system())
        print("Total time: %.4f seconds" % total_time)
        print("Average time per iteration: %.4f milliseconds" % (total_time * 1000 / NUM_ITERATIONS))
        print("Average children found per iteration: %.2f" % (float(total_found) / NUM_ITERATIONS))

        # Wait for 2 seconds before the next run
        await get_tree().create_timer(2.0).timeout

func _exit_tree():
    # Clean up
    if is_instance_valid(root_node):
        root_node.queue_free()
Norrox commented 4 days ago

Blazium | Godot

MAX_CHILDREN_PER_NODE = 200 image MAX_CHILDREN_PER_NODE = 10 image

Test script

extends Node

var root_node: Node
const TOTAL_NODES = 50
const MAX_CHILDREN_PER_NODE = 200
const NUM_ITERATIONS = 100
const SEARCH_PATTERNS = ["Child*", "Grandchild*", "GreatGrandchild*"]

func _ready():
  seed(42)
  # Setup: Create a branching tree of nodes
  root_node = Node.new()
  root_node.name = "Root"
  add_child(root_node)  # Add to scene tree to enable find_children
  create_tree(root_node, 0)
  print("Tree created with %d nodes" % count_nodes(root_node))

  # Start the continuous benchmark
  run_benchmark()

func create_tree(parent: Node, depth: int):
  if count_nodes(root_node) >= TOTAL_NODES:
      return

  var num_children = randi() % (MAX_CHILDREN_PER_NODE + 1)
  for i in range(num_children):
      var child = Node.new()
      if depth == 0:
          child.name = "Child%d" % i
      elif depth == 1:
          child.name = "Grandchild%d" % i
      else:
          child.name = "GreatGrandchild%d" % i
      parent.add_child(child)
      create_tree(child, depth + 1)

func count_nodes(node: Node) -> int:
  var count = 1  # Count the node itself
  for child in node.get_children():
      count += count_nodes(child)
  return count

func run_benchmark():
  while true:
      var start_time = Time.get_ticks_usec()
      var total_found = 0

      for i in range(NUM_ITERATIONS):
          # Randomly select a search pattern
          var pattern = SEARCH_PATTERNS[randi() % SEARCH_PATTERNS.size()]

          # Test find_children performance
          var children = root_node.find_children(pattern, "Node", true, false)
          total_found += children.size()

      var end_time = Time.get_ticks_usec()
      var total_time = (end_time - start_time) / 1000000.0  # Convert to seconds

      # Print results
      print("\nBenchmark Results:")
      print("Timestamp: %s" % Time.get_datetime_string_from_system())
      print("Total time: %.4f seconds" % total_time)
      print("Average time per iteration: %.4f milliseconds" % (total_time * 1000 / NUM_ITERATIONS))
      print("Average children found per iteration: %.2f" % (float(total_found) / NUM_ITERATIONS))

      # Wait for 2 seconds before the next run
      await get_tree().create_timer(2.0).timeout

func _exit_tree():
  # Clean up
  if is_instance_valid(root_node):
      root_node.queue_free()

mine is faster image

Norrox commented 4 days ago

@sshiiden seed 42 600/5/100 image 50/5/200 image 50/200/100 image

extends Node

var root_node: Node
const TOTAL_NODES = 600
const MAX_CHILDREN_PER_NODE = 5
const NUM_ITERATIONS = 100
const SEARCH_PATTERNS = ["Child*", "Grandchild*", "GreatGrandchild*"]

func _ready():
    seed(42)
    # Setup: Create a branching tree of nodes
    root_node = Node.new()
    root_node.name = "Root"
    add_child(root_node)  # Add to scene tree to enable find_children
    create_tree(root_node, 0)
    print("Tree created with %d nodes" % count_nodes(root_node))

    # Start the continuous benchmark
    run_benchmark()

func create_tree(parent: Node, depth: int):
    if count_nodes(root_node) >= TOTAL_NODES:
        return

    var num_children = randi() % (MAX_CHILDREN_PER_NODE + 1)
    for i in range(num_children):
        var child = Node.new()
        if depth == 0:
            child.name = "Child%d" % i
        elif depth == 1:
            child.name = "Grandchild%d" % i
        else:
            child.name = "GreatGrandchild%d" % i
        parent.add_child(child)
        create_tree(child, depth + 1)

func count_nodes(node: Node) -> int:
    var count = 1  # Count the node itself
    for child in node.get_children():
        count += count_nodes(child)
    return count

func run_benchmark():
    while true:
        var start_time = Time.get_ticks_usec()
        var total_found = 0

        for i in range(NUM_ITERATIONS):
            # Randomly select a search pattern
            var pattern = SEARCH_PATTERNS[randi() % SEARCH_PATTERNS.size()]

            # Test find_children performance
            var children = root_node.find_children(pattern, "Node", true, false)
            total_found += children.size()

        var end_time = Time.get_ticks_usec()
        var total_time = (end_time - start_time) / 1000000.0  # Convert to seconds

        # Print results
        print("\nBenchmark Results:")
        print("Timestamp: %s" % Time.get_datetime_string_from_system())
        print("Total time: %.4f seconds" % total_time)
        print("Average time per iteration: %.4f milliseconds" % (total_time * 1000 / NUM_ITERATIONS))
        print("Average children found per iteration: %.2f" % (float(total_found) / NUM_ITERATIONS))

        # Wait for 2 seconds before the next run
        await get_tree().create_timer(2.0).timeout

func _exit_tree():
    # Clean up
    if is_instance_valid(root_node):
        root_node.queue_free()
Starkium commented 3 days ago

any reasons I shouldn't merge this yet? I like the concept, it's the same as how unreal engine works.

MisterPuma80 commented 3 days ago

Thanks for the testing @Norrox . I bet the lack of performance increase with large number of nodes is because to_search is an array, and it is being pushed to and popped from constantly. I might try it with a linked list later.

Norrox commented 3 days ago

any reasons I shouldn't merge this yet?

I like the concept, it's the same as how unreal engine works.

No reason! Other than wait for more reviewers 😅

Norrox commented 3 days ago

any reasons I shouldn't merge this yet? I like the concept, it's the same as how unreal engine works.

@Starkium well if you approve then we are 5!

jss2a98aj commented 3 days ago

I am still bench-marking some things with this, so please hold of on merging for the moment

jss2a98aj commented 3 days ago

I am getting entirely worse performance. image

Starkium commented 3 days ago

this looks like it takes longer?