google-deepmind / alphadev

Apache License 2.0

Is it correct to call `step` on the environment again at the last leaf? #8

Open adbelniak opened 5 months ago

adbelniak commented 5 months ago

    while node.expanded():
      action, node = _select_child(config, node, min_max_stats)
      sim_env.step(action)
      history.add_action(action)
      search_path.append(node)

    # Inside the search tree we use the environment to obtain the next
    # observation and reward given an action.
    observation, reward = sim_env.step(action)

This is at line 1031. Is it correct to call `sim_env.step(action)` again after the loop ends? It seems that the program applies the action from the previous node a second time once it reaches the final leaf.
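To make the concern concrete, here is a minimal sketch that mimics only the control flow of the snippet above. `RecordingEnv`, `simulate_as_quoted`, and `simulate_reusing_last_step` are hypothetical names made up for this illustration and are not part of the repository. The list of actions stands in for what `_select_child` would return on the way down the tree. The second function is just one possible alternative for comparison, not a confirmed fix.

```python
class RecordingEnv:
    """Hypothetical stand-in for sim_env that records every applied action."""

    def __init__(self):
        self.applied = []

    def step(self, action):
        self.applied.append(action)
        # Placeholder (observation, reward) pair; the real environment differs.
        return tuple(self.applied), 0.0


def simulate_as_quoted(env, selected_actions):
    """Same shape as the quoted code: step in the loop, then step once more."""
    for action in selected_actions:  # stands in for `while node.expanded()`
        env.step(action)
    observation, reward = env.step(action)  # the call being questioned
    return observation, reward


def simulate_reusing_last_step(env, selected_actions):
    """Assumed alternative: keep the result of the last in-loop step."""
    for action in selected_actions:
        observation, reward = env.step(action)
    return observation, reward


quoted_env = RecordingEnv()
simulate_as_quoted(quoted_env, ["a", "b", "c"])

reuse_env = RecordingEnv()
simulate_reusing_last_step(reuse_env, ["a", "b", "c"])

print(quoted_env.applied)  # ['a', 'b', 'c', 'c']
print(reuse_env.applied)   # ['a', 'b', 'c']
```

With the quoted structure the last selected action reaches the environment twice, which is the behaviour I am asking about. Whether that is intended depends on how `sim_env` is meant to be used here, which I cannot tell from the code alone.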