It's similar to beam search but may have better characteristics. Specifically, it wastes fewer interactions with the gym, at the risk of not reaching the very end of the trajectory. One problem we noticed with beam search is that it re-does its first beams whenever the beam size doubles, effectively wasting 1/2 of its computation per beam rollout.
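To make the 1/2 figure concrete, here's a rough back-of-the-envelope sketch. It is not the actual beam-search code: `beam_rollout_cost` and `doubling_waste` are hypothetical helpers, and it assumes every beam costs one gym step per depth level and that a rollout at width `w` deterministically repeats the `w/2` beams from the previous rollout before exploring anything new.

```python
def beam_rollout_cost(width: int, depth: int) -> int:
    """Gym steps for one rollout: `width` beams, each stepped `depth` times."""
    return width * depth


def doubling_waste(max_width: int, depth: int) -> list[tuple[int, int, int]]:
    """Return (width, total_steps, repeated_steps) for widths 2, 4, ..., max_width.

    Assumption: the rollout at width w re-runs the w/2 beams that the
    previous (half-width) rollout already explored.
    """
    rows = []
    width = 2
    while width <= max_width:
        total = beam_rollout_cost(width, depth)
        repeated = beam_rollout_cost(width // 2, depth)  # steps already done last time
        rows.append((width, total, repeated))
        width *= 2
    return rows


if __name__ == "__main__":
    for width, total, repeated in doubling_waste(max_width=8, depth=10):
        print(f"width={width}: {repeated}/{total} steps repeated ({repeated / total:.0%})")
```

Under those assumptions the repeated share is `w/2 * depth` out of `w * depth`, so half of every rollout after the first, regardless of depth.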
@karldd if you can open a new file by copy/pasting the beam-search code, I can have a go at it.