Open qiangxu opened 3 weeks ago
In training mode, the simulation ends once we reach 1000 ticks (this is the default value, but there is room for experimentation to determine the optimal simulation length for training). In evaluation mode, we only stop once we have processed all the available data.
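To make the two stopping rules concrete, here is a minimal sketch. It is not the project's actual loop: `ToySim`, `run_episode`, and `MAX_TRAIN_TICKS` are hypothetical names, and the only assumption taken from this thread is that `sim.tick()` signals when the data runs out (modelled here as returning `False`).

```python
MAX_TRAIN_TICKS = 1000  # default training length mentioned above; assumed tunable


class ToySim:
    """Hypothetical stand-in for the simulator, holding n_rows of data."""

    def __init__(self, n_rows: int) -> None:
        self.n_rows = n_rows
        self.ticks = 0  # number of ticks processed so far

    def tick(self) -> bool:
        """Advance one step; return False once the data is exhausted."""
        if self.ticks >= self.n_rows:
            return False
        self.ticks += 1
        return True


def run_episode(sim: ToySim, mode: str, max_train_ticks: int = MAX_TRAIN_TICKS) -> int:
    """Run until the stop condition for the given mode; return ticks processed."""
    while True:
        # Training: stop at the tick cap, even if more data is available.
        if mode == "train" and sim.ticks >= max_train_ticks:
            break
        # Both modes: stop when the simulator has no more data.
        if not sim.tick():
            break
    return sim.ticks


if __name__ == "__main__":
    print(run_episode(ToySim(5000), "train"))  # capped by max_train_ticks
    print(run_episode(ToySim(5000), "eval"))   # runs through all data
```

The sketch counts ticks on the simulator itself. If the cap were instead derived from a container that is not appended to in training mode, the first check would never fire and only the out-of-data check would end the loop.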
Let me know if this doesn't answer your question.
Since self.trajectory['rewards'] is not updated when the mode is "train", the loop ends only when sim.tick() runs out of data. Is this a bug?