jovanchog closed this 7 months ago
Hello, thank you for acknowledging our work.
The action in the first step is not chosen at random. It is computed from the first observation returned after the environment resets, so I think there is a misinterpretation here.
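To make this concrete, here is a minimal sketch of a one-step episodic environment. It is not code from this repository: it assumes a Gym-style `reset`/`step` interface, and `OneStepEnv`, `policy`, and `run_episodes` are hypothetical names used only for illustration. The point is that the single action of each episode is computed from the observation returned by `reset`.

```python
import random


class OneStepEnv:
    """Hypothetical one-step episodic environment (a contextual bandit)."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.context = None

    def reset(self):
        # Draw a fresh context; this is the only observation of the episode.
        self.context = self.rng.randint(0, 1)
        return self.context

    def step(self, action):
        # Reward 1.0 if the action matches the observed context, else 0.0.
        reward = 1.0 if action == self.context else 0.0
        done = True  # the episode always ends after a single step
        return self.context, reward, done, {}


def policy(obs):
    # Stand-in for a trained agent: the action depends on the observation.
    return obs


def run_episodes(n=5, seed=0):
    env = OneStepEnv(seed)
    total = 0.0
    for _ in range(n):
        obs = env.reset()          # first (and only) observation
        action = policy(obs)       # action chosen from that observation
        _, reward, done, _ = env.step(action)
        assert done
        total += reward
    return total
```

Because the stand-in policy reads the observation, it earns the full reward in every episode, which a policy acting blindly on the first step could not do reliably.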
Hi.
First of all, I want to thank you for sharing this repository so that others can benefit from your expertise in RL.
I have a question about the possibility of creating an episodic environment with only one step. From what I can see in the code, the action in the first step is always chosen at random. This suggests that the agent makes its decision based on knowledge from previous episodes rather than on an actual observation.
Hence, is it possible to create such an environment, or am I misinterpreting the code?
Thank you in advance for your answer.
Best regards.