Closed: dlqudwns closed this issue 1 year ago
Sorry for the late reply. Yes, this is not intended behavior. I am fixing it as quickly as possible, and it will be resolved in 0.2.2. Once the problem is resolved, I will close this issue.
Updated in 0.2.2. Check the repository and the arcle PyPI page. If there are further bugs, please open another issue!
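Currently, a reward is given whenever env.answer is equal to the current self.grid. The environment is not terminated until the submit action is executed.
Together, these two behaviors mean the optimal agent never submits: once the grid matches the answer, it can keep collecting reward on every step, and submitting only ends the episode. I believe this is not intended behavior.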
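To make the incentive concrete, here is a minimal sketch of the problem. It does not use the arcle API: `ToyEnv`, its `step` signature, the `("write", ...)` / `("submit", None)` action format, and the `max_steps` cap are all hypothetical stand-ins that only mimic the two behaviors described above.

```python
from dataclasses import dataclass


@dataclass
class ToyEnv:
    """Hypothetical stand-in for the environment, not the real arcle class."""
    answer: list
    grid: list
    max_steps: int = 10  # assumed step cap so the episode eventually ends
    steps: int = 0

    def step(self, action):
        kind, payload = action
        self.steps += 1
        if kind == "write":
            self.grid = payload
        # Behavior 1: reward on every step where the grid matches the answer.
        reward = 1 if self.grid == self.answer else 0
        # Behavior 2: the episode terminates only on the submit action.
        terminated = kind == "submit"
        truncated = self.steps >= self.max_steps
        return reward, terminated, truncated


def run(policy, max_steps=10):
    """Play a fixed list of actions and return the total reward."""
    env = ToyEnv(answer=[[1]], grid=[[0]], max_steps=max_steps)
    total = 0
    for action in policy:
        reward, terminated, truncated = env.step(action)
        total += reward
        if terminated or truncated:
            break
    return total


solve = ("write", [[1]])
submit = ("submit", None)

submitting = run([solve, submit])  # solves, then ends the episode
stalling = run([solve] * 10)       # solves, then never submits

print(submitting, stalling)
```

Under these assumptions, the agent that keeps the solved grid and never submits collects more total reward than the one that submits right away, so a reward-maximizing policy learns to avoid submit.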