ConfeitoHS / arcle

A Gymnasium-based Environment of the Abstraction and Reasoning Corpus (ARC)
Apache License 2.0

About the reward function and termination condition of o2arcenv #1

Closed dlqudwns closed 1 year ago

dlqudwns commented 1 year ago

Currently, a reward is given whenever env.answer is equal to the current self.grid, and the environment does not terminate until the submit action is executed.

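Together, these two behaviors cause the optimal agent to never submit: once the grid matches the answer, it can keep collecting reward on every step instead of ending the episode. I believe this is not the intended behavior.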
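To make the incentive concrete, here is a minimal sketch of the problem using a toy environment. It is not the actual arcle code: `ToyGridEnv`, its integer "grid", the action names, and `max_steps` are all hypothetical stand-ins that only mirror the two rules described above (reward whenever the grid equals the answer, termination only on submit).

```python
# Toy stand-in for the behaviour described above; NOT the real arcle API.
class ToyGridEnv:
    """Reward 1 on every step where grid == answer; terminate only on submit."""

    def __init__(self, answer, max_steps=10):
        self.answer = answer
        self.max_steps = max_steps  # hypothetical time limit (truncation)
        self.reset()

    def reset(self):
        self.grid = 0
        self.steps = 0
        return self.grid

    def step(self, action):
        # "inc" edits the grid, "noop" leaves it alone, "submit" ends the episode.
        self.steps += 1
        if action == "inc":
            self.grid += 1
        reward = 1 if self.grid == self.answer else 0  # paid on every matching step
        terminated = action == "submit"
        truncated = self.steps >= self.max_steps
        return self.grid, reward, terminated, truncated


def episode_return(env, policy):
    """Run one episode and return the undiscounted sum of rewards."""
    env.reset()
    total, done = 0, False
    while not done:
        _, reward, terminated, truncated = env.step(policy(env))
        total += reward
        done = terminated or truncated
    return total


def submitting_policy(env):
    # Solve the task, then submit right away.
    return "inc" if env.grid < env.answer else "submit"


def stalling_policy(env):
    # Solve the task, then idle on the solved grid until the time limit.
    return "inc" if env.grid < env.answer else "noop"


env = ToyGridEnv(answer=3, max_steps=10)
return_submit = episode_return(env, submitting_policy)
return_stall = episode_return(env, stalling_policy)
print(return_submit, return_stall)  # prints "2 8"
```

In this toy setting the stalling policy earns 8 while the submitting policy earns 2, so a return-maximizing agent learns to avoid the submit action.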

ConfeitoHS commented 1 year ago

Sorry for the late reply. Yes, this is not the intended behavior. I am fixing it as quickly as possible, and it will be resolved in 0.2.2. Once the problem is resolved, I will close this issue.

ConfeitoHS commented 1 year ago

Updated to 0.2.2. Please check the repository and the arcle page on PyPI. If you find further bugs, please open another issue!
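This thread does not state how the reward was changed in 0.2.2. As an illustration only, one common way to remove the incentive to stall is to pay the reward solely when a correct grid is submitted. The sketch below assumes that rule; `sparse_submit_reward`, `rollout`, the integer "grid", and the action names are hypothetical and are not taken from the arcle source.

```python
def sparse_submit_reward(grid, answer, action):
    """Hypothetical rule: pay 1 only when a correct grid is submitted."""
    is_submit = action == "submit"
    reward = 1 if is_submit and grid == answer else 0
    terminated = is_submit  # the episode still ends only on submit
    return reward, terminated


def rollout(actions, answer):
    """Apply a fixed action sequence to a toy integer grid and sum the rewards."""
    grid, total = 0, 0
    for action in actions:
        if action == "inc":  # "inc" is a stand-in for any grid edit
            grid += 1
        reward, terminated = sparse_submit_reward(grid, answer, action)
        total += reward
        if terminated:
            break
    return total


# Solve and submit, versus solve and idle without ever submitting.
submit_return = rollout(["inc", "inc", "inc", "submit"], answer=3)
stall_return = rollout(["inc", "inc", "inc"] + ["noop"] * 7, answer=3)
print(submit_return, stall_return)  # prints "1 0"
```

Under this assumed rule, idling on a solved grid earns nothing, so submitting the correct answer becomes the only way to collect reward.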