MrRobb / gym-rs

OpenAI Gym bindings for Rust
MIT License

Different values for `is_done` compared to Python library after certain amount of steps #33

Open GeckoEidechse opened 3 years ago

GeckoEidechse commented 3 years ago

I was working with the mountain car environment and noticed that, unlike my Python code, the Rust version always ends an episode after 200 steps because the OpenAI gym library reports the episode as 'done'.

Interestingly enough,

import gym
gym.make("MountainCar-v0")._max_episode_steps

returns 200, meaning the Rust library is correct in returning `is_done` as true after 200 steps. However, I'm not observing the same result in the original Python library. Considering that this Rust library is supposed to be a frontend to the Python library, I'd argue it should mimic its results, even if they are incorrect.

Minimum working examples showing the difference:

Python

import gym
env = gym.make("MountainCar-v0")
env.env.reset()
for i in range(300):
    observation, reward, is_done, info = env.env.step(1)
    if is_done:
        print("is_done == true at step:", i)
        break

Rust

extern crate gym;
fn main() {
    let gym = gym::GymClient::default();
    let env = gym.make("MountainCar-v0");
    env.reset();
    for i in 0..300 {
        let gym::State {observation, reward, is_done} = env.step(&gym::Action::DISCRETE(1)).unwrap();
        if is_done {
            println!("is_done == true at step: {}", i);
            break;
        }
    }
}
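
One possible explanation for the difference, offered as a sketch rather than a confirmed diagnosis: the `_max_episode_steps` attribute lives on gym's `TimeLimit` wrapper, and the Python example calls `env.env.step`, which steps the inner environment and so may bypass that wrapper. The toy classes below (`ToyEnv`, `ToyTimeLimit`, and `first_done_step` are hypothetical names, not part of gym or gym-rs) imitate this arrangement without importing gym, to show how stepping through a step-limit wrapper and stepping the wrapped environment directly can disagree on `is_done`.

```python
class ToyEnv:
    """Hypothetical stand-in for an unwrapped environment that never ends by itself."""

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        # observation, reward, is_done, info (same 4-tuple shape as in the issue)
        return float(self.t), -1.0, False, {}


class ToyTimeLimit:
    """Hypothetical, simplified analogue of a step-limit wrapper (not gym's code)."""

    def __init__(self, env, max_episode_steps):
        self.env = env
        self._max_episode_steps = max_episode_steps
        self._elapsed_steps = 0

    def reset(self):
        self._elapsed_steps = 0
        return self.env.reset()

    def step(self, action):
        observation, reward, is_done, info = self.env.step(action)
        self._elapsed_steps += 1
        # Force the episode to end once the step budget is used up.
        if self._elapsed_steps >= self._max_episode_steps:
            is_done = True
        return observation, reward, is_done, info


def first_done_step(env, max_steps=300):
    """Return the loop index at which is_done first becomes true, or None."""
    env.reset()
    for i in range(max_steps):
        _, _, is_done, _ = env.step(1)
        if is_done:
            return i
    return None


wrapped = ToyTimeLimit(ToyEnv(), max_episode_steps=200)
print(first_done_step(wrapped))      # steps through the wrapper
print(first_done_step(wrapped.env))  # steps the inner environment directly
```

If this assumption holds for the real library, replacing `env.env.reset()` and `env.env.step(1)` with `env.reset()` and `env.step(1)` in the Python example would be the like-for-like comparison with the Rust code.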

hemaolong commented 1 year ago

I encountered the same problem. It would be nice to have the same behavior as the Python gym library.