I was working with the mountain car environment and I noticed that unlike in my Python code, the Rust version would always end an episode after 200 steps due to the OpenAI gym library indicating the episode as 'done'.
Interestingly enough, querying the environment's maximum number of episode steps returns 200, meaning the Rust library is correct in returning is_done as true after 200 steps. However, I'm not observing the same result in the original Python library. Considering that this Rust library is supposed to be a frontend to the Python library, I'd argue it should mimic its results, even if incorrect.
Minimum working examples showing the difference:
Python
import gym
env = gym.make("MountainCar-v0")
env.env.reset()
for i in range(300):
    observation, reward, is_done, info = env.env.step(1)
    if is_done:
        print("is_done == true at step:", i)
        break
Rust
extern crate gym;
fn main() {
    let gym = gym::GymClient::default();
    let env = gym.make("MountainCar-v0");
    env.reset();
    for i in 0..300 {
        let gym::State {observation, reward, is_done} = env.step(&gym::Action::DISCRETE(1)).unwrap();
        if is_done {
            println!("is_done == true at step: {}", i);
            break;
        }
    }
}
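One possible explanation, offered as an assumption rather than a confirmed diagnosis: the Python snippet calls env.env.step, which may reach past a step-limit wrapper that gym.make places around the environment, while the Rust snippet steps the wrapped environment. The sketch below models that idea without importing gym. InnerEnv, StepLimit and first_done_step are hypothetical names made up for illustration and are not part of either library.

```python
class InnerEnv:
    """Hypothetical stand-in for the raw environment: it never ends on its own."""

    def reset(self):
        return 0.0

    def step(self, action):
        # Same tuple shape as in the examples: observation, reward, is_done, info
        return 0.0, -1.0, False, {}


class StepLimit:
    """Hypothetical stand-in for a wrapper that ends episodes after a step budget."""

    def __init__(self, env, max_episode_steps):
        self.env = env  # the inner environment, reachable as wrapper.env
        self.max_episode_steps = max_episode_steps
        self.elapsed = 0

    def reset(self):
        self.elapsed = 0
        return self.env.reset()

    def step(self, action):
        observation, reward, is_done, info = self.env.step(action)
        self.elapsed += 1
        # Force the episode to end once the step budget is used up
        if self.elapsed >= self.max_episode_steps:
            is_done = True
        return observation, reward, is_done, info


def first_done_step(env, max_steps=300):
    """Return the loop index at which is_done first becomes true, or None."""
    env.reset()
    for i in range(max_steps):
        _, _, is_done, _ = env.step(1)
        if is_done:
            return i
    return None


wrapped = StepLimit(InnerEnv(), max_episode_steps=200)
print(first_done_step(wrapped))      # stepping through the wrapper
print(first_done_step(wrapped.env))  # stepping the inner env directly, like env.env.step
```

If this assumption holds for the real library, calling env.step(1) instead of env.env.step(1) in the Python example should make both versions stop at the same step.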