ZuseZ4 / Rust_RL

A Reinforcement Learning / Neural Network library, written in Rust.

Can 'Environment' and 'Agent' traits contain generic 'State' and 'Action'? #6

Open seongs1024 opened 3 years ago

seongs1024 commented 3 years ago

To apply this library to other problems, a generic (or associated-type?) implementation of 'State' and 'Action' would be very helpful. I found a reference in another repository: milanboers/rurel.

Frankly, I want to try converting the current 'Environment' and 'Agent' traits to handle more generic problems (beyond Fortress), but I have no idea where to start... I hope to hear your advice.

Thank you in advance!

ZuseZ4 commented 3 years ago

Hi and thanks for your feedback. I have been thinking about having more generic traits, but I haven't decided on how to do so yet.

Two basic thoughts on changes in that direction:

1) ndarray is the array backend currently in use. The entire NeuralNetwork section of this repo works on ndarray, and I see no other matrix crate that fits ML better. So in my opinion, further changes and implementations should be based on ArrayD (or a specific ArrayX version) in order to stay compatible.
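To illustrate the compatibility point, a dynamic-dimensional ArrayD would let environments with differently shaped boards share a single observation type. Purely a sketch; these helper functions don't exist in the repo:

```rust
use ndarray::{ArrayD, IxDyn};

// Boards of different shapes can share one observation type by
// using ndarray's dynamic-dimensional ArrayD.
fn fortress_observation() -> ArrayD<f32> {
    ArrayD::zeros(IxDyn(&[6, 6])) // 6x6 Fortress board
}

fn tictactoe_observation() -> ArrayD<f32> {
    ArrayD::zeros(IxDyn(&[3, 3])) // 3x3 TicTacToe board
}

fn main() {
    println!(
        "{:?} vs {:?}",
        fortress_observation().shape(),
        tictactoe_observation().shape()
    );
}
```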

2) I like https://gym.openai.com/docs/#spaces. Having spaces sounds good to me. With just the discrete space and the box space, we should already cover most of the use cases.

Changing the interfaces to follow the OpenAI Gym is just my personal preference atm, but please feel free to come up with another solution.

Let me lay out some changes that would arise from following their space idea:

Fortress would have an action space of type Discrete(36) and an observation space of type Box((6,6)), where each entry is a value in the range [-3,3].

The TicTacToe example would have an action space of type Discrete(9) and an observation space of type Box((3,3)), where each value is in the range [-1,1].

If we take ContinuousMountainCar as a new example, we would then have an action space of type Box(1), with a value in the range [-1,1] for full power backwards/forwards. The observation space would be of type Box(1), with a value in the range [-1.2, 0.6]. See https://github.com/openai/gym/blob/master/gym/envs/classic_control/continuous_mountain_car.py This is an example that can't be handled well with the current implementation.
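To make that mapping concrete, here is how those three environments could declare their spaces, assuming a minimal hypothetical `Space` enum (none of these names exist in the repo yet):

```rust
// Hypothetical minimal space type, loosely following the Gym docs.
#[derive(Debug)]
enum Space {
    Discrete(usize),
    Box { shape: Vec<usize>, low: f32, high: f32 },
}

fn main() {
    // Fortress: 36 possible moves, 6x6 board with entries in [-3, 3].
    let fortress = (
        Space::Discrete(36),
        Space::Box { shape: vec![6, 6], low: -3.0, high: 3.0 },
    );

    // TicTacToe: 9 possible moves, 3x3 board with entries in [-1, 1].
    let tictactoe = (
        Space::Discrete(9),
        Space::Box { shape: vec![3, 3], low: -1.0, high: 1.0 },
    );

    // ContinuousMountainCar: continuous 1d action in [-1, 1],
    // observation in [-1.2, 0.6].
    let mountain_car = (
        Space::Box { shape: vec![1], low: -1.0, high: 1.0 },
        Space::Box { shape: vec![1], low: -1.2, high: 0.6 },
    );

    println!("{:?}\n{:?}\n{:?}", fortress, tictactoe, mountain_car);
}
```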

Generally speaking, the Environment trait would be based on two generics: the action space A and the observation space O. Both have to be a space, for now either Discrete(usize) or Box(..). The step() and take_action() functions would then be based on them.

The Agent trait and the Trainer would need to be changed accordingly; a rough sketch follows below.
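Here is what those generic traits could look like on top of a common `Space` trait. All names and signatures are hypothetical, not the current API:

```rust
// Hypothetical trait implemented by every space (discrete, box, ...).
trait Space {
    // The concrete values drawn from the space, e.g. usize for a
    // discrete space or an ndarray for a box space.
    type Value;

    fn contains(&self, value: &Self::Value) -> bool;
}

// An environment is generic over its action space A and its
// observation space O.
trait Environment<A: Space, O: Space> {
    fn action_space(&self) -> &A;
    fn observation_space(&self) -> &O;

    // take_action applies a move; step additionally returns the new
    // observation, the reward, and whether the episode has ended.
    fn take_action(&mut self, action: A::Value) -> bool;
    fn step(&mut self, action: A::Value) -> (O::Value, f32, bool);
}

// An agent has to work with the same pair of spaces.
trait Agent<A: Space, O: Space> {
    fn get_move(&mut self, observation: &O::Value) -> A::Value;
}
```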

If you are fine with the changes I proposed, I would vote for a first iteration where we focus on the two spaces, discrete and box, and ignore custom spaces that users might want to add later. Those could remain for a smaller second iteration. I could then define a precise layout for agent_trait.rs and env_trait.rs (+ something about trainer.rs). The discrete space is clear, just having n: usize. The box space would probably be based on the triple (shape, low, high), where shape could be an ndarray shape and low/high f32 values (either using NaNs for +-\infty, or wrapping them in an Option).
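Following that layout, the two concrete space types could look roughly like this. A sketch, using Option<f32> instead of NaN for unbounded sides; none of this is committed code:

```rust
use ndarray::IxDyn;

// Discrete space over the values 0..n.
pub struct Discrete {
    pub n: usize,
}

// Box space: a (possibly multi-dimensional) shape plus bounds.
// None encodes an unbounded side, i.e. -inf or +inf, so no NaNs
// end up inside the bounds.
pub struct BoxSpace {
    pub shape: IxDyn,
    pub low: Option<f32>,
    pub high: Option<f32>,
}

fn main() {
    // e.g. the Fortress observation space: 6x6, entries in [-3, 3].
    let obs = BoxSpace {
        shape: IxDyn(&[6, 6]),
        low: Some(-3.0),
        high: Some(3.0),
    };
    let actions = Discrete { n: 36 };
    println!("{} actions, obs shape {:?}", actions.n, obs.shape);
}
```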

Right now I'm quite far along on another branch which allows passing batches to NeuralNetworks. That affects the replay_buffer in the RL part of this repo, but once I get that merged I can create a draft PR where we can discuss the implementation.

seongs1024 commented 3 years ago

Changing interfaces towards the OpenAI Gym sounds good!

For your information, there is a reference that uses the concept of both discrete and box spaces: tspooner/spaces. The owner implemented cart_pole, mountain_car (with both continuous and discrete actions), and cliff_walk domains for testing reinforcement learning algorithms.

One option would be to use the spaces crate; another would be to implement both the discrete and box spaces ourselves.

ZuseZ4 commented 3 years ago

I started by creating a draft PR that uses tspooner's repository in the following places:

- src/examples/fortress.rs
- src/examples/tictactoe.rs
- src/rl/training/trainer.rs
- src/rl/agents
- src/rl/env

For now I'm just trying to add a generic state space and action space to the relevant structs/traits while keeping it compiling. Afterwards I will work on using them instead of the existing ndarrays.

ZuseZ4 commented 3 years ago

I've tried to partially transfer my code to spaces. However, I haven't had any success; my impression is that it would be easier to transfer dql_agent, ql_agent, fortress, and tictactoe to rsrl than to add spaces to my implementation. Unfortunately, this is beyond the time I can invest atm. I might therefore come back later and implement something like generics on top of ndarray::ArrayBase, once I have more time.

I've pushed what is probably the closest attempt, in case you want to have a look at it. There I tried to just update the get_move function from Agent, as well as take_action from Environment. However, I have no clue how to get my agents working with generic, non-usize-based types. Specialization (https://github.com/rust-lang/rust/issues/31844) might help by marking some agents (dql_agent) as only useful for certain spaces, but that is far from being stabilized. There might be other solutions, but I'm not experienced enough with generics to see them. Feel free to play around with that or another spaces-based solution; otherwise I will start another draft based more closely on ndarray.
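For what it's worth, one workaround that already works on stable Rust, instead of specialization, would be a marker trait that constrains an agent like dql_agent to the spaces it can actually handle. A sketch under made-up names:

```rust
trait Space {
    type Value;
}

struct Discrete {
    n: usize,
}

impl Space for Discrete {
    type Value = usize;
}

// Marker trait for spaces with finitely many, usize-indexable values.
// A DQL agent needs this, since its network emits one Q-value per action.
trait FiniteSpace: Space<Value = usize> {
    fn len(&self) -> usize;
}

impl FiniteSpace for Discrete {
    fn len(&self) -> usize {
        self.n
    }
}

struct DqlAgent;

impl DqlAgent {
    // The FiniteSpace bound keeps this agent usable only with
    // discrete-like action spaces, without needing specialization.
    fn get_move<A: FiniteSpace>(&mut self, actions: &A) -> usize {
        assert!(actions.len() > 0, "empty action space");
        0 // placeholder: a real agent would pick argmax over Q-values
    }
}

fn main() {
    let mut agent = DqlAgent;
    let action = agent.get_move(&Discrete { n: 9 });
    println!("chose action {}", action);
}
```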

seongs1024 commented 3 years ago

Okay 👍 I'm trying to make them generic using spaces, and will come back if I have something to tell you!

Thank you @ZuseZ4 !