Open chaosprint opened 2 years ago
I'm excited to have discovered this! For this reward design, I'm also willing to help the environment.
Perhaps this is a multi-agent RL problem, and fitting all of the slots can be depicted as arriving at correct targets in a conventional environment known as a multi-agent practical environment (MPE). The MPE reward might be referred to throughout the reward design process.
I'm excited to have discovered this! For this reward design, I'm also willing to help the environment.
Perhaps this is a multi-agent RL problem, and fitting all of the slots can be depicted as arriving at correct targets in a conventional environment known as a multi-agent practical environment (MPE). The MPE reward might be referred to throughout the reward design process.
Cool. Thank you for your input. I need to finish another project before the deadline this month. Will revisit this issue in May.
great!
So far, this environment seems to be designed for training a "synth parameter tweaking" agent.
There are some other RL applications. For example, we can give 1024 slots to the sequencer and see if the agent can learn to pick the correct place to play a human-acceptable drum sequencer (mainly switch on 16th note slot). But how do we set the reward here? And how should Glicol be improved for this?