bryanoliveira / soccer-twos-env

A Python package to wrap a compiled soccer-twos reinforcement learning environment with gym-compatible interfaces.
MIT License

Using a trained ONNX file to make agents play soccer #7

Closed jihoon-seo closed 1 year ago

jihoon-seo commented 1 year ago

@bryanoliveira Hello, I have been digging into the SoccerTwos example in the Unity ML-Agents project these days, and I found this repo. soccer-twos-env looks like a nice starting point for digging further.

While doing some tests, I ran into a few questions. Could you take a look and answer them if possible?

I want to use a trained ONNX file (SoccerTwos.onnx) to make the agents play soccer.

When I run the example code from README.md,

import soccer_twos

env = soccer_twos.make(render=True)
print("Observation Space: ", env.observation_space.shape)
print("Action Space: ", env.action_space.shape)

team0_reward = 0
team1_reward = 0
while True:
    # One action per player id: 0-1 are team 0, 2-3 are team 1.
    obs, reward, done, info = env.step(
        {
            0: env.action_space.sample(),
            1: env.action_space.sample(),
            2: env.action_space.sample(),
            3: env.action_space.sample(),
        }
    )

    # Accumulate per-team rewards until the episode ends.
    team0_reward += reward[0] + reward[1]
    team1_reward += reward[2] + reward[3]
    if done["__all__"]:
        print("Total Reward: ", team0_reward, " x ", team1_reward)
        team0_reward = 0
        team1_reward = 0
        env.reset()

I was able to see that each action vector generated by env.action_space.sample() has the form array([n1, n2, n3]), where each n is a discrete value in {0, 1, 2}, one per action branch.

But the neural network model's (SoccerTwos.onnx) input (vector_observation) dimension is 336 and its output (discrete_actions or action) dimension is 9, so for each player I get a length-9 array as the output of inference.

So my question here is: how can I convert the output of inference, whose shape is (9,) for each player, into an appropriate discrete action tuple, whose shape is (3,)?

I tried to find the part that handles this conversion in this repo and in Unity-Technologies/ml-agents, but had no luck.


My second question: could I get complete, working code (including the trained model file and an agent module that subclasses soccer_twos.AgentInterface) so that I can watch the agents playing soccer with the trained model?

Again, thank you for your excellent work. Have a nice weekend!

bryanoliveira commented 1 year ago

Hi @jihoon-seo! Thank you for using this package.

For your first question, I believe the outputs of your trained model may be the Q-values for each action/direction (3) on each axis (3), for each team player (len(pred1) = 2). If this is the case, you can convert the inference output by taking an argmax over each action slice, e.g.: action1 = np.array([np.argmax(pred1[0][:3]), np.argmax(pred1[0][3:6]), np.argmax(pred1[0][6:9])]).
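To make that concrete, here is a minimal, self-contained sketch of the branch-wise argmax. The helper name branch_argmax and the dummy input are illustrative; the only assumption is that the 9 outputs are ordered as 3 consecutive branches of 3 options each, as described above.

import numpy as np

def branch_argmax(logits_9):
    # Split the length-9 model output into 3 branches of 3 options each
    # and pick the highest-scoring option per branch -> shape (3,).
    logits_9 = np.asarray(logits_9)
    return np.array([np.argmax(logits_9[i : i + 3]) for i in (0, 3, 6)])

# Example with a dummy length-9 output; replace with your ONNX inference result.
dummy_output = np.random.randn(9)
print(branch_argmax(dummy_output))  # e.g. [2 0 1]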

For your second question, we have a starter kit made with Ray RLlib here. There you'll find a trained baseline, Ray training configs, and agent interfaces.
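For anyone reading along, here is a rough sketch of how an AgentInterface implementation plugs into the rollout loop to watch a match. It assumes AgentInterface only requires an act(observation) method returning a (3,) action, which may differ from the actual interface in the starter kit; RandomAgent here is illustrative.

import soccer_twos
from soccer_twos import AgentInterface

class RandomAgent(AgentInterface):
    # Illustrative agent: samples a random action for one player per call.
    def __init__(self, env):
        self.env = env

    def act(self, observation):
        return self.env.action_space.sample()

env = soccer_twos.make(render=True)
agent = RandomAgent(env)
obs = env.reset()
while True:
    # One action per player id: 0-1 are team 0, 2-3 are team 1.
    obs, reward, done, info = env.step({i: agent.act(obs[i]) for i in range(4)})
    if done["__all__"]:
        obs = env.reset()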

Good luck with your project and have a lovely weekend too!

jihoon-seo commented 1 year ago

@bryanoliveira Thank you for your kind answer! Thanks to your detailed explanation, I was able to write a new agent module that uses the pre-trained ONNX file. You (and anyone else, of course) can check the code here.
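For other readers, the core of such an agent module could look roughly like the sketch below. It assumes onnxruntime is used for inference, that the model's only required input is the (1, 336) vector_observation, that its length-9 output can be argmaxed per branch as discussed above, and that AgentInterface expects an act(observation) method; the class and variable names are illustrative and the actual code linked above may differ.

import numpy as np
import onnxruntime as ort
from soccer_twos import AgentInterface

class OnnxAgent(AgentInterface):
    # Illustrative agent that runs SoccerTwos.onnx for a single player.
    def __init__(self, env, model_path="SoccerTwos.onnx"):
        self.env = env
        self.session = ort.InferenceSession(model_path)
        self.input_name = self.session.get_inputs()[0].name

    def act(self, observation):
        # Add a batch dimension: the model expects shape (1, 336).
        obs = np.asarray(observation, dtype=np.float32).reshape(1, -1)
        outputs = self.session.run(None, {self.input_name: obs})
        scores = np.asarray(outputs[0]).reshape(-1)  # length-9 branch scores
        # Argmax over each of the 3 branches (3 options each) -> (3,) action.
        return np.array([np.argmax(scores[i : i + 3]) for i in (0, 3, 6)])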

And from the trained baseline you mentioned, I was able to figure out how to load a Ray-trained checkpoint and use it as a brain. This will be very helpful to me as well.

Thank you for writing and maintaining such a well-working project.

And if you prefer, please feel free to close this issue, since my questions are now resolved. 😊 (Or we could keep it open for a while, in case someone else notices it and finds it helpful..? 😊)

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.