Closed jihoon-seo closed 1 year ago
Hi @jihoon-seo! Thank you for using this package.
For your first question, I believe the outputs from your trained model may be the Q-values for each action/direction (3) on each axis (3), for each team player (`len(pred1) == 2`). If that is the case, you could convert the output of inference by taking an argmax over each action slice, e.g.: `action1 = np.array([np.argmax(pred1[0][:3]), np.argmax(pred1[0][3:6]), np.argmax(pred1[0][6:9])])`.
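The per-slice argmax above can be sketched as a small helper. This is a minimal NumPy sketch, assuming the 9 outputs really are three consecutive branches of 3 Q-values each (forward axis, right axis, rotation) — that layout is an assumption about the ONNX model, not a documented guarantee:

```python
import numpy as np

def q_values_to_action(q_values):
    """Convert a flat (9,)-dim Q-value vector into a (3,) discrete action.

    Assumes three consecutive branches of 3 Q-values each
    (forward axis, right axis, rotation).
    """
    q_values = np.asarray(q_values)
    # argmax over each 3-value slice picks the best option per branch
    return np.array([np.argmax(q_values[i:i + 3]) for i in range(0, 9, 3)])

# Branch maxima sit at indices 1, 2, 0 of their slices
example = [0.1, 0.9, 0.2,  0.0, 0.3, 0.8,  0.5, 0.4, 0.1]
print(q_values_to_action(example))  # -> [1 2 0]
```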
For your second question, we have a starter kit made with Ray RLlib here. There you'll find a trained baseline, Ray training configs, and agent interfaces.
Good luck with your project and have a lovely weekend too!
@bryanoliveira Thank you for your kind answer! Thanks to your detailed explanation, I was able to write a new agent module that uses the pre-trained ONNX file. You (and anyone, of course) can check the code here.
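For readers landing here later, a minimal sketch of what such an agent module could look like. This is an illustration only: the real `soccer_twos.AgentInterface` signature may differ, and the `dummy` callable below merely stands in for a real `onnxruntime` inference call, which is not shown:

```python
import numpy as np

class OnnxStyleAgent:
    """Sketch of an agent that converts raw 9-dim model outputs
    into (3,) discrete actions.

    `model` is any callable mapping an observation to a length-9
    vector; in a real agent this would be an ONNX inference session.
    The act(observation) method only mirrors (and is assumed to
    match) the soccer_twos.AgentInterface contract.
    """

    def __init__(self, model):
        self.model = model

    def act(self, observation):
        q = np.asarray(self.model(observation))
        # per-branch argmax: forward axis, right axis, rotation
        return np.array([np.argmax(q[i:i + 3]) for i in range(0, 9, 3)])

# Dummy model standing in for ONNX inference: prefers 'forward'
# (index 1), 'no strafe' (index 0), and 'clockwise' (index 1).
dummy = lambda obs: [0, 9, 0,  9, 0, 0,  0, 9, 0]
agent = OnnxStyleAgent(dummy)
print(agent.act(np.zeros(336)))  # -> [1 0 1]
```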
And from the trained baseline you mentioned, I was able to figure out how to load a Ray-trained checkpoint and then use it as a brain. This will be very helpful to me as well.
Thank you for writing and maintaining this well-working project.
And if you prefer, please feel free to close this issue, since my questions are now resolved. 😊 (Or we could keep it open for a while, since someone else might notice this open issue and find it helpful. 😊)
@bryanoliveira Hello, I have been digging into the SoccerTwos example in the Unity ML-Agents project these days, and found this repo. It seems that `soccer-twos-env` is a nice project to use as a starting point for digging deeper. While doing some tests, I ran into some questions. Could you take a look and provide some answers if possible?
I want to use the trained ONNX file (`SoccerTwos.onnx`) to make agents play soccer. When I run a training session using the example code in `README.md`, I can see that each action vector generated from `env.action_space.sample()` has the form `array([n1, n2, n3])`, where:

- For the forward axis: `1` means 'move forward', `2` means 'move backward', `0` means 'don't move'
- For the right axis: `1` means 'move left', `2` means 'move right', `0` means 'don't move'
- For rotation: `1` means 'rotate clockwise', `2` means 'rotate counterclockwise', `0` means 'don't rotate'

But the neural network model's (`SoccerTwos.onnx`) input (`vector_observation`) dim is 336 and its output (`discrete_actions` or `action`) dim is 9, so I get length-9 arrays as the output of inference.

So my first question is: how can I convert the output of inference, whose shape is `(9,)` for each player, into an appropriate discrete action tuple, whose shape is `(3,)`? I tried to find the part that handles this conversion in this repo and in `Unity-Technologies/ml-agents`, but had no luck.

My second question is: could I get complete and working code, including the trained model file and an implemented agent module (a subclass of `soccer_twos.AgentInterface`), so I can watch the agents playing soccer with the trained model?

Again, thank you for your excellent work. Have a nice weekend!