surajitsaikia27 opened this issue 3 years ago
Can you tell me which TensorFlow version you are using? I think this is happening because the calculated losses are not updating the actor network. I am facing the same issue with TensorFlow 2.x versions, but I found one possible solution: manually updating the network weights with GradientTape in TensorFlow.
I am using TensorFlow 2.3. So after updating the weights using GradientTape, is your agent working well?
I haven't applied it yet, because I didn't find a concrete GradientTape solution for the PPO implementation. So currently I am using ML-Agents to train the environment, but I am planning to update the code.
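For reference, the rough direction I have in mind is the sketch below. It's untested against this repo and only covers a discrete-action actor; `actor`, `CLIP_RATIO`, the optimizer, and the rollout tensors (`states`, `actions`, `advantages`, `old_log_probs`) are placeholders, not names from this codebase.

```python
import tensorflow as tf

# Hypothetical hyperparameters / optimizer, not this repo's values.
CLIP_RATIO = 0.2
optimizer = tf.keras.optimizers.Adam(learning_rate=3e-4)

def ppo_actor_update(actor, states, actions, advantages, old_log_probs):
    """One gradient step on the clipped PPO surrogate for a discrete policy.

    `actor` is assumed to be a tf.keras.Model mapping states to action logits;
    the other arguments are batched tensors collected from rollouts.
    """
    with tf.GradientTape() as tape:
        logits = actor(states, training=True)
        # log pi(a|s) under the current policy (sparse CE is -log prob)
        new_log_probs = -tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=actions, logits=logits)
        ratio = tf.exp(new_log_probs - old_log_probs)
        clipped = tf.clip_by_value(ratio, 1.0 - CLIP_RATIO, 1.0 + CLIP_RATIO)
        # Negative clipped surrogate, since the optimizer minimizes.
        loss = -tf.reduce_mean(tf.minimum(ratio * advantages, clipped * advantages))
    # Explicitly compute and apply the gradients to the actor's weights,
    # rather than relying on a compiled Keras loss to do it.
    grads = tape.gradient(loss, actor.trainable_variables)
    optimizer.apply_gradients(zip(grads, actor.trainable_variables))
    return loss
```

The point is just that the update is done explicitly with `tape.gradient` and `apply_gradients`, so the actor weights definitely receive the gradients.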
Another required enhancement is handling multiple agents in a single environment and training parallel environments.
That would be great. I am able to train agents in OpenAI Gym using PPO without issues, but I don't know why the same code is not working in Unity.
Great to hear that it's working with Gym environments. Can you tell me more about the Unity environment? For example, is it multi-agent, or are you trying to train multiple copies of the environment?
It is the Reacher environment in Unity 3D. It has multiple agents. You can try it out :)
Currently, this repo doesn't support multi-agent environments, so I think that might be the issue. I will create a TODO section in the README listing all the enhancements required for this repo.
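If you want to double-check how many agents your build exposes, something like this should show it (a rough sketch using the mlagents_envs low-level Python API; the build path and exact calls may differ slightly depending on your ML-Agents release):

```python
from mlagents_envs.environment import UnityEnvironment

# Path to your compiled Reacher build; pass file_name=None to connect to the Editor instead.
env = UnityEnvironment(file_name="Reacher")
env.reset()

# Each behavior can drive many agents at once; count how many asked for a decision.
for behavior_name in env.behavior_specs:
    decision_steps, terminal_steps = env.get_steps(behavior_name)
    print(f"{behavior_name}: {len(decision_steps)} agents requesting decisions")

env.close()
```

If it prints more than one agent per behavior, that's the mismatch I meant: the training loop here assumes a single agent per step.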
Similar to this work, I was building my own Unity environment and training it using the Python API. Somehow my agents were not training well, so I ended up here. I tried to run your code, but I am getting a mean score of 0.001 every time. Did that happen to you?