Closed WH1123 closed 1 year ago
I'm sorry I didn't add train.py and main.py to this repo; its goal is just to present how PPO could be implemented in a hybrid action space. Moreover, I think it's not so difficult to write a train.py or main.py yourself based on other excellent works, like https://github.com/openai/spinningup.
Thank you.
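For readers unfamiliar with the term: in a hybrid action space the policy outputs a discrete action together with continuous parameters for that action. A minimal NumPy sketch of sampling such an action (the function and argument names here are illustrative placeholders, not this repo's API):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_hybrid_action(logits, mu, log_std):
    """Sample one hybrid action: a discrete choice plus continuous parameters.

    logits      : unnormalized scores over the K discrete actions
    mu, log_std : (K, D) Gaussian parameters, one row per discrete action
    """
    # Discrete head: categorical sample from a softmax over the logits
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    k = rng.choice(len(probs), p=probs)
    # Continuous head: Gaussian sample conditioned on the chosen branch
    params = mu[k] + np.exp(log_std[k]) * rng.standard_normal(mu.shape[1])
    return k, params

k, params = sample_hybrid_action(
    logits=np.array([0.5, 1.2, -0.3]),
    mu=np.zeros((3, 2)),
    log_std=np.full((3, 2), -1.0),
)
```

In H-PPO the two heads share an encoder and each head gets its own PPO surrogate loss; this sketch only covers the sampling step.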
If you have an example of training and testing, your code will help more people!
Thank you. Could you give me an example of training code?
I apologize for not having much time to maintain this repository recently. The main purpose of this repository is to roughly demonstrate the implementation of H-PPO, although there are some issues with the PPO update process. I have also uploaded train.py and the SUMOEnv.py we use so you can see how we interact with the environment. However, the problem still exists, and you may still not be able to train our algorithm. You can write your own main to rollout, update, and evaluate. Of course, you can also follow our working repository when it is released: in that repository we are implementing a MARL-based H-PPO, and we will upload the entire training and evaluation process soon, but it may take some time. Moreover, here is some material I recommend for a better PPO: https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/.
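Since the repo leaves the main loop to the reader, here is one possible shape of a rollout → update → evaluate loop. Everything below (ToyEnv, ToyAgent, and their method signatures) is a hypothetical stand-in, not this repository's SUMOEnv or agent interface:

```python
import random

class ToyEnv:
    """Hypothetical stand-in environment (not the repo's SUMOEnv)."""
    def reset(self):
        self.state = random.randint(0, 1)
        return self.state
    def step(self, action):
        # Reward 1 when the action matches the current state
        reward = 1.0 if action == self.state else 0.0
        self.state = random.randint(0, 1)
        return self.state, reward, False  # next_obs, reward, done

class ToyAgent:
    """Hypothetical agent: echoes the observation as its action."""
    def step(self, obs):
        return obs, 0.0, 0.0  # action, log-prob, value (placeholders)
    def update(self, buffer):
        pass  # PPO minibatch updates would go here

def evaluate(agent, env, steps=100):
    obs, total = env.reset(), 0.0
    for _ in range(steps):
        action, _, _ = agent.step(obs)
        obs, reward, _ = env.step(action)
        total += reward
    return total / steps

def train(env, agent, epochs=3, steps_per_epoch=64):
    scores = []
    for _ in range(epochs):
        # 1) Rollout: collect a fixed-size batch of transitions
        buffer, obs = [], env.reset()
        for _ in range(steps_per_epoch):
            action, logp, value = agent.step(obs)
            next_obs, reward, done = env.step(action)
            buffer.append((obs, action, reward, logp, value, done))
            obs = env.reset() if done else next_obs
        # 2) Update: run PPO epochs over the collected buffer
        agent.update(buffer)
        # 3) Evaluate the current policy
        scores.append(evaluate(agent, env))
    return scores

scores = train(ToyEnv(), ToyAgent())
```

A real implementation would additionally compute GAE advantages from the stored values and rewards before the update step; the ICLR blog post linked above walks through those details.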
Do you mean this code has problems and can't be used?
Thank you for your reply. You mean this code still has some problems and can't be used?
It works either way. Emmmm, but the main difference, I think, is the early-stopping trick. In OpenAI Spinning Up, when the approximate KL reaches its threshold, the update breaks at once, but in my implementation it goes on to check whether the next minibatch satisfies the KL threshold. Moreover, main.py is so problem-oriented (for the DTDE structure) that I recommend you rewrite one for your own use.
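The difference described above can be made concrete: break-at-once stops the whole update at the first minibatch whose approximate KL exceeds the limit, while skip-and-continue skips only that minibatch and keeps checking the rest. A small sketch (the approx-KL values are made up for illustration):

```python
def update_break_at_once(minibatches, kl_limit=0.015):
    """Spinning Up style: stop the whole update at the first offending minibatch."""
    used = []
    for approx_kl, mb in minibatches:
        if approx_kl > kl_limit:
            break
        used.append(mb)  # in a real update, a gradient step would happen here
    return used

def update_skip_and_continue(minibatches, kl_limit=0.015):
    """Variant described above: skip the offending minibatch, keep checking the rest."""
    used = []
    for approx_kl, mb in minibatches:
        if approx_kl > kl_limit:
            continue
        used.append(mb)
    return used

# Minibatch "b" exceeds the limit: break-at-once uses only "a",
# skip-and-continue uses "a" and "c".
batches = [(0.01, "a"), (0.02, "b"), (0.005, "c")]
```

Skip-and-continue squeezes more gradient steps out of each batch at the cost of a weaker guarantee that the policy stays close to the one that collected the data.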
Thank you for sharing. How do I train and test? I can't find train.py or main.py; please help me.