Metro1998 / hppo-in-traffic-signal-control


How to use this code? #1

Closed WH1123 closed 1 year ago

WH1123 commented 1 year ago

Thank you for sharing. How do I train and test? I can't find train.py or main.py; please help me.

Metro1998 commented 1 year ago

I'm sorry I didn't add train.py and main.py to this repo; its goal is just to present how PPO can be implemented in a hybrid action space. Moreover, I don't think it's so difficult to write a train.py or main.py yourself based on other excellent works, like https://github.com/openai/spinningup.
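For intuition, here is a minimal sketch of one way to parameterize a hybrid (discrete choice plus continuous parameter) policy on a shared backbone; the class and dimension names are illustrative only and are not this repo's actual code.

```python
import torch
import torch.nn as nn

class HybridActor(nn.Module):
    """Toy hybrid-action policy: a categorical head and a Gaussian head on one backbone."""
    def __init__(self, obs_dim, n_discrete, param_dim):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh())
        self.discrete_head = nn.Linear(64, n_discrete)        # logits over discrete actions
        self.param_head = nn.Linear(64, param_dim)            # mean of the continuous parameters
        self.log_std = nn.Parameter(torch.zeros(param_dim))   # state-independent log std

    def forward(self, obs):
        h = self.backbone(obs)
        dist_d = torch.distributions.Categorical(logits=self.discrete_head(h))
        dist_c = torch.distributions.Normal(self.param_head(h), self.log_std.exp())
        return dist_d, dist_c

# Usage: sample a hybrid action and its log-probabilities.
actor = HybridActor(obs_dim=8, n_discrete=4, param_dim=1)
dist_d, dist_c = actor(torch.randn(1, 8))
a_d, a_c = dist_d.sample(), dist_c.sample()
logp_d, logp_c = dist_d.log_prob(a_d), dist_c.log_prob(a_c).sum(-1)
```

The two heads share a backbone here only to keep the example short; you can also keep the discrete and continuous parts in separate networks and give each its own clipped surrogate loss, which is closer to how H-PPO is usually described.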

WH1123 commented 1 year ago

Thank you.

hugo921 commented 1 year ago

If you have an example of training and testing, your code will help more people!


hugo921 commented 1 year ago

Thank you. Would you like to give me an example of training code?


Metro1998 commented 1 year ago

I apologize for not having had much time to maintain this repository recently. Its main purpose is to roughly demonstrate the implementation of H-PPO, although there are some issues with the PPO update process. I have also uploaded train.py and the SUMOEnv.py we use, so you can see how we interact with the environment. However, the problem still exists, and you may still not be able to train our algorithm. You can write your own main to rollout, update, and evaluate. Of course, you can also follow our working repository when it is released; in it we are implementing a MARL-based H-PPO, and we will upload the entire training and evaluation process soon, but it may take some time. Moreover, here is some recommended reading for a better PPO: https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/.
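To make "rollout, update, and evaluate" concrete, here is a bare-bones driver skeleton. It assumes an env with reset/step and an agent with select_action/update methods; these names are placeholders and will not match SUMOEnv.py or train.py exactly.

```python
def run(env, agent, epochs=100, horizon=3600, eval_every=10):
    """Hypothetical rollout -> update -> evaluate loop; the interfaces are assumed, not this repo's."""
    for epoch in range(epochs):
        # 1) Rollout: collect one episode (or a fixed number of steps) of experience.
        obs = env.reset()
        trajectory = []
        for _ in range(horizon):
            action, logp, value = agent.select_action(obs)
            next_obs, reward, done, info = env.step(action)
            trajectory.append((obs, action, reward, logp, value, done))
            obs = next_obs
            if done:
                break
        # 2) Update: PPO-style minibatch epochs over the collected batch (GAE, clipped loss).
        stats = agent.update(trajectory)
        # 3) Evaluate: periodically log training statistics; a proper evaluation would
        #    additionally run the current policy without exploration.
        if (epoch + 1) % eval_every == 0:
            print(f"epoch {epoch + 1}: {stats}")
```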

hugo921 commented 1 year ago

Do you mean this code has problems and can't be used?



hugo921 commented 12 months ago


Thank you for your reply. Do you mean this code still has some problems and can't be used?

Metro1998 commented 12 months ago



It works anyway. Emmmm, but the main difference, I think, is the early-stopping trick. In OpenAI Spinning Up, when the approximate KL reaches its threshold, the update breaks at once, but in my implementation it goes on to check whether the next minibatch satisfies the KL threshold. Moreover, the main.py is so problem-oriented (built around the DTDE structure) that I recommend you rewrite one for your own use.
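To make that difference concrete, here is a rough sketch of the two conventions. ppo_step and estimate_kl are placeholder callables, and the 1.5 factor mirrors Spinning Up's default check; this only illustrates the control flow, not the exact code of either implementation.

```python
def update_break_at_once(minibatches, ppo_step, target_kl=0.01):
    """Spinning Up style: abandon the whole update pass once approx KL exceeds the threshold."""
    for batch in minibatches:
        approx_kl = ppo_step(batch)   # one clipped-PPO gradient step; returns an approx-KL estimate
        if approx_kl > 1.5 * target_kl:
            break                     # stop updating on all remaining minibatches immediately

def update_check_each_minibatch(minibatches, ppo_step, estimate_kl, target_kl=0.01):
    """Variant described above: skip a violating minibatch but keep checking the following ones."""
    for batch in minibatches:
        if estimate_kl(batch) > 1.5 * target_kl:
            continue                  # this minibatch violates the KL constraint; try the next one
        ppo_step(batch)
```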