chz056 / BEAR


RL policies #9

Open paradox-ljt123 opened 2 months ago

paradox-ljt123 commented 2 months ago

Dear developers, thank you very much for BEAR, which gives me data and an environment to learn with. I only see MPCAgent inside the controller. I have just started learning reinforcement learning, so I hope I can get the gym environment code for using SAC and PPO with BEAR. My email: jiatongl84@gmail.com. Thank you very much!

chz056 commented 2 months ago

Hi,

Thank you very much for using our work.

We developed the MPC inside the controller ourselves, while PPO and SAC were implemented using the Stable-Baselines3 package (https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html). You can find an example of how to use them in the last section of QuickStart.ipynb (https://github.com/chz056/BEAR/blob/main/BEAR/QuickStart.ipynb).

Let me know if you have any further questions.

Best regards, Chi


paradox-ljt123 commented 2 months ago

Wow, thank you very much!

paradox-ljt123 commented 1 month ago

Dear developers, I have another problem. I changed activity_sch to random numbers in the ParameterGenerator function inside Utils, but it has no effect on the temperature, and I don't know why.

[Screenshots: 1.png, 3.png, 2.png]

Thank you very much!

chz056 commented 1 month ago

Hi,

Thank you for reaching out. I think there could be a few possible causes. One idea that comes to my mind is that the full_occ variable in the ParameterGenerator function (inside Utils) might not have been adjusted. That variable sets the maximum occupancy of the zone, and by default, it's set to 0. It might help to try modifying that first. Let me know if that makes any difference!

Best regards,

Chi
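To illustrate why changing activity_sch alone might leave the temperature unchanged, here is a minimal sketch. It assumes, as a simplification and not as BEAR's actual code, that each zone's occupant heat gain is the product of the maximum occupancy full_occ, the activity schedule activity_sch, and a per-person heat rate. The function name occupant_heat_gain and the 100 W value are hypothetical.

```python
import numpy as np

# Hypothetical per-person heat gain in watts (illustrative value only).
HEAT_PER_PERSON_W = 100.0


def occupant_heat_gain(full_occ, activity_sch, heat_per_person=HEAT_PER_PERSON_W):
    """Hypothetical occupant heat gain per time step and zone.

    full_occ: scalar or array of shape (n_zones,), maximum people per zone.
    activity_sch: array of shape (n_steps,), fraction of full occupancy present.
    Returns an array of shape (n_steps, n_zones) in watts.
    """
    full_occ = np.atleast_1d(np.asarray(full_occ, dtype=float))
    activity_sch = np.asarray(activity_sch, dtype=float)
    # Outer product: every schedule value scales every zone's maximum occupancy.
    return np.outer(activity_sch, full_occ) * heat_per_person


rng = np.random.default_rng(0)
schedule = rng.random(24)  # a random activity schedule, as in the question

print(occupant_heat_gain(0, schedule).max())      # 0.0: no occupants, no heat gain
print(occupant_heat_gain(5, schedule).max() > 0)  # True
```

Under this assumed model, any schedule multiplied by a maximum occupancy of 0 produces zero heat gain, which would match the observation that randomizing activity_sch has no visible effect while full_occ keeps its default of 0.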


paradox-ljt123 commented 1 month ago

OK, thank you! Sorry, I'm just getting started with RL and don't quite know how to adjust the value of full_occ. For example, if I set it to a very large value (I also tried values below the minimum), obs always becomes NaN. [Screenshot]

paradox-ljt123 commented 1 month ago

Sometimes the NaN does not occur and MPC returns a result, but PPO reports an error, tensor([[nan, nan, nan, nan, nan, nan, nan]]), unless full_occ is 0.
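One way to find where the NaN first appears, before it reaches PPO, is to check each observation as it is collected. Stable-Baselines3 also ships a wrapper for this, `stable_baselines3.common.vec_env.VecCheckNan` (e.g. `VecCheckNan(venv, raise_exception=True)`), which raises as soon as a NaN or inf shows up during training. The sketch below is a dependency-light alternative using only NumPy; the helper name first_nonfinite_step is hypothetical, and the trajectory is made-up data shaped like the 7-element observation in the error above.

```python
import numpy as np


def first_nonfinite_step(observations):
    """Return the index of the first observation containing NaN or inf, or None.

    observations: sequence of 1-D arrays, e.g. collected from env.step() calls.
    """
    for step, obs in enumerate(observations):
        if not np.all(np.isfinite(np.asarray(obs, dtype=float))):
            return step
    return None


# Made-up trajectory: the third observation is all NaN.
trajectory = [np.zeros(7), np.ones(7), np.array([np.nan] * 7)]
print(first_nonfinite_step(trajectory))  # 2
```

Knowing the first bad step makes it easier to compare that step's inputs (for example, the occupancy value in use) against the steps that were still finite.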

chz056 commented 1 month ago

full_occ represents the maximum number of people in each zone. You may want to experiment with smaller values (e.g., between 0 and 10). full_occ can be either a single value (the same maximum occupancy for all zones) or an array with a different value for each zone. For example, in a 5-zone building, you could set full_occ = np.array([3, 5, 5, 10, 3]).

Best, Chi
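The two accepted forms of full_occ (one value for every zone, or one value per zone) can be normalized and sanity-checked before they are passed to ParameterGenerator. This is a minimal sketch: the helper expand_full_occ is hypothetical, and the 0 to 10 bound only mirrors the suggested experimentation range above, not a limit enforced by BEAR.

```python
import numpy as np


def expand_full_occ(full_occ, n_zones, max_people=10):
    """Return full_occ as a float array of shape (n_zones,).

    Accepts a single value (same maximum occupancy in every zone) or one
    value per zone. The [0, max_people] bound is a hypothetical sanity
    check mirroring the suggested 0..10 range, not a BEAR requirement.
    """
    occ = np.asarray(full_occ, dtype=float)
    if occ.ndim == 0:
        # A scalar applies to every zone.
        occ = np.full(n_zones, float(occ))
    if occ.shape != (n_zones,):
        raise ValueError(
            f"full_occ must be a scalar or have shape ({n_zones},), got {occ.shape}"
        )
    if not np.all(np.isfinite(occ)) or np.any(occ < 0) or np.any(occ > max_people):
        raise ValueError(f"full_occ values must be finite and in [0, {max_people}]")
    return occ


print(expand_full_occ(5, 5).tolist())                          # [5.0, 5.0, 5.0, 5.0, 5.0]
print(expand_full_occ(np.array([3, 5, 5, 10, 3]), 5).tolist())  # [3.0, 5.0, 5.0, 10.0, 3.0]
```

Failing early on a wrong shape or an out-of-range value is usually easier to debug than tracing NaNs back from the PPO policy.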
