YangyangFu / mpc-drl-tl

A paper plan for comparing MPC and DRL/TL in building control

The learning experience at the unoccupied time #94

Open terrancelu92 opened 2 years ago

terrancelu92 commented 2 years ago

I wonder whether learning performance would improve if the experience from unoccupied periods were discarded rather than added to the buffer. During unoccupied periods the reward is zero no matter which action is taken, whereas during occupied periods it is negative. Will the agent treat the zero reward at unoccupied times as a relatively positive reward, which would add noise to the learning process? How would this affect off-policy and on-policy algorithms, respectively? Did you consider this in your single-zone model? @shichao2023
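For an off-policy agent, the idea of not adding unoccupied experience to the buffer could look roughly like the sketch below. It is a minimal illustration, not the repository's code: the `Transition` fields, the `FilteredReplayBuffer` class, and the 8:00-18:00 occupancy schedule are all hypothetical assumptions.

```python
import random
from collections import deque
from typing import Deque, List, NamedTuple


class Transition(NamedTuple):
    state: tuple
    action: int
    reward: float
    next_state: tuple
    done: bool
    hour: float  # hour of day when the action was taken (assumed to be available)


# Assumption: occupied hours are 8:00-18:00; replace with the model's real schedule.
OCCUPIED_START, OCCUPIED_END = 8.0, 18.0


def is_occupied(hour: float) -> bool:
    """Return True if the given hour of day falls in the assumed occupied period."""
    return OCCUPIED_START <= hour < OCCUPIED_END


class FilteredReplayBuffer:
    """Hypothetical replay buffer that can skip unoccupied-time transitions."""

    def __init__(self, capacity: int, drop_unoccupied: bool = True):
        self.buffer: Deque[Transition] = deque(maxlen=capacity)
        self.drop_unoccupied = drop_unoccupied

    def add(self, t: Transition) -> bool:
        """Store the transition unless it is unoccupied and filtering is on."""
        if self.drop_unoccupied and not is_occupied(t.hour):
            return False  # zero-reward unoccupied step is discarded
        self.buffer.append(t)
        return True

    def sample(self, batch_size: int) -> List[Transition]:
        """Uniformly sample a minibatch of stored transitions."""
        return random.sample(list(self.buffer), batch_size)

    def __len__(self) -> int:
        return len(self.buffer)


if __name__ == "__main__":
    buf = FilteredReplayBuffer(capacity=1000)
    for h in range(24):  # one day of hourly steps
        r = -1.0 if is_occupied(h) else 0.0
        buf.add(Transition((h,), 0, r, ((h + 1) % 24,), False, float(h)))
    print(len(buf))  # 10 occupied-hour transitions kept
```

One caveat with this kind of filtering: the last occupied transition of the day still has an unoccupied `next_state`, so a TD target bootstraps from states the agent is never trained on. An on-policy method has no buffer to filter; the analogous change would be masking those steps out of the return or advantage computation.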

shichao2023 commented 2 years ago

If you want to discard training at unoccupied times, I suggest setting the length of each epoch to cover the occupied time only. (We may need to discuss this further after the holiday. Ideally, we should be able to start the training environment on a different date at the beginning of each epoch.) The behavior at unoccupied times can also be viewed as part of the system dynamics. Because we have included time as part of the system state, the agent should be able to learn that behavior. We kept that part simply for convenience. If you want to discard it, you need to decide what the system state will be at the start of the next day. Setting that state manually is equivalent to "setting the length of each training epoch to occupied time only", as I mentioned.
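The suggestion of starting each epoch on a different date and covering only occupied time could be sketched as a sampler for the simulation window. Everything here is an assumption for illustration: the function name `sample_episode_window`, the 8:00-18:00 schedule, times measured in seconds from the start of the year, and day 0 being a Monday. How the window is passed to the simulator, and how the initial state at 8:00 is set or warmed up, depends on the environment and is not shown.

```python
import random
from typing import Tuple

SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR
# Assumption: occupied period is 8:00-18:00 on weekdays.
OCCUPIED_START_H, OCCUPIED_END_H = 8, 18


def sample_episode_window(
    first_day: int,
    last_day: int,
    rng: random.Random,
    weekdays_only: bool = True,
) -> Tuple[int, int]:
    """Pick a random day in [first_day, last_day] and return the occupied window.

    Returns (start_time, end_time) in seconds from the beginning of the year.
    Assumes day 0 is a Monday, so day % 7 in {5, 6} is a weekend.
    """
    while True:
        day = rng.randint(first_day, last_day)
        if not weekdays_only or day % 7 < 5:
            break
    start = day * SECONDS_PER_DAY + OCCUPIED_START_H * SECONDS_PER_HOUR
    end = day * SECONDS_PER_DAY + OCCUPIED_END_H * SECONDS_PER_HOUR
    return start, end


if __name__ == "__main__":
    rng = random.Random(42)
    for _ in range(3):
        start, end = sample_episode_window(0, 364, rng)
        # Hypothetical: hand (start, end) to the environment's reset/simulation call.
        print(start // SECONDS_PER_DAY, (end - start) // SECONDS_PER_HOUR)
```

Sampling a new day per epoch exposes the agent to varied weather and load conditions, while the fixed window keeps every step inside occupied hours, so the zero-reward unoccupied steps never enter training.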