The learning experience at the unoccupied time

YangyangFu / mpc-drl-tl

A paper plan for comparing MPC and DRL/TL in building control

If you want to discard the training experience from unoccupied hours, I suggest setting the length of each epoch to the occupied period only. (We may need to discuss this further after the holiday. Ideally, the training environment should be reset to a different date at the beginning of each epoch.) That said, the behavior during unoccupied hours can also be viewed as part of the system dynamics. Since time is included in the system state, the agent should be able to learn that behavior. We kept that part simply for convenience. If you discard it, you need to decide what the system state is at the start of the next day. Setting that state manually is equivalent to setting the length of each training epoch to the occupied period, as I mentioned above.
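To make the suggestion concrete, here is a minimal sketch of how an epoch could be restricted to occupied hours with a different start date each time. It is not code from this repository: the function names, the 08:00 to 18:00 occupied period, and the use of seconds since the start of the year as the time base are all assumptions chosen for illustration.

```python
import random

SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR


def occupied_window(day_index, occ_start_hour=8, occ_end_hour=18):
    """Return (start, end) of one epoch in seconds since the start of the year.

    The 08:00-18:00 default is an assumed occupancy schedule, not the
    project's actual one.
    """
    if not 0 <= occ_start_hour < occ_end_hour <= 24:
        raise ValueError("occupied hours must satisfy 0 <= start < end <= 24")
    day_start = day_index * SECONDS_PER_DAY
    return (day_start + occ_start_hour * SECONDS_PER_HOUR,
            day_start + occ_end_hour * SECONDS_PER_HOUR)


def sample_epoch_windows(n_epochs, candidate_days, seed=0):
    """Pick a random day for each epoch and return its occupied window."""
    rng = random.Random(seed)
    return [occupied_window(rng.choice(candidate_days)) for _ in range(n_epochs)]


# Example: five epochs drawn from three hypothetical summer days.
windows = sample_epoch_windows(5, candidate_days=[200, 201, 202])
```

Each returned pair could then be passed to whatever start-time and episode-length settings the environment exposes. The initial state at the start of each window still has to come from somewhere, which is the same "what is the system state the next day" question raised above.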

The learning experience at the unoccupied time #94