Jianxun-Wang / PIMBRL

Physics-informed Dyna-style model-based deep reinforcement learning for dynamic control
MIT License

Some code details #2

Closed (BestPolarBear closed this issue 2 years ago)

BestPolarBear commented 2 years ago

Nice to read your article. I have a few questions, mainly about the code.

1. Is it necessary to set self.phyloss_flag to True? [image]

2. In the CartPole-v0 and Pendulum-v0 environments, do you use the physics loss Le?

Thank you !

Xin-yang-Liu commented 2 years ago

Thanks for your interest. Our code is general enough to handle all three methods mentioned in the paper: MFRL, MBRL, and PiMBRL.

Q1:

When the variable self.phyloss_flag is set to True, physics-informed RL (PiMBRL) is trained. Otherwise, only the baseline MBRL (when usemodel=True) or MFRL (when usemodel=False) is run.
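As a rough illustration of the flag logic described in this reply, here is a minimal sketch. The function name select_method is hypothetical and is not part of the repository, and phyloss_flag is the assumed spelling of the flag named above. The reply does not say what happens when the physics flag is True while usemodel is False, so the sketch simply follows the order in which the reply states the rules.

```python
def select_method(usemodel: bool, phyloss_flag: bool) -> str:
    """Map the two flags to the method names used in the paper.

    Hypothetical helper for illustration only; not the repository's code.
    """
    if phyloss_flag:
        # Physics-informed model-based RL.
        # Assumption: the reply does not cover usemodel=False here.
        return "PiMBRL"
    if usemodel:
        # Baseline model-based RL with a purely data-driven model.
        return "MBRL"
    # No learned model at all: model-free RL.
    return "MFRL"


if __name__ == "__main__":
    for usemodel, flag in [(True, True), (True, False), (False, False)]:
        print(usemodel, flag, select_method(usemodel, flag))
```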

Q2:

We tested PiMBRL in every environment shown in our paper, including CartPole-v0. For the Pendulum case, however, we use a slightly different observation space from the standard Pendulum-v0 to make the physics loss easier to implement.

BestPolarBear commented 2 years ago

Hello @Xin-yang-Liu, thank you for your reply. I still don't understand this part. In the KS experiment, do you use real_buffer to train the transition model with the physics loss? [image] [image]

Xin-yang-Liu commented 2 years ago

Most of the data in self.RL.buffer are generated by the learned model and therefore contain model error, so they cannot be used as labels. With physics-informed MBRL, however, the equation loss can still be evaluated on these data points to improve the learned model.
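To make the distinction concrete, here is a minimal sketch of a label-based data loss next to a label-free equation loss. Everything in it is an assumption made for illustration: the toy ODE dx/dt = -x, the forward-Euler residual, the time step DT, and the one-parameter toy_model. It is not the KS setup or the repository's loss. The point is only that equation_loss needs states but no next-state labels, so it can be evaluated on model-generated data.

```python
DT = 0.1  # hypothetical time step


def toy_model(x: float, w: float) -> float:
    """Hypothetical one-parameter transition model: x_next = w * x."""
    return w * x


def data_loss(w: float, states, next_states) -> float:
    """Supervised loss; needs trusted next-state labels (real data only)."""
    errors = [(toy_model(x, w) - y) ** 2 for x, y in zip(states, next_states)]
    return sum(errors) / len(errors)


def equation_loss(w: float, states) -> float:
    """Label-free loss: squared residual of dx/dt = -x under forward Euler."""
    total = 0.0
    for x in states:
        x_next = toy_model(x, w)
        residual = (x_next - x) / DT + x  # zero when the toy ODE is satisfied
        total += residual ** 2
    return total / len(states)


if __name__ == "__main__":
    generated_states = [1.0, 2.0, -0.5]  # stand-in for model-generated states
    print(equation_loss(1.0 - DT, generated_states))  # consistent with the ODE
    print(equation_loss(1.0, generated_states))       # violates the ODE
```

In this toy setting, w = 1 - DT satisfies the discretized equation, so the equation loss vanishes up to rounding even though no labels were supplied.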

Training with the equation loss, however, inevitably takes extra epochs. To rule out the possibility that the extra epochs, rather than the equation loss, improve the model quality, we add the same number of training epochs (more precisely, the same number of epochs per data point, since the real buffer holds far fewer data than self.RL.buffer) to the purely data-driven model in MBRL. That model can only be trained on data from the real buffer, because only real/true data are stored there. This is what lines 107-109 do.
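The matched-epoch control described here can be summarized in a small sketch. The function extra_training_plan and the dictionary keys are hypothetical and only restate the explanation in this reply; they do not reproduce lines 107-109 of the repository.

```python
def extra_training_plan(phyloss_flag: bool, extra_epochs: int) -> dict:
    """Describe the extra model-training epochs for each method.

    Hypothetical summary of the reply, not the repository's code.
    """
    if phyloss_flag:
        # PiMBRL: model-generated data cannot serve as labels,
        # but the equation loss can still be evaluated on them.
        return {"epochs": extra_epochs, "buffer": "self.RL.buffer", "loss": "equation"}
    # Baseline MBRL: same number of epochs, but only real data carry labels.
    return {"epochs": extra_epochs, "buffer": "real buffer", "loss": "data"}


if __name__ == "__main__":
    print(extra_training_plan(True, 5))
    print(extra_training_plan(False, 5))
```

Both branches receive the same epoch count, so any difference in model quality cannot be attributed to extra training alone.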

Hope this answers your question.