When training over a long duration, it seems that even though the training loss keeps decreasing, the open-loop loss and report loss continuously increase after bottoming out (see the attached graph).
A similar pattern was observed for the dynamics loss across all experiments.
Is it correct to interpret this as the model overfitting to the existing data in the replay buffer while underfitting new data, and hence not generalizing well?
I do see that the reward score is still improving (see screenshot below), but I assume that might be because the SAC policy is improving (see actor loss screenshot below).
A similar divergence between the train and open-loop curves was observed in the critic loss as well.
Would love to know your thoughts on this.
You can use a run script with eval metrics to diagnose overfitting. Rising losses are normal when the agent explores the environment further and sees more diverse data over time.
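A minimal sketch of one way to separate the two explanations (genuine overfitting to stale replay data vs. the data simply getting more diverse as exploration widens): evaluate the world-model loss, without gradient updates, on old replay data and on recently collected data and watch the gap. The hooks `model_loss_fn` and `sample_batch` below are hypothetical placeholders you would wire up to your own model and replay buffer; this is not code from the repo.

```python
import numpy as np

def split_replay_diagnostic(model_loss_fn, sample_batch, buffer_size,
                            recent_fraction=0.2, batches=20):
    """Compare model loss on old vs. recently collected replay data.

    If the old-data loss keeps falling while the recent-data loss climbs,
    the model is fitting the stale distribution (overfitting). If both
    climb together, the rise is more likely explained by increasingly
    diverse data from continued exploration.

    model_loss_fn(batch) -> scalar loss, forward pass only (hypothetical hook)
    sample_batch(low, high) -> batch drawn from buffer indices [low, high) (hypothetical hook)
    """
    split = int(buffer_size * (1.0 - recent_fraction))
    old_losses, recent_losses = [], []
    for _ in range(batches):
        old_losses.append(float(model_loss_fn(sample_batch(low=0, high=split))))
        recent_losses.append(float(model_loss_fn(sample_batch(low=split, high=buffer_size))))
    return {"old_data_loss": float(np.mean(old_losses)),
            "recent_data_loss": float(np.mean(recent_losses))}
```

Logging these two numbers alongside the existing train and open-loop losses should make it clearer which explanation fits your curves.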