When training over a long duration, it seems that even though the training loss keeps decreasing, the open-loop loss and report loss continuously increase after bottoming out (see the attached graph).
A similar pattern was observed for the dynamics loss across all experiments.
Is it correct to interpret this as the model overfitting to the existing data in the replay buffer while underfitting new data, and hence not generalizing well?
I do see that the reward score is still improving (see screenshot below), but I assume that might be because the SAC policy is improving (see actor loss screenshot below).
A similar divergence between the train and open-loop curves was observed in the critic loss as well.
Would love to know your thoughts on this.
You can use a run script with eval metrics to diagnose overfitting. Rising losses are normal when the agent explores the environment further and sees more diverse data over time.
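A minimal sketch of one way to separate the two explanations (genuine overfitting to stale replay data vs. the data simply getting more diverse as exploration widens): evaluate the world-model loss, without gradient updates, on old replay data and on recently collected data and watch the gap. The hooks `model_loss_fn` and `sample_batch` below are hypothetical placeholders you would wire up to your own model and replay buffer; this is not code from the repo.

```python
import numpy as np

def split_replay_diagnostic(model_loss_fn, sample_batch, buffer_size,
                            recent_fraction=0.2, batches=20):
    """Compare model loss on old vs. recently collected replay data.

    If the old-data loss keeps falling while the recent-data loss climbs,
    the model is fitting the stale distribution (overfitting). If both
    climb together, the rise is more likely explained by increasingly
    diverse data from continued exploration.

    model_loss_fn(batch) -> scalar loss, forward pass only (hypothetical hook)
    sample_batch(low, high) -> batch drawn from buffer indices [low, high) (hypothetical hook)
    """
    split = int(buffer_size * (1.0 - recent_fraction))
    old_losses, recent_losses = [], []
    for _ in range(batches):
        old_losses.append(float(model_loss_fn(sample_batch(low=0, high=split))))
        recent_losses.append(float(model_loss_fn(sample_batch(low=split, high=buffer_size))))
    return {"old_data_loss": float(np.mean(old_losses)),
            "recent_data_loss": float(np.mean(recent_losses))}
```

Logging these two numbers alongside the existing train and open-loop losses should make it clearer which explanation fits your curves.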