Clarification on actor-critic training

Thanks for the paper, it is really cool and useful

On page 22 of the paper, it says

For reincarnating D4PG using QDagger, we minimize a distillation loss between the D4PG’s actor policy and the teacher policy from TD3 jointly with the actor-critic losses.

Is QDagger loss equal to the actor loss + critic loss + distillation loss for the actor policy (but not the distillation loss for the critic) for a given sample from the replay buffer? If so, what critic are you for training the actor in the offline training stage? It would seem that if use the student critic then you will get a "bad" critic at the begin that might mess up the agent rather that the teacher's critic. This doesn't seem to be specified anywhere.

Finally, thank you for open sourcing your code however I can't see the code for the TD3 -> D4PG, am I missing it or has that not been open sourced?

google-research / reincarnating_rl

Clarification on actor-critic training #3