Open Yang6852 opened 6 months ago
In agents/ql_diffusion.py, lines 144-147:
if np.random.uniform() > 0.5:
    q_loss = - q1_new_action.mean() / q2_new_action.abs().mean().detach()
else:
    q_loss = - q2_new_action.mean() / q1_new_action.abs().mean().detach()
I'm sorry, but I couldn't understand why q_loss is calculated this way. I would be very grateful if you could explain a little.
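To make the question concrete, here is a minimal standalone sketch of how I read those lines. It is only my interpretation, not the repository's code: plain Python lists stand in for the `q1_new_action` and `q2_new_action` tensors, the function name `normalized_q_loss` is made up for illustration, and `.detach()` has no counterpart here because nothing in this sketch tracks gradients.

```python
import random


def _mean(xs):
    return sum(xs) / len(xs)


def _mean_abs(xs):
    return sum(abs(x) for x in xs) / len(xs)


def normalized_q_loss(q1, q2, rng):
    """Value of q_loss as I read the snippet (hypothetical helper).

    q1, q2: lists of floats standing in for the two critics' outputs.
    rng:    a random.Random instance that picks the branch.
    """
    if rng.random() > 0.5:
        # Negative mean of Q1, divided by the mean magnitude of Q2.
        return -_mean(q1) / _mean_abs(q2)
    # Same expression with the roles of the two critics swapped.
    return -_mean(q2) / _mean_abs(q1)


q1 = [1.0, 2.0, 3.0]
q2 = [2.0, 4.0, 6.0]

loss = normalized_q_loss(q1, q2, random.Random(0))

# Multiply both critics' outputs by 10 and reuse the same seed,
# so the same branch is taken.
loss_scaled = normalized_q_loss(
    [10 * x for x in q1], [10 * x for x in q2], random.Random(0)
)
print(loss, loss_scaled)
```

As far as I can tell, the value does not change when both Q outputs are multiplied by the same positive constant, so the denominator seems to act as a scale normalizer, and `.detach()` keeps it out of the gradient. What I don't understand is why this normalization is needed and why the critic is chosen at random.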