SigmaBM closed this issue 1 year ago.
Hi, sorry for the late reply!
Apologies: the code I had released was an old version.
Thanks for letting me know. I have updated the exploration loss in my code.
If there are any more issues, please let me know :)
Thank you for the update!
The exploration loss is now computed from the target Q-values, so its gradient does not propagate to the Q-function. In fact, when I ran the newly released code, it produced the same results as the old code. Should the target Q-value here be replaced with the Q-value from mac_out?
I am referring to explore_loss_subset in maser_q_learner.py, i.e., the episodic correction loss in the paper.
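A minimal toy sketch of the gradient argument above. It assumes a hypothetical linear Q-function and a hypothetical squared-error loss, not the actual MASER networks or the actual form of explore_loss_subset. If a loss is built only from target-network Q-values, its gradient with respect to the online Q-function's parameters is zero, so minimizing it cannot change the online Q-function. Gradients are estimated with central finite differences to keep the example dependency-free.

```python
# Hypothetical toy model (not MASER's code): a linear Q-function Q(x) = w * x.
# The online network has weight w_online; the target network is a frozen copy w_target.

def q_value(w: float, x: float) -> float:
    return w * x

def exploration_loss(w_online: float, w_target: float, x: float,
                     reference: float, use_target: bool) -> float:
    # Hypothetical squared-error loss between a Q-value and a fixed reference value.
    # use_target=True mimics building the loss from target Q-values;
    # use_target=False mimics building it from online Q-values (e.g. mac_out).
    w = w_target if use_target else w_online
    return (q_value(w, x) - reference) ** 2

def grad_wrt_online(use_target: bool, w_online: float = 0.5, w_target: float = 0.3,
                    x: float = 2.0, reference: float = 1.5, eps: float = 1e-6) -> float:
    # Central finite-difference estimate of d(loss)/d(w_online).
    plus = exploration_loss(w_online + eps, w_target, x, reference, use_target)
    minus = exploration_loss(w_online - eps, w_target, x, reference, use_target)
    return (plus - minus) / (2 * eps)

print(grad_wrt_online(use_target=True))   # 0.0: no gradient reaches the online weight
print(grad_wrt_online(use_target=False))  # approximately -2.0: gradient flows
```

In an autograd framework such as PyTorch, the same effect arises when the target values are detached (e.g. with Tensor.detach()) or come from a network whose parameters are not in the optimizer. Computing the loss from the online Q-values instead (presumably the values taken from mac_out) restores the gradient path to the Q-function.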