Open: rahul-zomato opened this issue 2 years ago
Is the use of next_state in the next_q_values calculation of the SlateQ agent deliberate here? https://github.com/facebookresearch/ReAgent/blob/main/reagent/training/slate_q_trainer.py#L230
The SlateQ agent implemented by the SlateQ paper authors in RecSim uses state instead of next_state from the replay buffer to compute next_q_values: https://github.com/google-research/recsim/issues/26
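To make the question concrete, here is a minimal, framework-free sketch of the SlateQ bootstrap target for the clicked item, where the only thing that changes is which state the bootstrapped term is evaluated at. In the SlateQ decomposition the expected next value is a choice-probability-weighted sum of per-item values over the next slate. This sketch assumes a multinomial-logit choice model with a no-click option; `q_fn`, `score_fn`, `null_score` and `state_for_bootstrap` are hypothetical names for illustration and do not mirror the code in ReAgent or RecSim.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-ins for learned networks (not ReAgent/RecSim APIs).
QFn = Callable[[str, str], float]      # Q-bar(state, item): per-item long-term value
ScoreFn = Callable[[str, str], float]  # v(state, item): unnormalized choice score


def choice_probs(state: str, slate: Sequence[str], score_fn: ScoreFn,
                 null_score: float = 1.0) -> List[float]:
    """P(j | state, slate) under an assumed multinomial-logit choice model
    that includes a no-click option with score null_score."""
    scores = [score_fn(state, item) for item in slate]
    total = sum(scores) + null_score
    return [s / total for s in scores]


def slateq_target(reward: float, discount: float, state_for_bootstrap: str,
                  next_slate: Sequence[str], q_fn: QFn, score_fn: ScoreFn,
                  done: bool = False) -> float:
    """TD target for the clicked item's per-item value:
        reward + discount * sum_j P(j | s, A') * Q-bar(s, j)
    Both the choice probabilities and the per-item values are evaluated
    at state_for_bootstrap. Passing next_state follows the SlateQ
    decomposition; passing the current state reproduces the behaviour
    discussed in the linked RecSim issue."""
    if done:
        # Terminal transition: no bootstrapped term.
        return reward
    probs = choice_probs(state_for_bootstrap, next_slate, score_fn)
    expected_next_q = sum(p * q_fn(state_for_bootstrap, item)
                          for p, item in zip(probs, next_slate))
    return reward + discount * expected_next_q


if __name__ == "__main__":
    # Toy example: per-item values depend only on the state.
    q = lambda s, i: {"s0": 1.0, "s1": 2.0}[s]
    score = lambda s, i: 1.0
    with_next_state = slateq_target(1.0, 0.5, "s1", ["a", "b"], q, score)
    with_current_state = slateq_target(1.0, 0.5, "s0", ["a", "b"], q, score)
    print(with_next_state, with_current_state)
```

In the toy run the two targets differ whenever the per-item values (or choice scores) depend on the state, which is why the choice between state and next_state matters for training.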