Closed: efotopoulou closed this issue 4 years ago.
The experiments in https://arxiv.org/pdf/1905.12767.pdf were done using 300K training steps. You need to increase num_iterations, since training_steps = num_iterations * max_training_steps. You also want to increase max_eval_episodes; the paper reports results with 5000 simulated users. Thanks.
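For example, keeping the flags from the usage example, something like the following would give 300K training steps and 5000 evaluation episodes (the 3000 x 100 split between num_iterations and max_training_steps is just one choice; any split whose product is 300K works):
python main.py --logtostderr --base_dir="/tmp/recsim/interest_evolution_full_slate_q" --agent_name=full_slate_q --environment_name=interest_evolution --episode_log_file='episode_logs.tfrecord' --gin_bindings=simulator.runner_lib.Runner.max_steps_per_episode=100 --gin_bindings=simulator.runner_lib.TrainRunner.num_iterations=3000 --gin_bindings=simulator.runner_lib.TrainRunner.max_training_steps=100 --gin_bindings=simulator.runner_lib.EvalRunner.max_eval_episodes=5000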
On Fri, Jan 31, 2020 at 8:31 AM, efotopoulou wrote:
I am trying to get familiar with RecSim. I want to test the full_slate_q_agent on the interest_evolution environment, but the agent does not seem to learn anything; random_agent performs the same as full_slate_q_agent. The command I execute is the usage example from https://pypi.org/project/recsim/, with the only difference that num_iterations is set to 100 instead of 10.
python main.py --logtostderr --base_dir="/tmp/recsim/interest_evolution_full_slate_q" --agent_name=full_slate_q --environment_name=interest_evolution --episode_log_file='episode_logs.tfrecord' --gin_bindings=simulator.runner_lib.Runner.max_steps_per_episode=100 --gin_bindings=simulator.runner_lib.TrainRunner.num_iterations=100 --gin_bindings=simulator.runner_lib.TrainRunner.max_training_steps=100 --gin_bindings=simulator.runner_lib.EvalRunner.max_eval_episodes=5
Am I doing something wrong? Which setup should I use to explore the performance of full_slate_q_agent?
I also attach the TensorBoard figure with the training results: [image: 2020-01-31] https://user-images.githubusercontent.com/1914256/73556232-b489e180-4457-11ea-9be4-1e28e68ce780.png
Thanks a lot for your time and help, and congratulations on RecSim; it seems to be a very promising library :-)
Thank you very much for your help! I found all relevant info at https://arxiv.org/pdf/1905.12767.pdf
@cwhsu-google @efotopoulou My issue is only loosely related to this one, since I am trying to train on a custom dataset. Despite running it for 300K steps, I am not able to get a converged reward plot. To check that everything is going well, I am trying to see whether the Q-values converge over different iterations. My settings are:
slate_size = 100
user & doc vectors are of length 145 each
agent = slate decomposition agent (full_slate_q_agent cannot be used since the action space is too large, which the SlateQ agent solves for)
For reference, one dump of the Q-values:
[0.209816247 0.672948897 0.138823986 -0.381417215 0.203283459 -0.157474607 0.558498859 0.0357285365 -0.0832432359 -0.306503236 -0.0795367435 0.277975351 0.0103795044 0.197861522 -0.463964254 0.0460627973 -0.691076517 0.218603089 -0.447111547 -0.0333644077 -0.439966202 -0.283281863 -0.013531629 0.50945431 -0.705919266 0.25665611 0.029182218 -0.409537733 -0.116072506 -0.0328169651 -0.1983722 0.324900597 0.0853225738 -0.114950635 0.28221336 0.0611787699 -0.0344685763 0.357370555 -0.589705944 -0.24937807 -0.0080305 0.79175508 -0.167672127 0.0365265608 -0.436599731 0.153021529 -0.133471429 -0.629660368 -0.0871134251 -0.103498258 -0.149827451 0.378510296 0.868401885 -0.598987341 0.160983115 0.261486322 0.653617144 0.112870656 0.335867763 -0.0944061354 0.337530851 -0.296513 0.517484665 -0.43780762 0.0970332175 -0.37353915 0.337981075 -0.332032353 1.14180565 0.570612311 -0.359180361 0.439269572 0.180384859 -0.489291698 0.346548557 -0.454834551 -0.17369175 0.208824292 -0.366948724 -0.018982742 0.0682527125 0.0262328871 0.379669249 -0.179537177 -0.254400492 -0.57315886 -0.226627722 0.351775855 0.15934965 0.549802065 0.0397418 -0.379078835 0.496997416 -0.111561865 -0.191034168 -0.224825352 0.360356957 -0.0856648833 0.132795379 0.629708946]
Any ideas on what might be going wrong, or what else I could check to fix this?
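In case it helps to be concrete, this is roughly how I compare the dumped Q-values between iterations (a minimal sketch; the .npy file names are illustrative, not from the actual pipeline):
```python
import numpy as np

# Hypothetical dumps: one array of Q-values saved per logging step,
# e.g. at iterations 100, 200, and 300 (all dumps assumed the same shape).
iters = [100, 200, 300]
q_by_iter = [np.load(f'q_values_iter_{i}.npy') for i in iters]

# If training is converging, successive Q-value vectors should stop moving:
# the max absolute change between consecutive dumps should shrink toward 0.
for i, prev, curr in zip(iters[1:], q_by_iter, q_by_iter[1:]):
    print(f'iter {i}: max |dQ| = {np.max(np.abs(curr - prev)):.4f}')
```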
Let me know if you need any other information, thanks in advance.
Just to check whether the network is being trained and the sync between the online and target networks is happening, I have even tried setting these values very low at https://github.com/google-research/recsim/blob/master/recsim/agents/dopamine/dqn_agent.py#L186: min_replay_history=100, target_update_period=50.
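To double-check, I also diff consecutive checkpoints directly (a minimal sketch; the paths are illustrative, and it assumes Dopamine-style TF checkpoints with Online/Target variable scopes):
```python
import numpy as np
import tensorflow.compat.v1 as tf

def checkpoint_vars(ckpt_path):
    """Load every variable in a TF checkpoint into a {name: array} dict."""
    reader = tf.train.load_checkpoint(ckpt_path)
    return {name: reader.get_tensor(name)
            for name in reader.get_variable_to_shape_map()}

# Hypothetical paths to two consecutive checkpoints under base_dir.
before = checkpoint_vars('/tmp/recsim/train/checkpoints/tf_ckpt-10')
after = checkpoint_vars('/tmp/recsim/train/checkpoints/tf_ckpt-11')

for name in sorted(set(before) & set(after)):
    delta = np.max(np.abs(after[name] - before[name]))
    print(f'{name}: max |delta| = {delta:.6f}')
# If the Online/* variables never change, no training is happening; if the
# Target/* variables never change, the online->target sync is not firing.
```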
@cwhsu-google Any help would be appreciated.
Hi, may I know if you finally got a converged reward using full_slate_q_learning? I'm currently working on this, but I cannot get the reward to converge in the interest_evolution environment no matter how many training steps I use.