Closed: efotopoulou closed this issue 4 years ago.
The experiments in https://arxiv.org/pdf/1905.12767.pdf were done using 300K training steps. You need to increase num_iterations, since training_steps = num_iterations * max_training_steps. You also want to increase max_eval_episodes; the paper reports results with 5000 simulated users. Thanks.
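For example, keeping the flags from the usage example, something like the following would give 300K training steps and 5000 evaluation episodes (the 3000 x 100 split between num_iterations and max_training_steps is just one choice; any split whose product is 300K works):
python main.py --logtostderr --base_dir="/tmp/recsim/interest_evolution_full_slate_q" --agent_name=full_slate_q --environment_name=interest_evolution --episode_log_file='episode_logs.tfrecord' --gin_bindings=simulator.runner_lib.Runner.max_steps_per_episode=100 --gin_bindings=simulator.runner_lib.TrainRunner.num_iterations=3000 --gin_bindings=simulator.runner_lib.TrainRunner.max_training_steps=100 --gin_bindings=simulator.runner_lib.EvalRunner.max_eval_episodes=5000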
On Fri, Jan 31, 2020 at 8:31 AM, efotopoulou wrote:
I am trying to get familiar with RecSim. I want to test the full_slate_q_agent on the interest_evolution environment, but the agent does not seem to learn anything; random_agent performs the same as full_slate_q_agent. The command I execute is the usage example from https://pypi.org/project/recsim/, with the only difference that num_iterations is set to 100 instead of 10.
python main.py --logtostderr --base_dir="/tmp/recsim/interest_evolution_full_slate_q" --agent_name=full_slate_q --environment_name=interest_evolution --episode_log_file='episode_logs.tfrecord' --gin_bindings=simulator.runner_lib.Runner.max_steps_per_episode=100 --gin_bindings=simulator.runner_lib.TrainRunner.num_iterations=100 --gin_bindings=simulator.runner_lib.TrainRunner.max_training_steps=100 --gin_bindings=simulator.runner_lib.EvalRunner.max_eval_episodes=5
Am I doing something wrong? Which setup should I use to explore the performance of full_slate_q_agent?
I also attach the TensorBoard figure with the training results: [image: 2020-01-31] https://user-images.githubusercontent.com/1914256/73556232-b489e180-4457-11ea-9be4-1e28e68ce780.png
Thanks a lot for your time and help, and congratulations on RecSim; it seems to be a very promising library :-)
Thank you very much for your help! I found all relevant info at https://arxiv.org/pdf/1905.12767.pdf
@cwhsu-google @efotopoulou My issue is only loosely related to this one, since I am trying to train on a custom dataset. Despite running it for 300K steps, I am not able to get a converged reward plot. To check that everything is going well, I am trying to see whether the Q-values converge over different iterations. My settings are:
slate_size = 100
user & doc vectors are of length 145 each
agent = slate decomposition agent (full_slate_q_agent cannot be used since the action space is too large, which the SlateQ agent solves for)
For reference, one dump of the Q-values:
[0.209816247 0.672948897 0.138823986 -0.381417215 0.203283459 -0.157474607 0.558498859 0.0357285365 -0.0832432359 -0.306503236 -0.0795367435 0.277975351 0.0103795044 0.197861522 -0.463964254 0.0460627973 -0.691076517 0.218603089 -0.447111547 -0.0333644077 -0.439966202 -0.283281863 -0.013531629 0.50945431 -0.705919266 0.25665611 0.029182218 -0.409537733 -0.116072506 -0.0328169651 -0.1983722 0.324900597 0.0853225738 -0.114950635 0.28221336 0.0611787699 -0.0344685763 0.357370555 -0.589705944 -0.24937807 -0.0080305 0.79175508 -0.167672127 0.0365265608 -0.436599731 0.153021529 -0.133471429 -0.629660368 -0.0871134251 -0.103498258 -0.149827451 0.378510296 0.868401885 -0.598987341 0.160983115 0.261486322 0.653617144 0.112870656 0.335867763 -0.0944061354 0.337530851 -0.296513 0.517484665 -0.43780762 0.0970332175 -0.37353915 0.337981075 -0.332032353 1.14180565 0.570612311 -0.359180361 0.439269572 0.180384859 -0.489291698 0.346548557 -0.454834551 -0.17369175 0.208824292 -0.366948724 -0.018982742 0.0682527125 0.0262328871 0.379669249 -0.179537177 -0.254400492 -0.57315886 -0.226627722 0.351775855 0.15934965 0.549802065 0.0397418 -0.379078835 0.496997416 -0.111561865 -0.191034168 -0.224825352 0.360356957 -0.0856648833 0.132795379 0.629708946]
Any ideas on what might be going wrong, or what else I could check to fix this?
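In case it helps to be concrete, this is roughly how I compare the dumped Q-values between iterations (a minimal sketch; the .npy file names are illustrative, not from the actual pipeline):
```python
import numpy as np

# Hypothetical dumps: one array of Q-values saved per logging step,
# e.g. at iterations 100, 200, and 300 (all dumps assumed the same shape).
iters = [100, 200, 300]
q_by_iter = [np.load(f'q_values_iter_{i}.npy') for i in iters]

# If training is converging, successive Q-value vectors should stop moving:
# the max absolute change between consecutive dumps should shrink toward 0.
for i, prev, curr in zip(iters[1:], q_by_iter, q_by_iter[1:]):
    print(f'iter {i}: max |dQ| = {np.max(np.abs(curr - prev)):.4f}')
```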
Let me know if you need any other information, thanks in advance.
Just to check whether the network is being trained and the sync between the online and target networks is happening, I have even tried setting these values very low at https://github.com/google-research/recsim/blob/master/recsim/agents/dopamine/dqn_agent.py#L186: min_replay_history=100, target_update_period=50.
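To double-check, I also diff consecutive checkpoints directly (a minimal sketch; the paths are illustrative, and it assumes Dopamine-style TF checkpoints with Online/Target variable scopes):
```python
import numpy as np
import tensorflow.compat.v1 as tf

def checkpoint_vars(ckpt_path):
    """Load every variable in a TF checkpoint into a {name: array} dict."""
    reader = tf.train.load_checkpoint(ckpt_path)
    return {name: reader.get_tensor(name)
            for name in reader.get_variable_to_shape_map()}

# Hypothetical paths to two consecutive checkpoints under base_dir.
before = checkpoint_vars('/tmp/recsim/train/checkpoints/tf_ckpt-10')
after = checkpoint_vars('/tmp/recsim/train/checkpoints/tf_ckpt-11')

for name in sorted(set(before) & set(after)):
    delta = np.max(np.abs(after[name] - before[name]))
    print(f'{name}: max |delta| = {delta:.6f}')
# If the Online/* variables never change, no training is happening; if the
# Target/* variables never change, the online->target sync is not firing.
```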
@cwhsu-google Any help would be appreciated.
Hi, may I know if you finally got a converged reward using full_slate_q_learning? I'm currently working on this, but I cannot get the reward to converge in the interest_evolution environment no matter how many training steps I use.