Open saxena-priyansh opened 3 years ago
Hi, thanks for using RecSim. I can help provide some pointers to help with further debugging:
"Why q values over different epochs are turning out to be same?" It is not clear which Q-values are meant here. Are you printing the Q-values manually, or does this refer to one of the TensorBoard charts attached to the question?
"Which in turn is returning same slates for all the checkpoints" Hmm, this is concerning indeed. How many examples/users are you evaluating per checkpoint? Can you double-check (e.g., by logging) that the inference code is indeed using different checkpoints? Can you try two things: (1) evaluate more (say 100) users per checkpoint and compare the slates (you can sample users from a normal distribution); (2) evaluate checkpoints spaced further apart (e.g., 50k, 100k, 150k, 200k, etc.).
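The first suggestion above (evaluate many users per checkpoint and compare the slates) could be sketched roughly as follows. This is only an illustration: `predict_slate` is a hypothetical callable standing in for whatever your inference script does after restoring a checkpoint, and the stub predictor below deliberately ignores the checkpoint to mimic the reported symptom. The user dimension and slate size are also assumptions.

```python
import random


def sample_users(num_users, dim, seed=0):
    """Draw user feature vectors from a standard normal distribution."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_users)]


def slate_disagreement(predict_slate, checkpoints, users):
    """For each pair of consecutive checkpoints, return the fraction of
    users whose predicted slates differ between the two checkpoints."""
    slates = {
        ckpt: [tuple(predict_slate(ckpt, user)) for user in users]
        for ckpt in checkpoints
    }
    report = {}
    for prev, curr in zip(checkpoints, checkpoints[1:]):
        differing = sum(a != b for a, b in zip(slates[prev], slates[curr]))
        report[(prev, curr)] = differing / len(users)
    return report


def stub_predict_slate(checkpoint, user):
    """Hypothetical stand-in: ranks items by the user vector and ignores
    the checkpoint entirely, so every checkpoint yields the same slate."""
    ranked = sorted(range(len(user)), key=lambda i: user[i], reverse=True)
    return ranked[:3]


users = sample_users(num_users=100, dim=10)
report = slate_disagreement(
    stub_predict_slate, [50_000, 100_000, 150_000, 200_000], users
)
print(report)
```

A fraction of 0.0 for every pair, across 100 different users, would suggest that the checkpoint is not influencing the prediction at all, rather than that the policy has converged.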
"This raises question, whether model is training or not" I do see that the average episodic reward is improving with the number of steps, so my guess is that it is training. Have you tried hyperparameter tuning, or are you using the same network parameters as the default implementation?
"Also we see watch time for each video is 4 min, since q values reflect cumulative reward over state, action pair, how come their scale is 10exp-2" Again, it is not clear which Q-values are referred to here. The average episode length is ~60 and the average episodic reward is ~160, which seems to be in the right range, as there will also be some small negative rewards for some actions.
Hope this helps!
Thanks @vihanjain for your response. To elaborate on this:
q_values = tf.Print(q_values, [q_values], 'q_values', summarize=1000)
What we have tried so far to debug:
LOGS...
Checkpoint that would be read: 70
Not in early execution... model_weights/results12/train/checkpoints/tf_ckpt-70
Model loading time taken: 20.099586963653564
Going to predict...
q_values[[-0.13537176 0.234581396 0.259239793 0.0990562439 0.00532283308 -0.0287395343 0.107342854 -0.044202745 -0.00707248785 0.0450880975 -0.207072049 0.027151769 0.134151459 0.0761770904 -0.413350075 0.066539 0.31752333 -0.0904344171 -0.170975745 0.195714355 0.0547820404 0.166659713 0.207024947 -0.199312985 -0.418331027 -0.00844216906 0.0422177538 0.0487211384 -0.0899787843 0.0510536507]]
[cp 1][21 1 22][0.016246004 0.0518380962 0.0206836071 0.0115500968 0.0287562609 0.0341853239 0.0287562609 0.0756491944 0.0241801739 0.0242459327 0.0300822686 0.0402437486 0.0284955166 0.0307806712 0.0206836071 0.0284955166 0.0120282751 0.0394958928 0.0299003273 0.0284955166 0.0115500968 0.0756491944 0.0402437486 0.0341853239 0.0341853239 0.0299003273 0.0284955166 0.0120282751 0.0394958928 0.0115500968][-0.13537176 0.234581396 0.259239793 0.0990562439 0.00532283308 -0.0287395343 0.107342854 -0.044202745 -0.00707248785 0.0450880975 -0.207072049 0.027151769 0.134151459 0.0761770904 -0.413350075 0.066539 0.31752333 -0.0904344171 -0.170975745 0.195714355 0.0547820404 0.166659713 0.207024947 -0.199312985 -0.418331027 -0.00844216906 0.0422177538 0.0487211384 -0.0899787843 0.0510536507]
Step time taken: 0.3186337947845459
Time taken: 320.206880569458 ms
['51', '31', '52']
Prediction Time taken 0.3202650547027588 seconds
Checkpoint that would be read: 80
Not in early execution... model_weights/results12/train/checkpoints/tf_ckpt-80
Model loading time taken: 20.884052991867065
Going to predict...
q_values[[-0.13537176 0.234581396 0.259239793 0.0990562439 0.00532283308 -0.0287395343 0.107342854 -0.044202745 -0.00707248785 0.0450880975 -0.207072049 0.027151769 0.134151459 0.0761770904 -0.413350075 0.066539 0.31752333 -0.0904344171 -0.170975745 0.195714355 0.0547820404 0.166659713 0.207024947 -0.199312985 -0.418331027 -0.00844216906 0.0422177538 0.0487211384 -0.0899787843 0.0510536507]]
[cp 1][21 1 22][0.016246004 0.0518380962 0.0206836071 0.0115500968 0.0287562609 0.0341853239 0.0287562609 0.0756491944 0.0241801739 0.0242459327 0.0300822686 0.0402437486 0.0284955166 0.0307806712 0.0206836071 0.0284955166 0.0120282751 0.0394958928 0.0299003273 0.0284955166 0.0115500968 0.0756491944 0.0402437486 0.0341853239 0.0341853239 0.0299003273 0.0284955166 0.0120282751 0.0394958928 0.0115500968][-0.13537176 0.234581396 0.259239793 0.0990562439 0.00532283308 -0.0287395343 0.107342854 -0.044202745 -0.00707248785 0.0450880975 -0.207072049 0.027151769 0.134151459 0.0761770904 -0.413350075 0.066539 0.31752333 -0.0904344171 -0.170975745 0.195714355 0.0547820404 0.166659713 0.207024947 -0.199312985 -0.418331027 -0.00844216906 0.0422177538 0.0487211384 -0.0899787843 0.0510536507]
Step time taken: 0.29644083976745605
Time taken: 297.63102531433105 ms
['51', '31', '52']
Prediction Time taken 0.2976522445678711 seconds
Checkpoint that would be read: 90
Not in early execution... model_weights/results12/train/checkpoints/tf_ckpt-90
Model loading time taken: 19.210211753845215
Going to predict...
q_values[[-0.13537176 0.234581396 0.259239793 0.0990562439 0.00532283308 -0.0287395343 0.107342854 -0.044202745 -0.00707248785 0.0450880975 -0.207072049 0.027151769 0.134151459 0.0761770904 -0.413350075 0.066539 0.31752333 -0.0904344171 -0.170975745 0.195714355 0.0547820404 0.166659713 0.207024947 -0.199312985 -0.418331027 -0.00844216906 0.0422177538 0.0487211384 -0.0899787843 0.0510536507]]
[cp 1][21 1 22][0.016246004 0.0518380962 0.0206836071 0.0115500968 0.0287562609 0.0341853239 0.0287562609 0.0756491944 0.0241801739 0.0242459327 0.0300822686 0.0402437486 0.0284955166 0.0307806712 0.0206836071 0.0284955166 0.0120282751 0.0394958928 0.0299003273 0.0284955166 0.0115500968 0.0756491944 0.0402437486 0.0341853239 0.0341853239 0.0299003273 0.0284955166 0.0120282751 0.0394958928 0.0115500968][-0.13537176 0.234581396 0.259239793 0.0990562439 0.00532283308 -0.0287395343 0.107342854 -0.044202745 -0.00707248785 0.0450880975 -0.207072049 0.027151769 0.134151459 0.0761770904 -0.413350075 0.066539 0.31752333 -0.0904344171 -0.170975745 0.195714355 0.0547820404 0.166659713 0.207024947 -0.199312985 -0.418331027 -0.00844216906 0.0422177538 0.0487211384 -0.0899787843 0.0510536507]
Step time taken: 0.3162529468536377
Time taken: 317.17681884765625 ms
['51', '31', '52']
Prediction Time taken 0.3171958923339844 seconds
Checkpoint that would be read: 100
Not in early execution... model_weights/results12/train/checkpoints/tf_ckpt-100
Model loading time taken: 22.500069856643677
Going to predict...
q_values[[-0.13537176 0.234581396 0.259239793 0.0990562439 0.00532283308 -0.0287395343 0.107342854 -0.044202745 -0.00707248785 0.0450880975 -0.207072049 0.027151769 0.134151459 0.0761770904 -0.413350075 0.066539 0.31752333 -0.0904344171 -0.170975745 0.195714355 0.0547820404 0.166659713 0.207024947 -0.199312985 -0.418331027 -0.00844216906 0.0422177538 0.0487211384 -0.0899787843 0.0510536507]]
[cp 1][21 1 22][0.016246004 0.0518380962 0.0206836071 0.0115500968 0.0287562609 0.0341853239 0.0287562609 0.0756491944 0.0241801739 0.0242459327 0.0300822686 0.0402437486 0.0284955166 0.0307806712 0.0206836071 0.0284955166 0.0120282751 0.0394958928 0.0299003273 0.0284955166 0.0115500968 0.0756491944 0.0402437486 0.0341853239 0.0341853239 0.0299003273 0.0284955166 0.0120282751 0.0394958928 0.0115500968][-0.13537176 0.234581396 0.259239793 0.0990562439 0.00532283308 -0.0287395343 0.107342854 -0.044202745 -0.00707248785 0.0450880975 -0.207072049 0.027151769 0.134151459 0.0761770904 -0.413350075 0.066539 0.31752333 -0.0904344171 -0.170975745 0.195714355 0.0547820404 0.166659713 0.207024947 -0.199312985 -0.418331027 -0.00844216906 0.0422177538 0.0487211384 -0.0899787843 0.0510536507]
Step time taken: 0.29529881477355957
Time taken: 296.33116722106934 ms
['51', '31', '52']
Prediction Time taken 0.29636096954345703 seconds
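To confirm mechanically that the printed Q-values are identical across checkpoints, the logs above could be parsed with a small script along these lines. It is a sketch that assumes the log format shown here (a "Checkpoint that would be read: N" line followed by a `q_values[[...]]` line); the embedded sample is an excerpt of the real log abbreviated to the first three values per vector, and the function names are made up for illustration.

```python
import re

CKPT_RE = re.compile(r"Checkpoint that would be read:\s*(\d+)")
QVAL_RE = re.compile(r"^q_values\[\[(.*?)\]\]")


def q_values_by_checkpoint(log_text):
    """Map each checkpoint number to the Q-value vector printed after it."""
    result = {}
    current = None
    for line in log_text.splitlines():
        line = line.strip()
        match = CKPT_RE.search(line)
        if match:
            current = int(match.group(1))
            continue
        match = QVAL_RE.match(line)
        if match and current is not None:
            result[current] = tuple(float(x) for x in match.group(1).split())
    return result


def all_identical(vectors):
    """True if every checkpoint produced exactly the same Q-value vector."""
    return len(set(vectors.values())) <= 1


# Abbreviated excerpt of the log format above (first three values only).
sample_log = """\
Checkpoint that would be read: 70
q_values[[-0.13537176 0.234581396 0.259239793]]
Checkpoint that would be read: 80
q_values[[-0.13537176 0.234581396 0.259239793]]
"""

parsed = q_values_by_checkpoint(sample_log)
print(sorted(parsed), all_identical(parsed))
```

Bit-for-bit identical vectors across checkpoints, as in the logs above, are unlikely if different weights were actually restored and used for the forward pass.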
The screenshot below shows the weights at different iterations: the networks/network weights are changing, but the networks/network_1 and networks/network_2 weights stay the same.
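The weight comparison described above can also be done numerically instead of by eye. The sketch below assumes the weights of two checkpoints have already been read into plain dictionaries mapping variable names to flat lists of floats (for example via a checkpoint reader such as the one returned by `tf.train.load_checkpoint`, which is not shown here). The variable names and numbers are made-up toy values, not taken from the actual model.

```python
def max_abs_diff(xs, ys):
    """Largest element-wise absolute difference between two flat lists."""
    return max((abs(x - y) for x, y in zip(xs, ys)), default=0.0)


def compare_snapshots(weights_a, weights_b, tol=1e-8):
    """Split the variable names shared by both snapshots into those whose
    values changed by more than `tol` and those that did not."""
    changed, unchanged = [], []
    for name in sorted(set(weights_a) & set(weights_b)):
        a, b = weights_a[name], weights_b[name]
        if len(a) != len(b) or max_abs_diff(a, b) > tol:
            changed.append(name)
        else:
            unchanged.append(name)
    return changed, unchanged


# Toy snapshots with hypothetical variable names and values.
snapshot_70 = {
    "networks/network/dense/kernel": [0.10, -0.20, 0.30],
    "networks/network_1/dense/kernel": [0.05, 0.07, -0.01],
}
snapshot_80 = {
    "networks/network/dense/kernel": [0.12, -0.25, 0.31],
    "networks/network_1/dense/kernel": [0.05, 0.07, -0.01],
}

changed, unchanged = compare_snapshots(snapshot_70, snapshot_80)
print("changed:", changed)
print("unchanged:", unchanged)
```

Listing the unchanged variables by name makes it easier to tell which network the inference path is actually reading from.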
@vihanjain any help would be appreciated.
@vihanjain Let us know your thoughts on this, and whether any more details are required.
Thanks for the great work @cwhsu-google. Our team is trying to use RecSim for slate recommendation.
After training the agent (slate_decomp_q_agent) for 300k steps, I tried loading different checkpoints and generating slates for the same user (to understand the convergence of the Q-values), but the slates returned for every checkpoint are the same.
Here is my script that I used for prediction:
inference.py
prediction.py
These graphs were generated in TensorBoard:
Most importantly, I am looking for answers to the following:
Any help would be appreciated