Closed: rmitsch closed this issue 3 years ago.
Thank you for using RecSim, and sorry for the bug in the chocolate/kale example. The following is the correct version of `generate_response()`. As you can see, `kale_mean` and `choc_mean` were swapped in the example; the code in https://github.com/google-research/recsim/blob/master/recsim/environments/long_term_satisfaction.py is correct.
```python
def generate_response(self, doc, response):
    response.clicked = True
    engagement_loc = (doc.kaleness * self._user_state.kale_mean
                      + (1 - doc.kaleness) * self._user_state.choc_mean)
    engagement_loc *= self._user_state.satisfaction
    engagement_scale = (doc.kaleness * self._user_state.kale_stddev
                        + ((1 - doc.kaleness)
                           * self._user_state.choc_stddev))
    log_engagement = np.random.normal(loc=engagement_loc,
                                      scale=engagement_scale)
    response.engagement = np.exp(log_engagement)
```
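To sanity-check the fix, here is a small standalone sketch of the corrected `engagement_loc` formula. The numeric values for `kale_mean`, `choc_mean`, and `satisfaction` are illustrative stand-ins (in the real environment they come from the sampled user state); the point is only that with `kale_mean < choc_mean`, a high-kaleness document now yields a lower engagement mean:

```python
# Illustrative values, not the environment's defaults:
# kale documents engage less per item than chocolate documents.
kale_mean, choc_mean = 0.5, 5.0
satisfaction = 1.0

def engagement_loc(kaleness):
    """Mean (log-)engagement for a document, per the corrected formula."""
    loc = kaleness * kale_mean + (1 - kaleness) * choc_mean
    return loc * satisfaction

high_kale = engagement_loc(0.9)  # mostly kale: low engagement mean
low_kale = engagement_loc(0.1)   # mostly chocolate: high engagement mean
```

With the swapped (buggy) version, the inequality flips, which is exactly the behavior reported below.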
We will update the repository shortly. Let me know if you still have questions.
No, that's it. Thanks for your response!
When running the chocolate/kale example, I select a slate following a deterministic kaleness-first selection policy, i.e. I order the document observations in descending order of kaleness and define the action as the first `slate_size` items. Here `document_observations` is `observation["doc"]`, e.g. `(array([0.57019677]), array([0.43860151]), ..., array([0.46631077]))`.

For comparison I run the environment with the reverse, chocolateness-first policy, which selects the documents with the lowest kaleness.
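For concreteness, the two policies described above can be sketched as follows. This is my own reconstruction, not the exact code from the issue: it assumes `observation["doc"]` maps document ids to one-element kaleness arrays (matching the quoted output), and that an action is a list of indices into the candidate documents.

```python
import numpy as np

def kaleness_first_action(doc_obs, slate_size):
    """Indices of the slate_size documents with the highest kaleness."""
    kaleness = np.asarray(list(doc_obs.values())).ravel()
    return [int(i) for i in np.argsort(-kaleness)[:slate_size]]

def chocolateness_first_action(doc_obs, slate_size):
    """Indices of the slate_size documents with the lowest kaleness."""
    kaleness = np.asarray(list(doc_obs.values())).ravel()
    return [int(i) for i in np.argsort(kaleness)[:slate_size]]

# Toy observation in the shape quoted above.
doc_obs = {"0": np.array([0.57019677]),
           "1": np.array([0.43860151]),
           "2": np.array([0.46631077])}
```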
If I compare both policies after running them for a couple hundred steps, the kaleness-first policy yields higher engagement and lower user satisfaction than the chocolateness-first policy. I would expect exactly the opposite, since kaleness is supposed to boost user satisfaction at the cost of lower engagement.
Why does selecting items with the highest kaleness yield a lower user satisfaction and higher engagement than selecting items with the lowest kaleness?
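For intuition about the expected behavior, here is a toy model of the intended tradeoff. It is loosely inspired by the environment's design but uses made-up constants and a simplified satisfaction update (this is not RecSim's actual implementation): satisfaction is a sigmoid of an exponentially discounted kale exposure, and expected engagement is satisfaction times a chocolate-leaning per-item mean.

```python
import math

# Illustrative constants, not the environment's defaults.
KALE_MEAN, CHOC_MEAN = 0.5, 5.0      # kale engages less per item
MEMORY_DISCOUNT, SENSITIVITY = 0.9, 1.0

def run_policy(kaleness, steps=200):
    """Consume documents of a fixed kaleness; return the final
    satisfaction and the expected engagement of the last item."""
    exposure = 0.0
    satisfaction = 0.5
    engagement = 0.0
    for _ in range(steps):
        # Kale-heavy items push exposure up, chocolate pulls it down.
        exposure = MEMORY_DISCOUNT * exposure + 2.0 * (kaleness - 0.5)
        satisfaction = 1.0 / (1.0 + math.exp(-SENSITIVITY * exposure))
        engagement = satisfaction * (
            kaleness * KALE_MEAN + (1 - kaleness) * CHOC_MEAN)
    return satisfaction, engagement

sat_kale, _ = run_policy(0.9)   # kaleness-first
sat_choc, _ = run_policy(0.1)   # chocolateness-first
```

Under this toy model the kaleness-first policy ends with higher satisfaction, while each individual kale item has a lower expected engagement at equal satisfaction; that is the tradeoff the example is meant to demonstrate, and the bug above inverted it.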