Closed: rmitsch closed this issue 3 years ago.
Thank you for using RecSim, and sorry for the bug in the chocolate/kale example. The following is the correct version of `generate_response()`. As you can see, `kale_mean` and `choc_mean` were swapped in the example; the code in https://github.com/google-research/recsim/blob/master/recsim/environments/long_term_satisfaction.py is correct.
```python
def generate_response(self, doc, response):
    response.clicked = True
    engagement_loc = (doc.kaleness * self._user_state.kale_mean
                      + (1 - doc.kaleness) * self._user_state.choc_mean)
    engagement_loc *= self._user_state.satisfaction
    engagement_scale = (doc.kaleness * self._user_state.kale_stddev
                        + ((1 - doc.kaleness)
                           * self._user_state.choc_stddev))
    log_engagement = np.random.normal(loc=engagement_loc,
                                      scale=engagement_scale)
    response.engagement = np.exp(log_engagement)
```
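To sanity-check the fix, here is a small standalone sketch of the corrected `engagement_loc` formula. The numeric values for `kale_mean`, `choc_mean`, and `satisfaction` are illustrative stand-ins (in the real environment they come from the sampled user state); the point is only that with `kale_mean < choc_mean`, a high-kaleness document now yields a lower engagement mean:

```python
# Illustrative values, not the environment's defaults:
# kale documents engage less per item than chocolate documents.
kale_mean, choc_mean = 0.5, 5.0
satisfaction = 1.0

def engagement_loc(kaleness):
    """Mean (log-)engagement for a document, per the corrected formula."""
    loc = kaleness * kale_mean + (1 - kaleness) * choc_mean
    return loc * satisfaction

high_kale = engagement_loc(0.9)  # mostly kale: low engagement mean
low_kale = engagement_loc(0.1)   # mostly chocolate: high engagement mean
```

With the swapped (buggy) version, the inequality flips, which is exactly the behavior reported below.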
We will update the repository shortly. Let me know if you still have questions.
No, that's it. Thanks for your response!
When running the chocolate/kale example, I select a slate following a deterministic kaleness-first selection policy, i.e. I order the document observations in descending order of kaleness and define the action as the first `slate_size` items. Here `document_observations` is `observation["doc"]`, e.g. `(array([0.57019677]), array([0.43860151]), ..., array([0.46631077]))`.

For comparison I run the environment with the reverse, chocolateness-first policy, which selects the documents with the lowest kaleness.
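For concreteness, the two policies described above can be sketched as follows. This is my own reconstruction, not the exact code from the issue: it assumes `observation["doc"]` maps document ids to one-element kaleness arrays (matching the quoted output), and that an action is a list of indices into the candidate documents.

```python
import numpy as np

def kaleness_first_action(doc_obs, slate_size):
    """Indices of the slate_size documents with the highest kaleness."""
    kaleness = np.asarray(list(doc_obs.values())).ravel()
    return [int(i) for i in np.argsort(-kaleness)[:slate_size]]

def chocolateness_first_action(doc_obs, slate_size):
    """Indices of the slate_size documents with the lowest kaleness."""
    kaleness = np.asarray(list(doc_obs.values())).ravel()
    return [int(i) for i in np.argsort(kaleness)[:slate_size]]

# Toy observation in the shape quoted above.
doc_obs = {"0": np.array([0.57019677]),
           "1": np.array([0.43860151]),
           "2": np.array([0.46631077])}
```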
If I compare both policies after running them for a couple hundred steps, the kaleness-first policy yields higher engagement and lower user satisfaction than the chocolateness-first policy. I would expect exactly the opposite, since kaleness is supposed to boost user satisfaction at the cost of lower engagement.
Why does selecting items with the highest kaleness yield a lower user satisfaction and higher engagement than selecting items with the lowest kaleness?
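For intuition about the expected behavior, here is a toy model of the intended tradeoff. It is loosely inspired by the environment's design but uses made-up constants and a simplified satisfaction update (this is not RecSim's actual implementation): satisfaction is a sigmoid of an exponentially discounted kale exposure, and expected engagement is satisfaction times a chocolate-leaning per-item mean.

```python
import math

# Illustrative constants, not the environment's defaults.
KALE_MEAN, CHOC_MEAN = 0.5, 5.0      # kale engages less per item
MEMORY_DISCOUNT, SENSITIVITY = 0.9, 1.0

def run_policy(kaleness, steps=200):
    """Consume documents of a fixed kaleness; return the final
    satisfaction and the expected engagement of the last item."""
    exposure = 0.0
    satisfaction = 0.5
    engagement = 0.0
    for _ in range(steps):
        # Kale-heavy items push exposure up, chocolate pulls it down.
        exposure = MEMORY_DISCOUNT * exposure + 2.0 * (kaleness - 0.5)
        satisfaction = 1.0 / (1.0 + math.exp(-SENSITIVITY * exposure))
        engagement = satisfaction * (
            kaleness * KALE_MEAN + (1 - kaleness) * CHOC_MEAN)
    return satisfaction, engagement

sat_kale, _ = run_policy(0.9)   # kaleness-first
sat_choc, _ = run_policy(0.1)   # chocolateness-first
```

Under this toy model the kaleness-first policy ends with higher satisfaction, while each individual kale item has a lower expected engagement at equal satisfaction; that is the tradeoff the example is meant to demonstrate, and the bug above inverted it.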