Question on the experiment result.

AaronHeee / Bundle-MCR

Introducing Bundle Recommendation in Conversational Recommendation Scenarios on RecSys 2022

Question on the experiment result. #3

Closed Snnzhao closed 1 year ago

Snnzhao commented 1 year ago

When training online, there is a big gap between the accuracy reported for "collecting rollout" and the accuracy reported for evaluation. This is puzzling. Why does it happen?

AaronHeee commented 1 year ago

Hi @Snnzhao, sorry, I only saw this question just now. Yes, there is a gap in the same metric between the two phases, for the following reasons:

For collecting rollout, policy predictions are not deterministic (for exploration); please check this. Since we have multiple policy networks, sampling from their output distributions can change the conversation trajectories a lot.
For evaluation, policy predictions are deterministic (for exploitation); please check this. In the evaluation phase, we try to use the best policies we have so far.

Hope this can address your concern, and please let me know if you have more questions. Thanks!
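
To illustrate the difference described above, here is a minimal sketch of stochastic versus deterministic action selection. It is not taken from the Bundle-MCR code: the names `softmax` and `select_action` and the example logits are hypothetical, and it uses only the Python standard library rather than the framework the repository uses.

```python
import math
import random


def softmax(logits):
    """Convert raw policy scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(logits, deterministic, rng=random):
    """Pick an action index from policy logits (hypothetical helper).

    deterministic=False: sample from the distribution (exploration,
    as in the rollout-collection phase).
    deterministic=True: take the most probable action (exploitation,
    as in the evaluation phase).
    """
    probs = softmax(logits)
    if deterministic:
        return max(range(len(probs)), key=probs.__getitem__)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the demo repeatable
    logits = [2.0, 1.5, 0.1]  # made-up scores for three actions

    greedy = [select_action(logits, True) for _ in range(5)]
    sampled = [select_action(logits, False, rng) for _ in range(5)]
    print("evaluation (deterministic):", greedy)
    print("rollout (sampled):", sampled)
```

With sampling, a lower-scoring action is sometimes chosen, so the per-step accuracy during rollout collection is typically lower than under greedy selection. When several policies each sample, these deviations compound along the conversation, which is consistent with the gap discussed above.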