Hi @JamesCao2048,
thanks for your comments; you observed everything very well! Let me comment on your questions:
I hope my answers were helpful! Let me know if you have further questions.
Best, Chris
Hi @ChrisGeishauser, thank you so much for your very helpful and fast reply! This helps me a lot!
I have a follow-up question about point 3. As you said:
> If you now assume that humans have an almost perfect NLU performance (at least much better than the NLU models), the setting +SystemNLU makes a lot of sense because adding a UserNLU in addition would just introduce additional noise.
So if we want to remove the noise introduced by the UserNLU, is +SystemNLU a better evaluation setting than the full setting? We do not want errors caused by the UserNLU to make the dialogue fail.
But for the UserRulePolicy, I think the UserNLU is necessary, because it seems it cannot receive system dialogue acts directly?
Hi @JamesCao2048,
awesome, happy to help!
> So if we want to remove the noise introduced by the UserNLU, is +SystemNLU a better evaluation setting than the full setting? We do not want errors caused by the UserNLU to make the dialogue fail.
Yes, I think so! But of course, this only holds if the user does not take the system utterance as input.
> But for the UserRulePolicy, I think the UserNLU is necessary, because it seems it cannot receive system dialogue acts directly?
The UserRulePolicy should obtain system dialogue acts directly!
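To make this concrete, here is a rough sketch of the +SystemNLU setting in ConvLab-2 style (in ConvLab-3 the package is named `convlab` instead of `convlab2`; the specific components such as PPO, BERTNLU, and TemplateNLG are just placeholder assumptions, and loading trained policy weights is omitted):

```python
# Rough sketch of the +SystemNLU setting: the user simulator exchanges
# dialogue acts with the policy directly (no UserNLU), while the system
# side parses the user's natural-language utterances with an NLU model.
from convlab2.nlu.jointBERT.multiwoz import BERTNLU
from convlab2.dst.rule.multiwoz import RuleDST
from convlab2.policy.ppo import PPO
from convlab2.policy.rule.multiwoz import RulePolicy
from convlab2.nlg.template.multiwoz import TemplateNLG
from convlab2.dialog_agent import PipelineAgent, BiSession
from convlab2.evaluator.multiwoz_eval import MultiWozEvaluator

# System side: BERTNLU parses the user's utterance; no system NLG,
# so the system outputs dialogue acts directly.
sys_policy = PPO(is_train=False)  # placeholder RL policy
sys_agent = PipelineAgent(BERTNLU(), RuleDST(), sys_policy, None, name='sys')

# User side: no UserNLU, so the rule-based user policy consumes the
# system's dialogue acts directly; TemplateNLG produces the utterance
# that the system NLU then has to parse.
user_policy = RulePolicy(character='usr')
user_agent = PipelineAgent(None, None, user_policy,
                           TemplateNLG(is_user=True), name='user')

sess = BiSession(sys_agent=sys_agent, user_agent=user_agent,
                 kb_query=None, evaluator=MultiWozEvaluator())
```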
Hi @ChrisGeishauser, your reply perfectly answered my questions. I will try the UserPolicy with semantic acts later. Thanks a lot!!
Best regards, James Cao
Hi, I have found that an RL policy can be trained and evaluated with pipelines in which different components are used. Here are several configurations I found (see the sketch at the end of this post for one example):
My questions are:
If I have made any mistakes, please correct me. Looking forward to your reply, thanks!
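To make the setup concrete, one such configuration (the full setting, with NLU and NLG on both sides) looks roughly like this in ConvLab-2 style; the component choices are just examples:

```python
# Sketch of the "full" setting: NLU and NLG on both the system and the
# user side, so both agents exchange natural language only.
from convlab2.nlu.jointBERT.multiwoz import BERTNLU
from convlab2.dst.rule.multiwoz import RuleDST
from convlab2.policy.rule.multiwoz import RulePolicy
from convlab2.nlg.template.multiwoz import TemplateNLG
from convlab2.dialog_agent import PipelineAgent, BiSession
from convlab2.evaluator.multiwoz_eval import MultiWozEvaluator

sys_agent = PipelineAgent(BERTNLU(), RuleDST(), RulePolicy(),
                          TemplateNLG(is_user=False), name='sys')
# The user simulator also needs an NLU here, because it receives the
# system's utterance rather than its dialogue acts.
user_agent = PipelineAgent(BERTNLU(), None, RulePolicy(character='usr'),
                           TemplateNLG(is_user=True), name='user')

sess = BiSession(sys_agent=sys_agent, user_agent=user_agent,
                 kb_query=None, evaluator=MultiWozEvaluator())

# One evaluated dialogue: feed the last system response back each turn.
sys_response = ''
sess.init_session()
for _ in range(20):
    sys_response, user_response, session_over, reward = sess.next_turn(sys_response)
    if session_over:
        break
print('task success:', sess.evaluator.task_success())
```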