RasaHQ / rasa

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
https://rasa.com/docs/rasa/
Apache License 2.0

Define the role of the person who is managing the user simulator, share it with the evaluation team and decide on the SPOC for it. #9570

Closed: dakshvar22 closed this 3 years ago

dakshvar22 commented 3 years ago

Part of https://github.com/RasaHQ/rasa/issues/9561

A round of the experiment for https://github.com/RasaHQ/rasa/issues/9561, described from the perspective of a prompt engineer (PE), can be summarized as follows:

  1. The team picks a conversation phenomenon to study in this round, e.g. negation, multi-intents, etc.
  2. A PE designs the prompts for the user simulator so that the chosen phenomenon is well reflected in the conversations the simulator has with the bot. The prompts should cover all the happy paths defined in the bot's task schema, which will be shared with the PE.
  3. The rest of the team improves the bot using E2E or non-E2E paradigms until it cannot be improved further.
  4. The PE meets with the rest of the team to decide whether the prompts should be tweaked to make the conversations more complex.
  5. If the answer to step 4 is yes, we repeat steps 2-4.
  6. Once the answer to step 4 is no, the PE acts as a real user and talks to both versions of the bot.

Hence, the role of a PE is to:

  1. Design the prompts in each round of the experiment. This includes:
     a. Prompts covering all the happy paths.
     b. Making sure the user simulator sticks to the required conversation phenomenon as much as possible.
  2. Decide whether the user simulator can generate more complex conversations for further rounds of analysis while sticking to the same conversation phenomenon (step 4 above).
  3. Talk to the two versions of the bot themselves to cover any natural language variation that the user simulator couldn't have generated.
  4. Document their learnings from, and experience with, prompt engineering.

amn41 commented 3 years ago

thanks Daksh!

The PE meets with the rest of the team to decide whether the prompts should be tweaked to make the conversations more complex.

There seems to be a baked-in assumption that the user simulator has to be defined by a single prompt. Surely we can use a series of prompts? Otherwise, I think we would quickly saturate what can be achieved through prompt engineering.

dakshvar22 commented 3 years ago

Surely we can use a series of prompts?

Yes, definitely. A round of simulated conversations can be generated by multiple prompts.

dakshvar22 commented 3 years ago

@koernerfelicia volunteered to pick up the role of prompt engineer 🎉