Make it possible to mimic how surveys are conducted with humans

The goal:

Mimic how surveys are conducted with humans: all questions in a survey-style session should be asked to a single bot configuration, which answers them all in one sequence.

In this PR, I made the following changes:

Prompt Variation

Added a Question template field, which is used to format a single question; the same format is used both in the chat history and in prompt formatting.
Added Question prefix and AI prefix fields, which are used when formatting prompts that include chat history. (See the langchain docs.)
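
To illustrate how the three fields could fit together, here is a minimal sketch in plain Python. It is not the code in this PR and does not use langchain; the function name `build_prompt`, the `{question}` placeholder, and the default prefix values are assumptions made for the example.

```python
def build_prompt(
    history: list,
    next_question: str,
    question_template: str = "{question}",
    question_prefix: str = "Question",
    ai_prefix: str = "Answer",
) -> str:
    """Render earlier (question, answer) turns plus the next question as one prompt.

    Hypothetical helper: the placeholder name and defaults are assumptions.
    """
    lines = []
    # Earlier turns use the same question template as the new question,
    # so the chat history and the prompt share one format.
    for question, answer in history:
        lines.append(f"{question_prefix}: {question_template.format(question=question)}")
        lines.append(f"{ai_prefix}: {answer}")
    lines.append(f"{question_prefix}: {question_template.format(question=next_question)}")
    # Leave the AI prefix open so the model completes the answer.
    lines.append(f"{ai_prefix}:")
    return "\n".join(lines)
```

Calling `build_prompt([("Q1?", "A")], "Q2?")` yields four lines: the first question, its answer, the second question, and an open `Answer:` line for the model to complete.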

Session Result

Added a Session Result DataFrame to AiEvalData (and to the AI eval spreadsheet), which holds the raw results from all sessions. Some notable columns:

Session ID: a random unique string, created with uuid.uuid4(). Each session runs one survey with one model config and one prompt variation.
Survey ID: a hash calculated from a list of question_ids, taken from an ordered list of questions (a survey).
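
A rough sketch of how the two IDs could be generated. The session ID uses uuid.uuid4() as described above. The choice of SHA-256, the JSON serialization, the 12-character truncation, and the function names are assumptions for illustration; the PR may compute the hash differently.

```python
import hashlib
import json
import uuid


def new_session_id() -> str:
    """Return a random unique string for one session."""
    return str(uuid.uuid4())


def make_survey_id(question_ids: list) -> str:
    """Hash an ordered list of question IDs into a short survey ID.

    Assumption: SHA-256 over the JSON-encoded list, truncated to 12 hex chars.
    """
    # JSON keeps the order and the boundaries between IDs unambiguous,
    # so ["1", "23"] and ["12", "3"] produce different hashes.
    payload = json.dumps(list(question_ids)).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]
```

Because the list order is part of the hashed payload, the same questions asked in a different order count as a different survey.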

(This DataFrame will grow large if we have many surveys and model configs, so it may be better to store it elsewhere. I suggest that when it becomes too large, we export it to a Google Drive folder and clear the contents of the Sessions sheet. In the Google Drive folder, we name the files session.log.1.csv, session.log.2.csv, etc., just like log file management in Linux.)
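
The naming scheme suggested above could be sketched like this. The helper only picks the next file name from a list of existing names; the actual export to Google Drive is left out, and the function name `next_export_name` is hypothetical.

```python
import re

# Matches names such as "session.log.3.csv" and captures the number.
_EXPORT_NAME = re.compile(r"^session\.log\.(\d+)\.csv$")


def next_export_name(existing_names: list) -> str:
    """Return the next session.log.N.csv name, given the names already in the folder."""
    numbers = []
    for name in existing_names:
        match = _EXPORT_NAME.match(name)
        if match:  # ignore unrelated files in the folder
            numbers.append(int(match.group(1)))
    # Start at 1 when no export exists yet.
    return f"session.log.{max(numbers, default=0) + 1}.csv"
```

Note that this sketch gives the newest export the highest number, which differs from logrotate, where the lowest number is the most recent file.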

Model Configurations

Added memory and memory_size to enable model memory. memory_size controls how many previous questions the model will remember.
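
As an illustration of what memory_size means, here is a minimal sliding-window memory in plain Python. The class `WindowMemory` is hypothetical and is not the implementation in this PR, which presumably relies on a memory component from the underlying library.

```python
from collections import deque


class WindowMemory:
    """Keep only the most recent `memory_size` question/answer turns (illustrative)."""

    def __init__(self, memory_size: int):
        # A deque with maxlen drops the oldest turn automatically when full.
        self.turns = deque(maxlen=memory_size)

    def remember(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))

    def history(self) -> list:
        """Return the remembered turns, oldest first."""
        return list(self.turns)
```

With memory_size set to 2, asking a third question pushes the first one out of the history, so only the last two turns would be included in the next prompt.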

Helpers and Notebook

Updated to show how to run one survey and how to run multiple surveys with different configs.
~I haven't updated the calculation of Evaluation Result yet~ Added some simple metrics.
