Build a feedback-collection tool

Collect a bunch of contexts to generate questions for -> list of texts
In batch, ask each of our LMs to generate several questions for each context, log all of them, and aggregate the results. -> for each text, a list of questions plus metadata (which system generated each one).
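A rough sketch of the batch step and the shape of its output, assuming each LM can be wrapped as a function that takes a context and a count and returns that many questions. The names `stub_system_a`, `stub_system_b`, and `generate_all` are hypothetical placeholders, not real model calls:

```python
from typing import Callable, Dict, List

# Assumed interface: (context, n) -> list of n question strings.
QuestionGenerator = Callable[[str, int], List[str]]


def stub_system_a(context: str, n: int) -> List[str]:
    # Placeholder for a real LM call; returns n canned questions.
    return [f"[A{i + 1}] What is the main point of: {context[:30]}?" for i in range(n)]


def stub_system_b(context: str, n: int) -> List[str]:
    # A second placeholder system, so the output mixes sources.
    return [f"[B{i + 1}] What should come after: {context[:30]}?" for i in range(n)]


def generate_all(
    contexts: List[str],
    systems: Dict[str, QuestionGenerator],
    n_per_system: int = 3,
) -> List[dict]:
    """For each context, collect every question from every system, with metadata."""
    results = []
    for context in contexts:
        questions = []
        for system_name, generate in systems.items():
            for text in generate(context, n_per_system):
                # Keep the generating system next to each question.
                questions.append({"question": text, "system": system_name})
        results.append({"context": context, "questions": questions})
    return results


if __name__ == "__main__":
    demo = generate_all(
        ["First example text.", "Second example text."],
        {"system_a": stub_system_a, "system_b": stub_system_b},
        n_per_system=2,
    )
    for entry in demo:
        print(entry["context"], "->", len(entry["questions"]), "questions")
```

Keeping the system name on every question means the rating step can later be broken down by system without re-running generation.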
Streamlit app: pick a random context from the list, show some subset of its questions (chosen at random?), and ask the user which one is "best" (most appropriate, most helpful, ...?) -> the user's choice of question. Maybe have them rank the questions instead (drag them into best-to-worst order?) rather than only pick the best.
- log each response in a database, one row per rated question:

context, question, system_that_generated, rater_id, rank
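One possible way to store these rows, sketched with Python's built-in `sqlite3`. The choice of SQLite, the table name `ratings`, and the helpers `pick_candidates` and `log_ranking` are assumptions for illustration; the columns are the ones listed above. In the app, the `ranked` list would come from the user's ordering:

```python
import random
import sqlite3
from typing import List

# Column names follow the notes; "rank" is quoted because it is a
# reserved word in some SQL dialects.
SCHEMA = """
CREATE TABLE IF NOT EXISTS ratings (
    context TEXT NOT NULL,
    question TEXT NOT NULL,
    system_that_generated TEXT NOT NULL,
    rater_id TEXT NOT NULL,
    "rank" INTEGER NOT NULL
)
"""


def pick_candidates(entry: dict, k: int, rng: random.Random) -> List[dict]:
    """Choose up to k questions at random from one context's question list."""
    questions = entry["questions"]
    return rng.sample(questions, min(k, len(questions)))


def log_ranking(
    conn: sqlite3.Connection, context: str, ranked: List[dict], rater_id: str
) -> None:
    """Insert one row per question; rank 1 is the rater's best choice."""
    rows = [
        (context, q["question"], q["system"], rater_id, rank)
        for rank, q in enumerate(ranked, start=1)
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO ratings "
            '(context, question, system_that_generated, rater_id, "rank") '
            "VALUES (?, ?, ?, ?, ?)",
            rows,
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    entry = {
        "context": "Example text.",
        "questions": [
            {"question": "Q from A?", "system": "system_a"},
            {"question": "Q from B?", "system": "system_b"},
            {"question": "Another from A?", "system": "system_a"},
        ],
    }
    shown = pick_candidates(entry, k=2, rng=random.Random())
    # Pretend the rater kept the displayed order as best-to-worst.
    log_ranking(conn, entry["context"], shown, rater_id="rater-1")
    print(conn.execute("SELECT * FROM ratings").fetchall())
```

If the app ends up asking only for the single best question, the same table still works: log just that question with rank 1.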

AIToolsLab / questions