Primary user story
As a voter,
I want to receive useful matching results based on questions where multiple categorical answers are provided,
so that I feel that the complexity of political issues is properly accounted for.
Categorical answers are non-ordinal (i.e. nominal) answers, which, unlike Likert answers, cannot be placed on a single dimension.
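A minimal sketch of the difference, assuming a five-point Likert scale normalised to [0, 1] and a one-hot encoding for the nominal options. The option labels are hypothetical. On a Likert question the distance between two answers depends on how far apart they are on the scale. On a nominal question every pair of different options is equally far apart, so no single axis can order them.

```python
import math

def likert_distance(a: int, b: int, points: int = 5) -> float:
    """Distance between two Likert answers on a 1..points scale, normalised to [0, 1]."""
    return abs(a - b) / (points - 1)

def one_hot(option: str, options: list[str]) -> list[float]:
    """Encode a nominal answer as a one-hot vector: one dimension per option."""
    return [1.0 if o == option else 0.0 for o in options]

def euclidean(u: list[float], v: list[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

# Likert: answers 1 and 2 are closer than answers 1 and 5.
print(likert_distance(1, 2), likert_distance(1, 5))  # 0.25 1.0

# Nominal (hypothetical options): every distinct pair is sqrt(2) apart.
options = ["tax", "subsidy", "ban"]
print(euclidean(one_hot("tax", options), one_hot("subsidy", options)))
print(euclidean(one_hot("tax", options), one_hot("ban", options)))
```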
Supporting user story
As a candidate or voter,
I want to write my own answer to a multiple-choice question such that it can be used for matching,
so that I can express my opinion without being limited by the answering options.
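One possible way to use a free-text answer for matching is to embed it with a text-embedding model and compare it with the embeddings of the predefined options, or of other respondents' answers, by cosine similarity. The sketch below assumes the embeddings already exist. The three-dimensional vectors are made-up placeholders, not the output of any real model, and mapping to the closest option is only one illustrative use.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1 = same direction, 0 = orthogonal."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings of the predefined options of one question.
option_embeddings = {
    "A": [1.0, 0.0, 0.0],
    "B": [0.0, 1.0, 0.0],
    "C": [0.0, 0.0, 1.0],
}
# Hypothetical embedding of a voter's own free-text answer.
own_answer = [0.8, 0.6, 0.0]

similarities = {k: cosine_similarity(own_answer, v) for k, v in option_embeddings.items()}
closest = max(similarities, key=similarities.get)
print(closest)
```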
Considerations
Combined or per-question embedding?
We can either measure the level of agreement with this method separately for each question or embed all of a candidate’s or voter’s answers in the same space and then calculate the overall distance.
A shortcoming of the per-question approach is that it does not fit the current matching paradigm: although that paradigm ultimately yields a scalar distance (rendered as a proximity percentage), it expects each entity to be positioned in a single multidimensional space, which can in turn be projected onto lower-dimensional spaces for visualisation.
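The two alternatives can be contrasted in a minimal sketch. It assumes that each answer has already been embedded as a unit-length vector and that a distance is turned into a proximity percentage by the hypothetical rule `100 * (1 - d / d_max)`. The per-question variant averages separate distances. The combined variant concatenates all of an entity's answer embeddings into one vector, giving a single position in a shared space.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def proximity(distance, max_distance):
    """Hypothetical rule mapping a distance to a 0-100 % proximity score."""
    return 100.0 * (1.0 - distance / max_distance)

# Hypothetical unit-length answer embeddings, one per question.
voter = [[1.0, 0.0], [0.0, 1.0]]
candidate = [[1.0, 0.0], [1.0, 0.0]]

# Per-question: one distance per question, averaged afterwards.
# Two unit vectors are at most 2 apart.
per_question = [euclidean(v, c) for v, c in zip(voter, candidate)]
per_question_score = proximity(sum(per_question) / len(per_question), 2.0)

# Combined: concatenate all answers into a single position vector.
# With n unit-length blocks the distance is at most 2 * sqrt(n).
voter_position = [x for answer in voter for x in answer]
candidate_position = [x for answer in candidate for x in answer]
max_combined = 2.0 * math.sqrt(len(voter))
combined_score = proximity(euclidean(voter_position, candidate_position), max_combined)

print(per_question_score, combined_score)
```

The two variants give different scores for the same answers because averaging takes the mean of the per-question distances, whereas the distance between concatenated vectors is the square root of the sum of their squares. Only the combined variant yields a position that could also be projected for visualisation.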
Distance calculation in the embedding space or a projection thereof?
Instead of calculating the distance between two positions directly in the out-of-the-box embedding space, could we learn a projection (taking the embedding vectors of the question and the answer as inputs) into a lower-dimensional "political" space and measure distances there? The reasoning is that we do not want superficial features of the answers, such as stylistic choices, to affect the distance, since these are presumably also encoded in the components of the embedding vector.
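A minimal sketch of the idea, assuming the simplest possible learnable projection: a linear map `W` applied to the concatenation of the question embedding and the answer embedding. The weights are hypothetical placeholders standing in for learned values. Distances are measured between the projected points rather than between the raw embeddings, so any embedding component that `W` ignores, here one we assume carries only style, no longer contributes.

```python
import math

def project(W, question_emb, answer_emb):
    """Map [question; answer] embeddings to a low-dimensional 'political' space via W."""
    x = question_emb + answer_emb  # list concatenation
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical 2 x 4 projection: 2 political dimensions, 2-dim question + 2-dim answer embeddings.
# The last input component (assumed to encode style only) gets zero weight.
W = [
    [0.0, 0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
]

question = [1.0, 0.0]
# Two hypothetical answer embeddings that differ only in the assumed stylistic component.
answer_a = [0.9, 0.1]
answer_b = [0.9, 0.8]

pa = project(W, question, answer_a)
pb = project(W, question, answer_b)
print(euclidean(answer_a, answer_b))  # distance in the raw embedding space
print(euclidean(pa, pb))              # distance in the projected space
```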
Can we use the existing combinations of open answers and their associated agreement values (i.e. unidimensional positions) as training data?
Can we use the existing factor weightings of VAA questions on somewhat standardized political dimensions, such as economic left–right, as training data?
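Both training-data ideas could be framed as supervised regression for the projection. The sketch below assumes a linear projection fitted by plain gradient descent on the mean squared error. It also assumes that a target position for an answer can be formed by multiplying its agreement value (the unidimensional position) by the question's factor weightings on the political dimensions. All embeddings, agreement values and weightings are made-up toy numbers; whether real VAA data supports this construction is exactly what the questions above ask.

```python
def project(W, x):
    """Linear projection: one output per row of W."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def make_target(agreement, factor_weights):
    """Assumed target: agreement value (-1..1) scaled by the question's factor weightings."""
    return [agreement * f for f in factor_weights]

def fit(samples, n_out, n_in, lr=0.1, epochs=2000):
    """Fit W by gradient descent on the mean squared error between projections and targets."""
    W = [[0.0] * n_in for _ in range(n_out)]
    for _ in range(epochs):
        grad = [[0.0] * n_in for _ in range(n_out)]
        for x, target in samples:
            pred = project(W, x)
            for i in range(n_out):
                err = pred[i] - target[i]
                for j in range(n_in):
                    grad[i][j] += 2.0 * err * x[j] / len(samples)
        for i in range(n_out):
            for j in range(n_in):
                W[i][j] -= lr * grad[i][j]
    return W

# Hypothetical factor weightings of one question on (economic left-right, liberal-conservative).
factor_weights = [0.8, -0.3]
# Hypothetical (concatenated question + answer embedding, agreement value) pairs.
data = [
    ([1.0, 0.0, 1.0, 0.0], 1.0),
    ([1.0, 0.0, 0.0, 1.0], -1.0),
    ([1.0, 0.0, 0.5, 0.5], 0.0),
]
samples = [(x, make_target(a, factor_weights)) for x, a in data]
W = fit(samples, n_out=2, n_in=4)
print([round(v, 2) for v in project(W, data[0][0])])  # [0.8, -0.3]
```

With only agreement values and no factor weightings, the same code covers the unidimensional case by using `factor_weights = [1.0]` and `n_out=1`. A linear map is just the simplest choice here; a small neural network could be trained on the same targets.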
Subtasks
Initial planning
Which technologies to use?
Which of the technologies are shared with the other ideas?
Which data sources are needed?
Is the AI model interacted with in real-time or do we use canned results?
Make a sketch of the ideal interaction, e.g. a script of the chat or a rough draft of the visualisation.