OpenVAA / voting-advice-application

An open-source platform for creating Voting Advice Applications (VAAs)
https://openvaa.org/en
GNU General Public License v3.0

Alternative matching paradigm using embedding vectors for categorical multiple-choice questions #625

Open kaljarv opened 3 weeks ago

kaljarv commented 3 weeks ago

Primary user story

As a voter, I want to receive useful matching results based on questions where multiple categorical answers are provided, so that I feel that the complexity of political issues is properly accounted for.

Categorical answers are non-ordinal (nominal) answers which, unlike Likert answers, cannot be placed on a single dimension.

Supporting user story

As a candidate or voter, I want to write my own answer to a multiple-choice question such that it can be used for matching, so that I can express my opinion without being limited by the answering options.

Considerations

Combined or per-question embedding?

We can either measure the level of agreement with this method separately for each question or embed all of a candidate’s or voter’s answers in the same space and then calculate the overall distance.

A shortcoming of the per-question approach is that it does not fit the current matching paradigm well. Although that paradigm ultimately yields a scalar distance (rendered as a proximity percentage), it expects each entity to be projected into a single multidimensional space, which can in turn be projected onto lower-dimensional spaces for visualisation.
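
To make the two options concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the three-dimensional vectors stand in for real embedding vectors, cosine distance is only one possible metric, and the function names are hypothetical and not part of vaa-matching.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def per_question_distance(voter, candidate):
    """Option 1: measure each shared question separately, then average."""
    shared = [q for q in voter if q in candidate]
    distances = [cosine_distance(voter[q], candidate[q]) for q in shared]
    return sum(distances) / len(distances)


def combined_position(answers, question_ids):
    """Concatenate the answer vectors into one position in a shared space."""
    position = []
    for q in question_ids:
        position.extend(answers[q])
    return position


def combined_distance(voter, candidate):
    """Option 2: place each entity in one space and measure a single distance."""
    shared = sorted(q for q in voter if q in candidate)
    return cosine_distance(
        combined_position(voter, shared),
        combined_position(candidate, shared),
    )


# Toy stand-ins for embedding vectors (hypothetical values).
voter = {"q1": [1.0, 0.0, 0.0], "q2": [0.0, 1.0, 0.0]}
candidate = {"q1": [1.0, 0.0, 0.0], "q2": [0.0, 0.0, 1.0]}

print(per_question_distance(voter, candidate))
print(combined_distance(voter, candidate))
```

With unit-length answer vectors and this metric the two numbers coincide, so the practical difference is structural: only the combined variant gives each entity a single position that could later be projected for visualisation.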

Distance calculation in the embedding space or a projection thereof?

Instead of calculating the distance between two positions directly in the out-of-the-box embedding space, could we create a learnable projection (taking the embedding vectors of the question and the answer as inputs) into a (lower-dimensional) "political" space in which the distances are measured? The reasoning is that superficial features of the answers, such as stylistic choices, should not affect this distance, yet they are presumably encoded in some components of the embedding vector.
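
The idea can be sketched as a plain linear projection followed by a distance measurement. This is only an illustration under strong assumptions: the weights below are set by hand rather than learned, the training step is left out entirely, the toy vectors are invented, and the third answer component is simply declared to be a "style" feature. All names are hypothetical.

```python
import math


def project(weights, question_vec, answer_vec):
    """Apply a linear projection to the concatenated question and answer vectors."""
    inputs = question_vec + answer_vec
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hand-set stand-in for learned weights: 5 inputs (2 question + 3 answer
# components) mapped to 2 outputs. The column for the assumed "style"
# component is zero, so style cannot influence the projected position.
# In a trained projection the question columns would normally be non-zero too.
WEIGHTS = [
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
]

question = [0.5, 0.5]
formal_answer = [0.8, 0.1, 0.9]    # same stance, formal wording
casual_answer = [0.8, 0.1, 0.1]    # same stance, casual wording
opposing_answer = [0.1, 0.8, 0.9]  # different stance, formal wording

raw_style_gap = euclidean_distance(formal_answer, casual_answer)
projected_style_gap = euclidean_distance(
    project(WEIGHTS, question, formal_answer),
    project(WEIGHTS, question, casual_answer),
)
projected_stance_gap = euclidean_distance(
    project(WEIGHTS, question, formal_answer),
    project(WEIGHTS, question, opposing_answer),
)

print(raw_style_gap, projected_style_gap, projected_stance_gap)
```

In the raw space the two same-stance answers are separated by their wording alone, while in the projected space that gap disappears and only the stance difference remains. Whether a real projection can be trained to behave this way is exactly the open question raised above.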

Subtasks

  1. Initial planning
    • Which technologies to use?
    • Which parts of the tech are shared with the other ideas?
    • Which data sources are needed?
    • Is the AI model interacted with in real-time or do we use canned results?
    • Make a sketch of the ideal interaction, e.g. a script of the chat or a rough draft of the visualisation
  2. Prototyping
    • Make a UI sketch in Figma
    • Test the suggested workflows using real data and real models
  3. Development
    • Steps TBA
  4. Admin UI development
    • Optional

kaljarv commented 3 weeks ago

For information about the standard paradigm used in matching, see the vaa-matching docs. (NB: the module will soon be migrated to /vaa-matching.)

tamma1 commented 3 weeks ago

https://www.notion.so/Alternative-voting-paradigm-using-embedding-vectors-for-non-agreement-based-multiple-choice-question-12f40660a20a8012af39ee824bf8f22a

MemeSlayer27 commented 1 day ago

github repo docs