Ground Truth (Metrics discussion)

AngelaKTE commented 3 months ago

Description

Goal:
Why here?:

1. In Social Choice/Epistemic Democracy there is a concept called Objective Truth, which refers to the “right” allocation that should ideally be achieved.

In OP Retro Funding we assume that there is an objective truth: the positive impact made to the Optimism Collective.

We also know that there is an optimal distribution of the funds: impact = profit, “positive impact to the Collective should be rewarded with profit to the individual”. https://www.optimism.io/vision

The voting conducted in Retro Funding aims to find this objective truth using crowd wisdom, since we don’t have any better instrument to measure impact and define profit. (Otherwise, we could simply write an algorithm that computes funding for projects based on application data.)

2. In the voting, we assume that every voter provides a noisy estimate of this objective truth. Each voter’s preference can be seen as an approximation of the truth, influenced by the voter’s own information, biases, and constraints. In simulations, we can assign each voter a deviation from the true optimal allocation and compare the aggregated outputs that the different voting rules produce. The goal is to find a voting rule whose output is as close as possible to (any) true optimal allocation.

3. Note: we don’t define the objective truth itself here; we measure how well the voting rule outputs the objective truth (Impact = Profit). To optimize Impact = Profit for Retro Funding, more areas have to be addressed, such as

- the information projects provide (e.g. KPIs, Open Source Observer data)
- the incentives for voters to vote according to impact = profit
- and more.

That’s why we marked this objective in bright green. In our deliverable, we’ll evaluate the voting rules as outlined above, and we’ll propose how to improve the voting design & evaluation methods, so that “Objective 9 Impact = Profit” can be evaluated with rigor.
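
To make the "noisy estimate" assumption from the "Why here?" section concrete, here is a minimal Python sketch. It is only one possible noise model, not the one used in the notebook: the function names (`normalize`, `random_ground_truth`, `noisy_vote`), the Gaussian noise, and the `noise_level` parameter are all assumptions made for illustration.

```python
import random

def normalize(values):
    """Scale non-negative values so they sum to 1."""
    total = sum(values)
    if total == 0:
        # Degenerate case: fall back to an equal split.
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]

def random_ground_truth(num_projects, rng):
    """Hypothetical 'objective truth': a random normalized allocation."""
    return normalize([rng.random() for _ in range(num_projects)])

def noisy_vote(ground_truth, noise_level, rng):
    """One voter's ballot: the truth plus Gaussian noise (an assumption),
    clipped at 0 and renormalized so the ballot still sums to 1."""
    perturbed = [max(0.0, share + rng.gauss(0.0, noise_level))
                 for share in ground_truth]
    return normalize(perturbed)

rng = random.Random(42)  # fixed seed so the run is reproducible
truth = random_ground_truth(5, rng)
votes = [noisy_vote(truth, noise_level=0.05, rng=rng) for _ in range(10)]
```

A larger `noise_level` models voters with less information or stronger biases; with `noise_level=0.0` every ballot equals the ground truth.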

We compare:
- We compare across all voting rules to see which voting rule produces the minimal deviation between the aggregated votes and the "objective truth".
- We apply every voting rule to the exact same set of (randomly generated) voting matrices.
- We measure the distance to any pre-defined "objective truth".

How we model it:

- To define the ground truth, we generate a random (normalized) distribution of funds (ground_truth array).
- We calculate the Hamming distance between the ground truth and all voters' votes, focusing on the top top_k projects (to optimize model performance?). The Hamming distance is the number of votes that differ from the ground truth.
- We simulate x rounds with random votes (one column per voting rule holding its Hamming distance across rounds). In each round we create an output according to the voting rule and measure its distance to the ground truth.
- Note that the Hamming distance counts how many votes differ, NOT how much they differ!
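
One possible reading of the Hamming comparison described above, as a minimal Python sketch. It assumes that "differ" means a project's share deviates from the ground truth by more than a small tolerance, and that top_k selects the largest projects in the ground truth. The helper names (`top_k_indices`, `hamming_distance`), the tolerance `tol`, and the example numbers are made up for illustration and are not taken from the notebook.

```python
def top_k_indices(allocation, top_k):
    """Indices of the top_k projects by share, largest first."""
    ranked = sorted(range(len(allocation)),
                    key=lambda i: allocation[i], reverse=True)
    return ranked[:top_k]

def hamming_distance(ground_truth, allocation, top_k, tol=1e-9):
    """Count the top_k ground-truth projects whose share differs from the truth."""
    return sum(
        1
        for i in top_k_indices(ground_truth, top_k)
        if abs(ground_truth[i] - allocation[i]) > tol
    )

ground_truth = [0.40, 0.30, 0.20, 0.10]
slightly_off = [0.41, 0.29, 0.20, 0.10]  # two projects off by 0.01
far_off = [0.10, 0.60, 0.20, 0.10]       # two projects off by 0.30

print(hamming_distance(ground_truth, slightly_off, top_k=3))  # prints 2
print(hamming_distance(ground_truth, far_off, top_k=3))       # prints 2
```

Both allocations get the same score even though the second one is much further from the ground truth, which is exactly the "how many, not how much" limitation.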

Open questions:

Is the number of differences really the best way to measure it? Or rather the size of the difference (L1 distance), per project or overall?
I'd suggest that we recommend building up dedicated monitoring and optimization around "impact = profit" that combines multiple measures:
- voting rule supporting objective truth
- monitoring diversity of votes (goal: Hamming distance = 0)
- aligning incentives of voters & OP Collective (rewards for voters)
- voting design supporting objective truth (e.g. secret voting, Schelling-point-style rewards, PLUS counterbalancing groupthink and self-fulfilling predictions (more funding allows projects to be more successful))
- Is L1 best in this case? Or ordering/ranking? I feel that today the voters can make sense of an ordering, but can make less clear decisions when it comes to the exact amount of funding. What should we recommend to OP? Ranking would also make sense since the voting has to be normalized: all funding rounds have a total amount, and thus the exact funding is a function of this predetermined total funding.
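
To illustrate the L1 vs. ordering question, here is a small Python sketch that computes both measures on the same allocations. The ranking measure used here is the Kendall tau distance (number of project pairs ordered differently), which is one common choice and an assumption on my side, not something already agreed in this thread. Function names and example numbers are hypothetical.

```python
from itertools import combinations

def l1_distance(a, b):
    """Sum of absolute per-project differences between two allocations."""
    return sum(abs(x - y) for x, y in zip(a, b))

def kendall_tau_distance(a, b):
    """Number of project pairs that a and b put in opposite order (ties ignored)."""
    return sum(
        1
        for i, j in combinations(range(len(a)), 2)
        if (a[i] - a[j]) * (b[i] - b[j]) < 0
    )

ground_truth = [0.40, 0.30, 0.20, 0.10]
same_order = [0.70, 0.15, 0.10, 0.05]  # same ranking, very different amounts
swapped = [0.30, 0.40, 0.20, 0.10]     # close amounts, top two projects swapped

print(f"{l1_distance(ground_truth, same_order):.2f}")      # prints 0.60
print(kendall_tau_distance(ground_truth, same_order))      # prints 0
print(f"{l1_distance(ground_truth, swapped):.2f}")         # prints 0.20
print(kendall_tau_distance(ground_truth, swapped))         # prints 1
```

The two measures disagree on which allocation is closer to the ground truth, so the choice depends on whether we care about exact amounts or about the ordering of projects.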

Notes for simulation updates:

Please use L1 distance to measure! (output)
Discuss with Muhammad!

Chart:

I'd like to see the aggregated deviation per voting rule: if I go over all simulated voting rounds and all randomly created ground truths, is there any pattern we can see?
Does the voting rule affect how many votes differ from the ground truth?
(Note from Angela to myself): imagine a wide range of different votes; we measure how well the voting rule puts the final result in the center of all votes.
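
A rough Python sketch of the aggregation behind such a chart: in every round a new random ground truth and a new set of noisy votes are drawn, each voting rule is applied to the same vote matrix, and the L1 distance to the ground truth is averaged per rule. The two rules shown (`mean_rule`, `median_rule`) are simple stand-ins and not necessarily the rules evaluated in the repo; the noise model and all parameter values are assumptions as well.

```python
import random
import statistics

def normalize(values):
    """Scale non-negative values so they sum to 1 (equal split if all are 0)."""
    total = sum(values)
    if total <= 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]

def mean_rule(votes):
    """Stand-in rule: per-project mean of all ballots, renormalized."""
    return normalize([statistics.fmean(col) for col in zip(*votes)])

def median_rule(votes):
    """Stand-in rule: per-project median of all ballots, renormalized."""
    return normalize([statistics.median(col) for col in zip(*votes)])

def l1_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def simulate(rules, num_rounds, num_voters, num_projects, noise_level, seed=0):
    """Return the mean L1 distance to the ground truth for each voting rule."""
    rng = random.Random(seed)
    distances = {name: [] for name in rules}
    for _ in range(num_rounds):
        truth = normalize([rng.random() for _ in range(num_projects)])
        # Every rule sees the exact same vote matrix in a given round.
        votes = [
            normalize([max(0.0, s + rng.gauss(0.0, noise_level)) for s in truth])
            for _ in range(num_voters)
        ]
        for name, rule in rules.items():
            distances[name].append(l1_distance(truth, rule(votes)))
    return {name: statistics.fmean(vals) for name, vals in distances.items()}

rules = {"mean": mean_rule, "median": median_rule}
summary = simulate(rules, num_rounds=200, num_voters=30,
                   num_projects=10, noise_level=0.05)
for name, avg in summary.items():
    print(f"{name}: mean L1 distance to ground truth = {avg:.4f}")
```

The per-rule lists in `distances` could also be plotted directly (e.g. as box plots per rule) instead of being reduced to a mean, to show the spread across rounds.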

linear[bot] commented 3 months ago

GOV-27 Ground Truth (Metrics discussion)


AngelaKTE commented 3 months ago

Open questions:

@nimrodgithub134 @EyalBriman
a) there's no significant impact of the voting rule. Did you expect this? Any comments? (see in https://github.com/GovXS/OP-Evaluating-Voting-Design-Tradeoffs-for-Retro-Funding-RESEARCH-/blob/main/evaluations/ground_truth_alignment.ipynb)

b) For the report, let's discuss more approaches to address the "Impact = Profit" question (see "Why here?")

nimrodgithub134 commented 3 months ago

Hmm.. No, I didn't expect no significant impact of the voting rule. One way to make the impact shine more is to consider perhaps other settings? Otherwise we can say that indeed for the settings we check this doesn't seem to make a lot of difference.

GovXS / Evaluating-Voting-Design-Tradeoffs-for-Retro-Funding

Ground Truth (Metrics discussion) #19
