compdemocracy / polis

:milky_way: Open Source AI for large scale open ended feedback
https://pol.is
GNU Affero General Public License v3.0

Take non-returning (stale) participants into account #211

Open colinmegill opened 5 years ago

colinmegill commented 5 years ago

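If a participant arrives early, votes on all of the statements available at the time (say 10), and never comes back, the conversation may later grow to hundreds of statements, and the ones that end up being representative for clustering may all be statements that participant never voted on.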

We've talked about 'grading' people's participation based on how well we think we understand their position, and could use that metric to remove participants from clusters if their participation stats are poor.

Example case: http://www.scoop.co.nz/stories/HL1708/S00025/hivemind-universal-basic-income-are-we-up-for-it.htm

Thanks to Jon Skjerning-Rasmussen of Alternativet in Denmark!

patcon commented 4 years ago

We've talked about 'grading' people's participation based on how well we think we understand their position

Is this ticket its own standalone discussion/todo, or more closely related to convo developing here? https://github.com/pol-is/polis/issues/210#issuecomment-624900680 (either is fine, just can't tell)

colinmegill commented 4 years ago

Both

metasoarous commented 3 years ago

I think the right way to think about this is as the percentage of variance explained by the comments a participant has voted on. This turns out to be slightly tricky, and is related to the ideas of smarter sparsity-aware projection that we've talked about. It's possible that some of this may be thought of in terms of @ThenWho's work on vote prediction as well (see: https://github.com/ThenWho/pol-is-link-prediction/blob/main/Julia_LinkPrediction_TensorFactorization_Vechgrad.ipynb)
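
To make the "variance covered" idea concrete, here is a rough sketch, not the Polis implementation. It uses a deliberately simplified proxy: each comment is weighted by the variance of the votes it has received, and a participant's score is the share of that total variance carried by the comments they voted on. A PCA-based or sparsity-aware version would weight comments differently. The function name `coverage_scores`, the data layout, and the 1/-1/0 vote coding are all assumptions made for illustration.

```python
from statistics import pvariance


def coverage_scores(votes):
    """Share of total per-comment vote variance that each participant has voted on.

    votes: dict mapping participant id -> dict of comment id -> vote (1, -1 or 0).
    A comment missing from a participant's dict means they never voted on it.
    This is a simplified proxy, not a PCA-based measure.
    """
    comments = sorted({c for seen in votes.values() for c in seen})

    # Variance of the votes observed on each comment (hypothetical weighting).
    variance = {}
    for c in comments:
        observed = [float(seen[c]) for seen in votes.values() if c in seen]
        variance[c] = pvariance(observed) if observed else 0.0

    total = sum(variance.values())
    if total == 0:
        return {p: 0.0 for p in votes}
    return {p: sum(variance[c] for c in seen) / total for p, seen in votes.items()}


# Made-up example: "early" voted on the two comments that existed on day one.
example_votes = {
    "early": {"c1": 1, "c2": -1},
    "a": {"c1": 1, "c2": 1, "c3": 1, "c4": -1},
    "b": {"c1": -1, "c2": -1, "c3": -1, "c4": 1},
}
scores = coverage_scores(example_votes)
```

In this toy data the early participant covers less than half of the total variance even though they voted on everything available when they visited, which is the stale-participant case this issue describes.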

ThenWho commented 3 years ago

In tensor decomposition, more than the vote predictions themselves, the resulting loading matrices A, B and C might be useful. I.e. A is an embedding of the participants, having taken into account their contributions (through the weight matrix). 'Funny' behavior will show there.

But I'd be wary of discarding participants with low stats. Better to engage them more, e.g. with statement suggestions taken from link predictions. I.e. prompt them to take a stance on (highly probable) potential links and pique their interest in participating more. I understand that's difficult if participants do not provide their email at the end of their first visit, but that's a signal too (probably of badly crafted seed statements).

Currently I'm looking into dynamic tensors - how not to recalculate the whole tensor when a new participant enters, or the same participant provides one more vote. If this proves computationally reasonable, it can be used to suggest interesting statements even from the first visit.

Related parenthesis: the beauty of tensors is that they can incorporate any type of information as separate slices. Now there are only 3 slices, pos/neg/pass, or even one if 1/-1/0 is used on the same slice, but more can be encoded. For example, priority as defined in https://github.com/pol-is/polis/issues/217 can be a slice, or number of returning visits. (Not sure what such a returning visits number will show when predicted, but I'm willing to bet it's going to be very interesting. Especially if recalculated after each vote of the first visit.)
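
As an illustration of the slice encoding described above, here is a minimal sketch using plain nested lists. It is not taken from the linked notebook. The names `build_vote_tensor`, `SLICES` and `VOTE_TO_SLICE`, and the mapping of 1/-1/0 to pos/neg/pass, are assumptions for this example only.

```python
# One slice per vote type; further slices (e.g. priority) could be appended.
SLICES = ("pos", "neg", "pass")
VOTE_TO_SLICE = {1: 0, -1: 1, 0: 2}  # assumed vote coding


def build_vote_tensor(votes, participants, comments):
    """Return tensor[slice][participant][comment] with a 1 where that vote was cast.

    votes: iterable of (participant id, comment id, vote) triples.
    Cells stay 0 in every slice when the participant never voted on the comment.
    """
    p_index = {p: i for i, p in enumerate(participants)}
    c_index = {c: j for j, c in enumerate(comments)}
    tensor = [
        [[0] * len(comments) for _ in participants]
        for _ in SLICES
    ]
    for participant, comment, vote in votes:
        tensor[VOTE_TO_SLICE[vote]][p_index[participant]][c_index[comment]] = 1
    return tensor


example_tensor = build_vote_tensor(
    [("p1", "c1", 1), ("p1", "c2", -1), ("p2", "c1", 0)],
    participants=["p1", "p2"],
    comments=["c1", "c2"],
)
```

With three separate slices, a "pass" is distinguishable from "never saw the comment", which the single-slice 1/-1/0 encoding cannot express because both end up as 0.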

metasoarous commented 3 years ago

@ThenWho Thanks for your thoughts on this!

For some additional context, we already do something analogous to "throwing participants out" for whom we don't have enough information. Of course, we don't entirely throw them out of the conversation, but we don't include them in the clustering (or PCA, IIRC) to avoid the sparseness messing with the quality of the results.

You're 100% right that the correct approach to this is to try to get such participants to come back and contribute more information when possible. This issue certainly takes precedence for us, and we do have some thoughts about the right way to approach this (there were issues with our initial/naive notification system sending too much and effectively spamming users).

However, we're likely to still want to do some QA for deciding which participants are in the conversation. So this issue is really about doing something smarter than the simple vote count threshold we use currently.

Again, thanks for your thoughts on this!
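
For readers following along, a simple vote count threshold of the kind mentioned above could look roughly like the sketch below. The function names and the default of 7 votes are hypothetical and are not taken from the Polis codebase. The second helper shows why a raw count can mislead for stale participants.

```python
def participants_in_clustering(vote_counts, min_votes=7):
    """Keep participants with at least min_votes votes (threshold value is made up)."""
    return {p for p, n in vote_counts.items() if n >= min_votes}


def fraction_seen(vote_counts, total_comments):
    """Fraction of the conversation's current comments each participant voted on."""
    return {p: n / total_comments for p, n in vote_counts.items()}


# Made-up numbers: "early" voted on all 10 comments that existed on day one,
# but the conversation has since grown to 100 comments.
example_counts = {"early": 10, "drive_by": 2, "regular": 60}
included = participants_in_clustering(example_counts, min_votes=7)
seen = fraction_seen(example_counts, total_comments=100)
```

Here the early participant passes the count threshold while having voted on only a tenth of the conversation, which is the gap a smarter criterion would need to close.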

ThenWho commented 3 years ago

Thanks for the context, @metasoarous, now I get the problem a bit better! I'll need to understand more about how sparseness can be a problem in that sense, as I don't have it 100%. The hit-n-run participants I do have :), although no easy solutions pop up at the moment. I'll keep it in mind though, and something will come up sooner or later.