Option to downselect statements shown to a user

compdemocracy / polis

:milky_way: Open Source AI for large scale open ended feedback

https://pol.is

GNU Affero General Public License v3.0

784 stars 186 forks source link

Option to downselect statements shown to a user #210

Open crkrenn opened 6 years ago

crkrenn commented 6 years ago

Three related questions follow:

Question 1: Have there been discussion or plans to downselect the statements or comments in a conversation shown to a user? An extreme example is https://pol.is/football, where there are currently 45 statements shown to a new user. I did not have the patience to go through all of these, and I do not want to ask my users to potentially face the same.

Some simple changes (that can be changed by coding or by the moderator):

A user entering a new conversation is asked to vote on a maximum of ~5 statements.
The user is asked if they would like to answer more 1 or two times more. (I imagine that I am not the only one who is more likely to be a return user if a site is more respectful of my time and attention.)

Question 2: Is this reasonable? If so, can I help to implement this?

I haven't figured out your implementation of priority and importance, but I am guessing that it satisfies most, if not all of the following criteria:

The statements shown are selected to cover "phase space" as well as possible. (For the football example, you might chose some questions the majority answered similarly, and some that groups A and B answered uniquely.)
Statements that a majority "passed" should be asked less frequently.
Every statement should have some chance of being asked, unless the moderator has removed it.
"Good" questions, by some metrics such as their power in predicting class, should be asked more frequently.

Question 3: Do the polis algorithms already fulfill goals 1-4?

Gratefully and respectfully,

-Chris

colinmegill commented 6 years ago

Chris,

Thanks so much for this. This is an excellent question, direction, and summary of an important problem. Yes, the algo handles 1-4.

A few things to push back on as we divine a solution to improve the system, and then a concrete proposal to spur more discussion.

I can say with a degree of certainty that we don't ever want to prescribe how many comments people need to vote on as a magic number - we will just never know enough about the conversation and it's variable, and it changes as the total vote count changes. 100 will always be enough probably, regardless of conversation size, 50 probably will, 25 probably won't be for large conversations, 10 might for very small conversations. We routinely see people voting on hundreds of statements - and not a few people - hundreds of people in conversations vote on hundreds of statements, so I'm loathe to interrupt people's flow as that produces density in the matrix that 'lifts all boats' - ie., supervoters help understand people who have only voted on a few statements, though it's also true they have disproportionate weight. @metasoarous can better speak to this as it's probably a research topic in and of itself. Perhaps it suggests two metrics for two goals - how well we understand you and how much you've helped improve the conversation health/contributed to the overall good.

We have to date let the user decide when they're done, and recently added the number of comments remaining and delayed doing even that much. Perhaps that was not enough.

I'd like to propose some kind of feedback mechanism not about the total number, but about 'when I'm done'. There could be a D- through A metric of 'participant vote record health' which takes into account:

how much the participant voted in raw terms (shouldn't be 500 no matter what or we're doing something wrong)
whether the participant has seen multiple statements from all of the groups
whether the participant has seen multiple consensus statements
whether there are masses of comments that are unvoted on (this gets at overall conversation health, which we also want to have in the report and could be related - ie., local voting health is related to how much you have contributed to the global understanding of the opinion space, and sometimes there won't be as much need)
other factors to be discussed?

Or, as a fillbar rather than as a grade (less judgy):

An added benefit of this metric is that it could be aggregated - conversation health could be expressed in terms of a distribution of participant voting health, as well as which comments have votes and do not have votes (sparsity), both interesting metrics to add to the report.

Open to ideas. Let's do design on this issue, and talk implementation on another one once we feel we know what we want.

Thanks again!

metasoarous commented 6 years ago

Some great ideas here! Thanks for raising the issue @crkrenn.

For starters, we cover all of the points you listed (prioritizing statements which help us place you and consensus statements, while deprioritizing statements with lots of passes). Additionally, we have a factor that prioritizes sending out new statements which don't have a lot of comments, so they have a chance to bubble up and get voted on.

I'll also echo @colinmegill's comments about limiting user engagement. Supervoters do "lift all boats" in a sense, and we don't want to discourage them. It's possible that if people had a health indicator it would spur low voting users to actually vote more, but I'd be loath to do this at the expense of loosing the feedback from the supervoters.

Maybe what we need is a cheerleader that gives you feedback about your contribution, and encourages you to keep going. Perhaps something like this?

Seriously though, I love the idea of health metrics for voters, comments and conversations, and I have some ideas about how I'd go about this on the math side. I think the trickiest side will be getting the interface right, and ideally, we would do this as part of an A/B testing experiment to see what effect it has on participation so we know we're not shooting ourselves in the foot.

One bit of clarification, regarding @colinmegill's comment that "supervoters ... have disproportionate weight". That's not quite true; we don't weight supervoters any differently than low voters, and in fact, we take steps to even out the playing field by adjusting our projections of low-voters to equalize things.

patcon commented 6 years ago

lol Chris. And I had the same thought about A/B testing within the same pol.is conversations being helpful in convos like this. (enjoying this thread btw)

crkrenn commented 6 years ago

I agree with all that you have said...

I have continued to think about both respecting the users time as well as rewarding superusers. Here's a proposal that I hope address many of your questions and concerns for comment and iteration:

To include a horizontal continuous spectral bar from blue to green (see matplotlib BuGn below) with the following attributes:

Progress on the number of questions is shown from left to right with saturation indicating progress (e.g. pale blue/green turns into more saturated blue/green as more questions are answered.
Users are given a visual check/check+/check++/check+++ as the following criteria are met:
1. A threshold minimum number of useful questions is passed (Min[10,total_statements])
2. The algo has ~1 sigma confidence in prediction of that user's answers to the remaining questions
3. The algo has ~2 sigma confidence in prediction of that user's answers to the remaining questions
4. All the questions are answered.
5. (As the total number of statements decrease, these 4 bins can be collapsed into fewer ones)
The confidence in prediction should encompass "whether the participant has seen multiple statements from all of the groups" and "whether the participant has seen multiple consensus statements". And, the current algo should force the user to vote on unanswered questions.
I think you should probably decouple the feedback between how well a user has contributed from overall conversation health. For small numbers of participants, one user can improve the conversation health significantly, but for large numbers, would it be frustrating to see no change?

A potentially tricky question is the confidence quantification, and there will be a tradeoff on accuracy and computational cost. One possibility is to train a classifier on the existing data and to test against the previous data prediction using an ordered set of n questions selected by the current algo. The basic idea is to quantify the reduction in marginal utility of each answer.

Quantifying conversation health could be related. It should definitely include a measure of how many people have voted on each statement. Ideally it would include demographic completeness, but I imagine that is down the road. One metric could be assessing how much statistics change when single users are left out of the analysis, or how much they are changing over time. And, of course standard polling metrics apply.

And, again, as far as respecting a user's time, I think it would be better to ask a user a smaller number of questions on a single day, and then to recontact to give them an update on the status of the conversation and to request their help in responding to some statements that they had not seen before and which would help the health of the conversation.

PS. My speed in responding does not correlate with my enthusiasm. PPS. LOL on Clippy. I haven't thought of he/she/them/it in a long time.

https://matplotlib.org/_images/colormaps_reference_01.png

crkrenn commented 6 years ago

A more modest proposal:

I do like the fillbar. And, it could be used for both user progress and conversation health. And I like the idea of parallel A/B testing implementation and deployment.

I understand the desire for simplicity (which I ignored ;). And, I think that different moderators and different conversations deserve some flexibility on what level of participation is "good enough".

Default fillbar: red/yellow/green Generous fillbar: yellow/green/blue

The thresholds for color transitions could be fixed at first. 1) enough data to place the user in a group; 2) enough data to improve the algos significantly; 3) decreasing marginal return for the superuser.

Thoughts?

patcon commented 4 years ago

This conversation is awesome.

Just tossing my thoughts into this brainstorm:

:heart: I really feel for not wanting to subtly discourage super-voters.
:heart: I feel the progress bar and grade UIs feel a bit too linear and more concrete than is required.
:heart: I really like the simple red-yellow-green color progression.
:heart: I feel struck by the "individual" vs "collective" health division that actions might be seen on.
1. Makes me think of something that feels maslow's hierachy-ish -- something that kinda telegraphs "concern yourself with your own core/basic needs first, then turn thinking to the collective"
:bulb: Maybe we could use a system that incorporated this individual/collective framing. A scale (maybe color :) ) for a centre "the self and individual", but then literally an additional emcompassing indicator for your contribution on the collective level, that takes perhaps more effort to develop through.
1. :heart: I eel this might also get ppl thinking more on collective level while responding, which is perhaps not bad.
2. :bulb: The outer layer indicator might only display itself after individual core has had a chance to go to green (and could appear with any color, presumably, even already green for a small convo)
3. :mag: This would let the app get a sense of "who someone is" before showing them how useful they can be to the collective. Maybe someone presents themselves as being part of a minority opinion group, and so there's be time for that to become clear to app before deciding how they can support collective health (sorry if i'm misunderstanding the math or stats here!)

This sorta vibe:

A	B	C	D	E

^{(click to edit; re-link edits via File > Publish > Link...)}

Potential labels: discussion, brainstorm, client-participation

(Any idea for a title on this issue that better captures the conversation as recognizable to drive by issue reviewers? Something like Add visual indicators of "progression" for participants maybe?)

crkrenn commented 4 years ago

This issue is still a very high priority for me. I would like the option of having a "quit"/"quit for now" button next to agree/disagree/pass that would jump straight to the email signup card.

Unless someone in my town asks me specifically to run a pol.is conversation, I am not planning on running a conversation until after this is implemented.

patcon commented 4 years ago

Ah ok sounds smaller and also reasonable. Maybe there are two threads here that we could tease apart into smaller tasks/convos: (1) indicator of progress/contribution, and (2) ability to subscribe before getting through all the comments (phrasing and specific UI tbd)

Does that capture part of this convo as you see it @crkrenn @colinmegill?

colinmegill commented 4 years ago

I'm open to improving the status indicator from 'remaining'
Whatever we do should be instructed by the model. In concrete terms, this could be a confidence score (or A B C D F grade) of how much we know about them relative to the variance in the conversation. This is going to change as more comments and votes come in, and the score or grade display needs to reflect that. We really should only trigger an email to return if our confidence in our understanding of how they fit into the topology changes, not just because there are more comments or votes.
We should not rely on color as a primary mechanism of communicating progress, or add any other color to the voting interface unless absolutely necessary. This competes with primary actions (agree disagree pass), adds accessibility concerns and generally needs to be redundant to other text based strategies.
We should break out the issue of 'come back and vote later', pausing the UI, etc. This is a larger conversation, but a good one. How users relate to the conversation over time depends a lot on the context in which the conversation is embedded.

On Wed, May 6, 2020 at 12:51 PM Patrick Connolly notifications@github.com wrote:

Ah ok sounds smaller and also reasonable. Maybe there are two threads here that we could tease apart into separate tasks: (1) indicator of progress/contribution, and (2) ability to subscribe before getting through all the comments.

Does that capture part of this convo as you see it @crkrenn https://github.com/crkrenn @colinmegill https://github.com/colinmegill ?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/pol-is/polis-issues/issues/116#issuecomment-624764748, or unsubscribe https://github.com/notifications/unsubscribe-auth/AANQGGLE2A2WTQF2GWWVAYTRQGISHANCNFSM4GCQP4DA .

patcon commented 4 years ago

We really should only trigger an email to return if our confidence in our understanding of how they fit into the topology changes, not just because there are more comments or votes.

What happens now? Realizing I'm unclear on the current behaviour :)

We should not rely on color as a primary mechanism

:+1: ah yes, makes sense to protect those colors. For the sake of separating that, added another option (D) to my previous comment. (Edit: also added (E) )

No-harm-no-foul if not, but anyone have thoughts on whether there's any merit to the premise of communicating individual vs collective "health" in an indicator? (nevermind colors)

This surprises me, helps me understand Colin's desire for a/b user data. Thought I'd share in case it surprises you too, @crkrenn:

For both the short and long surveys, including a progress bar at the top with the percent complete [as opposed to at bottom], reduced the completion rates compared to surveys without any progress bar. -- https://www.surveymonkey.com/curiosity/progress-bars-good-bad-survey-survey-says/

(perhaps our shared sense that seeing progress motivates action is more a reflection of our being neuro-atypical..!)

colinmegill commented 4 years ago

We have long felt that saying '600 comments remain' would be demoralizing.

Oh! New idea! What if we combine 'measure of health' of conversation with 'measure of health' of individual and give a dynamic sense of how many comments remain to get to each grade?

'On average in this conversation it takes voting on 8 statements to get a C, 15 statements to get a B, and 42 statements to get an A'. But... more concise than that.

On Wed, May 6, 2020 at 4:45 PM Patrick Connolly notifications@github.com wrote:

We really should only trigger an email to return if our confidence in our understanding of how they fit into the topology changes, not just because there are more comments or votes.

What happens now? Realizing I'm unclear on the current behaviour :)

We should not rely on color as a primary mechanism

👍 ah yes, makes sense to protect those colors. For the sake of separating that, added another option (D).

No-harm-no-foul if not, but anyone have thoughts on whether there's any merit to the premise of communicating individual vs collective "health" in an indicator? (nevermind colors)

This surprises me, helps me understand Colin's desire for a/b user data. Thought I'd share in case it surprises you too, @crkrenn https://github.com/crkrenn:

For both the short and long surveys, including a progress bar at the top with the percent complete [as opposed to at bottom], reduced the completion rates compared to surveys without any progress bar. -- https://www.surveymonkey.com/curiosity/progress-bars-good-bad-survey-survey-says/

(perhaps our shared sense that seeing progress motivates action is more a reflection of our being neuro-atypical..!)

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/pol-is/polis-issues/issues/116#issuecomment-624881475, or unsubscribe https://github.com/notifications/unsubscribe-auth/AANQGGKTGWO3VFYM23VLBK3RQHD6NANCNFSM4GCQP4DA .

colinmegill commented 4 years ago

If we added 'There are 755 statements', it also implies 'because this system is a smart cookie, you don't need to interact with all of them but you can if you want, we'll make good use of your time'. This offers a sense of leverage to the user.

On Wed, May 6, 2020 at 5:27 PM Colin Megill colinmegill@gmail.com wrote:

We have long felt that saying '600 comments remain' would be demoralizing.

Oh! New idea! What if we combine 'measure of health' of conversation with 'measure of health' of individual and give a dynamic sense of how many comments remain to get to each grade?

'On average in this conversation it takes voting on 8 statements to get a C, 15 statements to get a B, and 42 statements to get an A'. But... more concise than that.

On Wed, May 6, 2020 at 4:45 PM Patrick Connolly notifications@github.com wrote:

We really should only trigger an email to return if our confidence in our understanding of how they fit into the topology changes, not just because there are more comments or votes.

What happens now? Realizing I'm unclear on the current behaviour :)

We should not rely on color as a primary mechanism

👍 ah yes, makes sense to protect those colors. For the sake of separating that, added another option (D).

No-harm-no-foul if not, but anyone have thoughts on whether there's any merit to the premise of communicating individual vs collective "health" in an indicator? (nevermind colors)

This surprises me, helps me understand Colin's desire for a/b user data. Thought I'd share in case it surprises you too, @crkrenn https://github.com/crkrenn:

For both the short and long surveys, including a progress bar at the top with the percent complete [as opposed to at bottom], reduced the completion rates compared to surveys without any progress bar. -- https://www.surveymonkey.com/curiosity/progress-bars-good-bad-survey-survey-says/

(perhaps our shared sense that seeing progress motivates action is more a reflection of our being neuro-atypical..!)

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/pol-is/polis-issues/issues/116#issuecomment-624881475, or unsubscribe https://github.com/notifications/unsubscribe-auth/AANQGGKTGWO3VFYM23VLBK3RQHD6NANCNFSM4GCQP4DA .

patcon commented 4 years ago

I would like the option of having a "quit"/"quit for now" button next to agree/disagree/pass that would jump straight to the email signup card.

Spawned new ticket for this: https://github.com/pol-is/polis-issues/issues/148

crkrenn commented 4 years ago

I agree that this is a tricky and important topic.

And, I was surprised that progress bars at the bottom are better.

Can we just "trust the user"?

Stage 0: Early users & few questions. We probably want as much data as possible, and there are probably not a lot of questions. Should encourage user to finish entire survey. Can we provide an estimate of time remaining? When do we reveal this?

Stage 1: The data seems to stabilize. The first question is "where does the user fall with respect to existing clusters?" Shooting for 95% confidence in this prediction might be one metric. The second question is "Do we understand the correlations between statements as well as we would like?" Here we are asking the user to volunteer their time to improve understanding. Maybe we also have a goal of understanding the correlation of all statements to to some confidence level (the maximum may vary depending on the self-consistency of the user and the quality of the question).

colinmegill commented 4 years ago

@metasoarous, when we implemented comment routing, we talked about A 'first attempt to understand the user in comment space', then B turn to 'understanding comments we don't already understand'. Does the math implement both A & B? Or only A? How close are we to already having what @crkrenn suggests re: the two question above, which are very similar to our conversations?

If the way we communicate 'time remaining' to the user changes, it has to be a/b tested with some users in a conversation seeing the new interface and some seeing the old. User engagement rates are presently, in our view, quite high. Example:

https://pol.is/report/r2xcn2cdbmrzjmmuuytdk [image: image.png]

On Thu, May 7, 2020 at 9:34 PM crkrenn notifications@github.com wrote:

I agree that this is a tricky and important topic.

And, I was surprised that progress bars at the bottom are better.

Can we just "trust the user"?

Stage 0: Early users & few questions. We probably want as much data as possible, and there are probably not a lot of questions. Should encourage user to finish entire survey. Can we provide an estimate of time remaining? When do we reveal this?

Stage 1: The data seems to stabilize. The first question is "where does the user fall with respect to existing clusters?" Shooting for 95% confidence in this prediction might be one metric. The second question is "Do we understand the correlations between statements as well as we would like?" Here we are asking the user to volunteer their time to improve understanding. Maybe we also have a goal of understanding the correlation of all statements to to some confidence level (the maximum may vary depending on the self-consistency of the user and the quality of the question).

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/pol-is/polis-issues/issues/116#issuecomment-625580474, or unsubscribe https://github.com/notifications/unsubscribe-auth/AANQGGN3XDS2ME4PNP6OTJDRQNORHANCNFSM4GCQP4DA .