AjuntamentdeBarcelona / decidim.barcelona-legacy

:warning: [DEPRECATED] Decidim Barcelona - Direct and participatory democracy web platform
GNU Affero General Public License v3.0
Front page algorithm #263

Closed xabier closed 8 years ago

xabier commented 8 years ago

Intro

The front page /proposals of the site is critical to the democratic quality of the platform. It drives the attention of citizens and has a tremendous effect on the support that different proposals get. Right now the proposals with the highest activity (during the day) get most of the attention, whereas the rest get very little, unless they have been recently added (the last selector tab) or are already among the most supported (the second selector tab of proposals). As a result, the collective intelligence dynamics are very quickly biased towards a winner-takes-all or positive-feedback effect in favor of those proposals that, at some point in the early days of the platform, got some support. In dynamical terms, the early proposals get an amplifying effect from just a few supports. In addition, as is the case for decidim.barcelona, if a very high number of proposals are introduced simultaneously (e.g. the first 1000 proposals made by the city hall, a dozen proposals made in a citizen meeting, or 200 proposals slowly collected in a district council and uploaded on the very same day), the effects are even worse: the "most active today" category collapses and the proposals get lost.

The problem: What do we want to maximize with the frontpage?

Before we solve the problem we need to define it carefully. What exactly do we want to maximize or balance with the front page? What is the goal of a good front page of proposals in a massive, citizen-driven planning project for the city council?

The equality constraint is the most complex to satisfy. Consensus proposals, or proposals with many supports, will stand out almost automatically due to two main factors: a) the most supported selection tab will be used: people are curious about which proposals got the most support, so this tab helps drag attention to the most supported ones; b) even in a random presentation of proposals, the highly visible number of supports already attracts attention: it is a natural human bias to pay more attention to items that have already attracted more attention, so in a random scan of proposals those with higher support attract the user's attention. So what most urgently needs a solution is the problem of how to distribute attention evenly across all proposals.

Random front-page: the base-line solution

Proposals could simply be shown on the front page randomly. I call it a "base-line solution" because it is the easiest way to provide equal attention to all proposals and, being so simple, any other algorithm should do better than this. But it is far from perfect: proposals that are made early on will receive more attention than those made later on. Despite this, alternative algorithms might not perform much better, or only by very complex methods.

Random-balanced/equilibrated selection

This solution consists of ordering the proposals according to how often they have been seen by users with the right to vote (registered and verified users). The front page will then be composed of a "random" selection of proposals, each chosen with a probability that is inversely proportional to the number of times it has already been shown, and with an even number of proposals per topic. The order of appearance in the list of proposals on the front page should also be considered. I have no time to work out the maths for this sorting algorithm, but it seems to me that, with relatively little added complexity, it can outperform the purely random base-line solution above.
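As a rough illustration of the random-balanced idea, here is a minimal Python sketch. It is not the platform's implementation: the field names (`id`, `topic`, `times_shown`), the `+ 1` smoothing that keeps never-shown proposals from dividing by zero, and the final shuffle of the page are all assumptions made for the example.

```python
import random
from collections import defaultdict


def balanced_front_page(proposals, per_topic, rng=None):
    """Pick `per_topic` proposals from each topic, favoring rarely shown ones.

    `proposals` is a list of dicts with the hypothetical keys
    'id', 'topic' and 'times_shown' (impressions to verified users).
    """
    if rng is None:
        rng = random.Random()

    by_topic = defaultdict(list)
    for proposal in proposals:
        by_topic[proposal["topic"]].append(proposal)

    page = []
    for topic in sorted(by_topic):
        pool = list(by_topic[topic])
        for _ in range(min(per_topic, len(pool))):
            # Weight inversely proportional to impressions; the +1 is an
            # assumption so that proposals with zero impressions are valid.
            weights = [1.0 / (1 + p["times_shown"]) for p in pool]
            index = rng.choices(range(len(pool)), weights=weights, k=1)[0]
            page.append(pool.pop(index))  # draw without replacement

    # Assumption: shuffle so that no topic is systematically listed first.
    rng.shuffle(page)
    return page
```

After each render, `times_shown` would be incremented for every proposal on the page, so that heavily exposed proposals gradually become less likely to be drawn again.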

elaragon commented 8 years ago

I agree with Xabier that the front page /proposals is critical since it drives the attention and, therefore, the support that proposals receive.

First of all, I think we should reflect on this particular process. As Xabier said: What do we want to maximize with the frontpage?

In the elaboration of PAM/PAD, the publication date of a proposal should not be very relevant. A common mistake is to think about sorting algorithms from typical collaborative filtering platforms (e.g. Reddit, Digg-Meneame, etc.). Such platforms are devised for filtering news which, by definition, is no longer relevant after some hours/days. In contrast, proposals for PAM/PAD should not be rewarded/punished based on when they were published: all of them require (and deserve) attention during this process.

This is similar to "We want to maximize the opportunity of all proposals to be considered (read and potentially supported) by citizens, with equal initial opportunities for each proposal". The solution proposed by Xabier ("random" selection of proposals with probability that is inversely proportional to the number of times it has been already shown) includes two interesting ideas:

However, the suggested algorithm reproduces the winner-takes-all problem. Proposals which are clicked in the first executions of the sorting algorithm will likely appear in future executions, promoting inequality in the long term. For this reason, I would rather suggest another feature: supports received in the last X hours/days/executions. The key to this feature is that it is constrained by the number of users on the platform. Users are able to click as much as they want, but they can only give their support once. A proposal could become a hot topic and take all the attention, but its visibility will drop once it reaches its support limit. In conclusion:

  1. Random promotes equality
  2. Equality is balanced by the interest of each proposal from the community
  3. Winner-takes-all effect is removed.
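A minimal Python sketch of the "supports received in the last X hours" feature could look as follows. The data layout (`support_times` as a list of timestamps), the window length and the `1 +` baseline weight that keeps unsupported proposals selectable are assumptions for illustration only.

```python
import random
from datetime import datetime, timedelta


def recent_support_weight(support_times, now, window):
    """Return 1 + the number of supports received within `window` before `now`.

    The baseline of 1 is an assumption: it keeps the selection random for
    proposals that have no recent supports at all.
    """
    recent = sum(1 for t in support_times if now - window <= t <= now)
    return 1 + recent


def sample_by_recent_support(proposals, k, now, window, rng=None):
    """Draw up to `k` distinct proposals, weighted by their recent supports."""
    if rng is None:
        rng = random.Random()
    pool = list(proposals)
    picked = []
    while pool and len(picked) < k:
        weights = [
            recent_support_weight(p["support_times"], now, window) for p in pool
        ]
        index = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        picked.append(pool.pop(index))
    return picked
```

Because each user can support a proposal only once, the recent-support count of a popular proposal eventually falls back towards zero, and its weight returns to the baseline.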

In addition, I would like to introduce another dimension: the promotion of deliberative democracy. As far as I understand, the discussion of proposals was conceived to promote deliberation. Nevertheless, this feature has no tangible value in the platform. Deliberation would be partially promoted if features from the discussion were considered by the sorting algorithm, e.g.:

Obviously, if new features are considered by a new version of the algorithm, citizens should be notified of them for the sake of transparency and to promote discussion.

xabier commented 8 years ago

I think you misunderstood, or I did not explain it properly: the probability of a proposal being selected for the front page is inversely proportional to the number of times it has previously been shown to users with voting rights, thus avoiding winner-takes-all dynamics and preventing inequality.

xabier commented 8 years ago

I very much like the idea of promoting deliberation, but maybe not for the front-page algorithm. It would be difficult to avoid ending up with very few proposals accumulating an ever increasing number of comments. To promote deliberation I think it is more useful to provide positive reinforcement for users to engage in discussions (this could be done with some gamification).

elaragon commented 8 years ago

I totally misunderstood you (although your explanation was clear). I am not so sure that perfect equality promotes user engagement, since users will be exposed to many proposals of little interest to them (but I am not sure about this idea...)

xabier commented 8 years ago

You are right about user engagement. But users can use the topic/district selectors in the left menu to match their areas of interest. Alternatively, we could ask users to mark their preferences on their user page, so that the filtering could be done automatically.

andreslucena commented 8 years ago

I'm testing @xabier's first proposed solution (Random front-page: the base-line solution) as a short-term fix on #264. It's already an option on Consul.

xabier commented 8 years ago

If users have the option to select areas/districts of interest on their profile page, then these preferences could be automatically marked or transferred to their view of the front page. Users could, at all times, check or uncheck the filters of the front page, but those would be configured by default with the preferences from their profile.

MiguelAguilera commented 8 years ago

I think it is worth giving some thought to the perspective Xabier is proposing. I agree with the three objectives (equal initial opportunities, compromise between consensus and diversity, and wide scope). With that in mind, I think the proposal is interesting, but I was wondering about some potential problems of the random ordering proposals above (I would use "random-uniformly distributed" and "random-inversely proportional to the number of appearances" rather than "pseudo-random", since all of them are random).

I would say that one of the most important features of reddit-like algorithms is collaborative filtering. Collaborative filtering focuses the collective attention, drives the debates, etc. A random-uniform or random-inversely proportional algorithm would immediately disperse this attention over hundreds of proposals. This is not necessarily bad (although if the proposal/user ratio is too small it may "empty" the debates), but it may drive the attention to proposals of very low quality or proposals that are just noise or trolling (which might be a considerable proportion of the system). Or even worse, it may completely disperse collective attention. We want the proposals to have identical initial opportunities, but it is not useful to give interesting proposals (which will be a small part) the same weight in the system as noise (I know there is the most supported selection tab, but that only works for a small number of proposals).

A second, related problem is that the number of proposals subject to debate would continuously rise. The more proposals the platform receives, the fewer opportunities any proposal has to receive attention (worse, the amount of noise in the system would continuously increase!). The random algorithm with inversely proportional probabilities would solve that problem at the cost of "forgetting" good proposals, which would have very small chances of appearing if new proposals keep entering the system.

An intelligent system should be able to remember what is useful and forget what is not. E.g. if a proposal has appeared 200 times and received 0 votes, maybe we do not want it to keep appearing instead of other proposals with higher interest. Or maybe it is more interesting to drive the debate to proposals with many supports and intense debates that have appeared 200 times, rather than proposals that have appeared 20 times with no supports or comments so far. The idea behind this is that it is relatively easy to discard a proposal that does not receive attention, but it is more complicated to compare the consensus over two proposals with more supports, and the intelligence of the system should be directed to this task.

An easy solution might be to weight the probabilities of random selection by the ratio (supports and comments) / (times shown on the front page). This should not be a winner-takes-all mechanism, but rather a mechanism for removing noise from the system or forgetting old and uninteresting proposals (maybe something somewhat similar to Appgree's algorithm). I would try to balance the mechanism so that the front page combines new proposals that have not appeared much with older proposals that have active debates. The mechanism could be similar to the one I was trying here, substituting time with another measure.
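One possible way to express this weighting is sketched below in Python. The field names and the `prior` smoothing term are assumptions that are not part of the comment above: the prior gives a proposal that has never been shown a weight of 1, so new proposals get their initial chance, and lets the weight decay if impressions accumulate without supports or comments.

```python
import random


def engagement_weight(supports, comments, times_shown, prior=5.0):
    """Smoothed ratio of (supports + comments) to front-page impressions.

    `prior` is a hypothetical smoothing constant: with no impressions the
    weight is exactly 1.0, and it shrinks as impressions pile up without
    any engagement.
    """
    return (supports + comments + prior) / (times_shown + prior)


def pick_front_page(proposals, k, rng=None):
    """Draw up to `k` distinct proposals with engagement-weighted probability."""
    if rng is None:
        rng = random.Random()
    pool = list(proposals)
    page = []
    while pool and len(page) < k:
        weights = [
            engagement_weight(p["supports"], p["comments"], p["times_shown"])
            for p in pool
        ]
        index = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        page.append(pool.pop(index))
    return page
```

With these assumed numbers, a proposal shown 200 times with no engagement gets a weight of about 0.02, one shown 20 times with no engagement gets 0.2, and one shown 200 times with 100 supports and 20 comments gets about 0.61, which matches the ordering argued for in the comment.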

I think it is not a bad idea to start from the random-uniform selection algorithm, but I would be concerned about the effects of running permanently with that algorithm.

PS: One last thought. The idea of a random order (either uniform or weighting the probabilities in some way) totally rules out time as a factor in the system. This is not necessarily bad, and it may even be good for the objective of the PAM (as it allows a wide range of topics to be discussed). However, online debates are sometimes subject to strong dynamics with great potential for mobilization. For example, if public transport in the city is a hot topic during a particular week, do we really want to completely balance the presence of all topics of conversation? Although time should not dominate everything as in reddit-like algorithms, maybe it is worth not forgetting it completely.

Alotria commented 8 years ago

I am short of time and could only make a cursory reading. I will re-read and try something more precise during the weekend. That said, my two cents:

1- Is it computationally too costly to present various orderings or front page types alternately, in a random fashion? If it is not, then when I enter once, I get a front page with the most voted proposals. If I enter again, I get the random order. If I enter again, I get the most voted today. For the user, that should not seem much weirder than the current random option, and after a few tries they will figure out what is going on (we could highlight the "ordering selection tab" so that it is easy to see what is going on). With this option, depending on how data are registered, we may disaggregate them and see what kinds of effects we have with each ordering, as well as seeing the total effect of meta-equalizing the orders.

2- Doing this, we could include front page options that we would otherwise discard by default, such as a front page only with proposals with few votes. This option might partly fix something that Xabier comments on above: the fact that people tend to focus on the options that have the most votes, even if the front page presents proposals at random. This "few-votes only" front page would mitigate winner-takes-all dynamics within, at least, one of the potential front pages.
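The two points above could be sketched roughly as follows in Python. Everything here is hypothetical: the ordering names, the `supports` and `supports_today` fields, and the `FEW_SUPPORTS` threshold are placeholders chosen for the example. The function returns the name of the ordering together with the page, so that the effects of each ordering can be logged and disaggregated later.

```python
import random

FEW_SUPPORTS = 5  # hypothetical threshold for the "few-votes only" page


def most_supported(proposals, rng):
    return sorted(proposals, key=lambda p: p["supports"], reverse=True)


def most_supported_today(proposals, rng):
    return sorted(proposals, key=lambda p: p["supports_today"], reverse=True)


def random_order(proposals, rng):
    return rng.sample(proposals, len(proposals))


def few_votes_only(proposals, rng):
    few = [p for p in proposals if p["supports"] <= FEW_SUPPORTS]
    return rng.sample(few, len(few))


ORDERINGS = {
    "most_supported": most_supported,
    "most_supported_today": most_supported_today,
    "random": random_order,
    "few_votes_only": few_votes_only,
}


def front_page_for_visit(proposals, rng=None):
    """Choose one ordering uniformly at random for this visit."""
    if rng is None:
        rng = random.Random()
    name = rng.choice(sorted(ORDERINGS))
    return name, ORDERINGS[name](proposals, rng)
```

Choosing uniformly among the orderings is the simplest option; the probabilities could equally be tuned so that, for instance, the "few-votes only" page appears more often.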

iacoco commented 8 years ago

Hi fellows! In Madrid we have been thinking about this algorithm. We have some ideas about this issue:

  1. Maximize the opportunity of all proposals, as @xabier said. But I think that means not always giving the right to appear, but rather giving all proposals the same number of views until they have been evaluated enough times to know whether they are good proposals.
  2. Use collaborative filtering. How do we know whether they are good proposals? We need a quality index (defining quality as the capacity to attract supports). This could be [QI = number of supports / number of views by verified users]. So if a proposal has been seen by 100 verified users and supported 50 times, it has QI = 50/100 = 1/2. The ideal proposal has QI = 1 (supported by everyone who viewed it). The current problem is that we don't yet have the "number of views by verified users" available. So, as @MiguelAguilera says, we need some kind of rounds (like Appgree) to make good proposals grow continuously. We can build this dynamic by playing with the QI. For example, you can keep showing proposals that are good enough to be shown again, that is, proposals which maintain a high QI in each round (a round could be 100 views, or better: 1st round 100 views, 2nd 400 views, 3rd 1000 views, etc.). We will try to use the following, which is already implemented and incorporates these ideas: Thompson sampling with the beta posterior. See http://simplemlhacks.blogspot.com.es/2013/04/reddits-best-comment-scoring-algorithm.html
  3. Avoid the herding effect. We think we can solve this by not showing a given user the number of supports a proposal has until that user has supported it.
  4. Reward deliberation, as @elaragon suggested. This is very important when we are focused on living proposals that could be changed (this is not happening in Madrid yet, but we will work on this soon). So if there is a richer debate and pull requests are being considered, it could be a good idea to add this deliberative contribution to the QI.
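To make the quality index and the Thompson sampling idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the code referred to in the link: the field names (`id`, `supports`, `views`), the uniform Beta(1, 1) prior, and returning 0.0 for a proposal with no views are all choices made for the example.

```python
import random


def quality_index(supports, views):
    """QI = supports / views by verified users.

    Returning 0.0 when there are no views yet is an assumption; the QI is
    strictly undefined in that case.
    """
    return supports / views if views else 0.0


def thompson_rank(proposals, rng=None):
    """Rank proposal ids by one sample from each Beta posterior.

    Each view is treated as a trial that either ends in a support or not.
    Starting from a uniform Beta(1, 1) prior (an assumption), the posterior
    for a proposal is Beta(1 + supports, 1 + views - supports).
    """
    if rng is None:
        rng = random.Random()
    scored = []
    for p in proposals:
        misses = max(p["views"] - p["supports"], 0)
        sample = rng.betavariate(1 + p["supports"], 1 + misses)
        scored.append((sample, p["id"]))
    scored.sort(reverse=True)
    return [proposal_id for _, proposal_id in scored]
```

Proposals with few views have wide posteriors, so they are sometimes sampled high and get a chance to be evaluated; as views accumulate, the sample concentrates around the QI and consistently weak proposals sink in the ranking. This plays a similar role to the rounds described in item 2 without fixing explicit round sizes.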

For more info: https://betademic.hackpad.com/Algoritmos-de-mostrado-y-ordenacin-DYqlhg2gXcW

xabier commented 8 years ago

I am sympathetic to the view (expressed by @MiguelAguilera @elaragon @Alotria and @iacoco) that the platform should be used for collaborative filtering. I very much like the QI parameter proposed by @iacoco. But it is important to note the difference in function between decidim.barcelona and decide.madrid.es. The Barcelona usage does not require proposals to reach a high volume of supports, nor do proposals have to compete too much (e.g. it is not the case that only 5 will be chosen). I fear that collaborative filtering techniques might leave too many good proposals behind. For instance, those that are not noise/rubbish but whose quality or significance only a few users can distinguish (because it is not a hot topic or because it demands some background knowledge to be appreciated). What is noise and what is not is not something that the front page, right now, should discriminate in advance. Maybe at a later stage, or for a more general scenario, whenever a few proposals are required to reach a given threshold or need a lot of support to gain political weight, filtering mechanisms should be introduced. To put it differently, the front page could be, at least temporarily, the departure point for collaborative filtering, not a page displaying the results; other tabs (most voted and most active, for example) are already good at doing that.

xabier commented 8 years ago

I am starting to change my mind with this: https://decidim.barcelona/proposals/hacer-mas-util-esta-iniciativa :-)

Although, thinking about it carefully, this proposal is actually getting overvalued because it is shown first on the list of most active today; on a random front page it would not have attracted so much attention :smile:

xabier commented 8 years ago

Interesting resource post on these matters: https://blog.agoravoting.org/index.php/2016/02/15/reddit-style-filtering-for-e-democracy/

josepjaume commented 8 years ago

Closing this as the issue has been kind of solved. Maybe we can reopen the issue in the future to further discuss it.