Qwlouse / Findeco

GNU General Public License v3.0

Graph data "default" #20

Open Qwlouse opened 11 years ago

Qwlouse commented 11 years ago

Select the most interesting nodes in the current slot, but hide old versions that are no longer supported by their authors. Use the clustering information from the node markings to find the nodes preferred by factions of users. Limit the number of clusters to a maximum of 5. Gather the sources and the derivatives connected to the interesting clusters, but limit the derivatives of each cluster to the two most favored ones. Hide all spam.
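The gathering step could look roughly like the sketch below. It assumes that "most favored" means "most follows", which the issue does not define, and the names `gather_related`, `sources`, `derivatives` and `follows` are hypothetical, not Findeco APIs.

```python
import heapq


def gather_related(node, sources, derivatives, follows, max_derivatives=2):
    """Collect the sources and the most favored derivatives of one node.

    sources / derivatives: dict mapping a node to a list of related nodes.
    follows: dict mapping a node to its follow count (assumed measure of favor).
    """
    # Keep only the top `max_derivatives` derivatives, ranked by follow count.
    top = heapq.nlargest(
        max_derivatives,
        derivatives.get(node, []),
        key=lambda d: follows.get(d, 0),
    )
    return {"sources": list(sources.get(node, [])), "derivatives": top}
```

Calling this once per selected cluster node keeps the graph small: at most five nodes, each with its sources and at most two derivatives.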

justelex commented 11 years ago

pinae commented 10 days ago

For the default graph data we consider the graph without spam (which has to be filtered out first). From all the data we have for the nodes in that slot, we want to calculate the (up to) five groups of people who probably pursue the same goals. The groups unfollow the proposals of all the other groups and leave 'con' arguments there. They follow all the proposals they like (even repeatedly, for different revisions) and add 'pro' arguments.

A user can do the following things, which can be taken into account when finding the groups:

The user can follow a proposal
The user can unfollow a proposal which was derived from a proposal of their own group but was changed to match the needs of another group
The user can create a new revision of a proposal of their group to make the text more convincing
The user can add a 'pro' argument to a proposal of their own group
The user can add a 'con' argument to a proposal their group doesn't like
The user can follow a 'pro' argument for one of the proposals of their own group
The user can unfollow arguments which were changed by members of other groups

All of this can also be done for substructures.
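One way to make these actions usable for statistics is to turn them into signed signals per user and proposal. The sketch below is only an illustration: the `Action` enum, the `ACTION_SIGN` weights and `opinion_vectors` are hypothetical names, and the signs are guesses (in particular, counting an unfollowed argument as -1 for the proposal it belongs to is an assumption).

```python
from collections import defaultdict
from enum import Enum


class Action(Enum):
    FOLLOW = "follow"
    UNFOLLOW = "unfollow"
    NEW_REVISION = "new_revision"
    PRO_ARGUMENT = "pro_argument"
    CON_ARGUMENT = "con_argument"
    FOLLOW_PRO_ARGUMENT = "follow_pro_argument"
    UNFOLLOW_ARGUMENT = "unfollow_argument"


# Assumed weights: +1 counts as support for the proposal, -1 as rejection.
ACTION_SIGN = {
    Action.FOLLOW: 1,
    Action.UNFOLLOW: -1,
    Action.NEW_REVISION: 1,
    Action.PRO_ARGUMENT: 1,
    Action.CON_ARGUMENT: -1,
    Action.FOLLOW_PRO_ARGUMENT: 1,
    Action.UNFOLLOW_ARGUMENT: -1,
}


def opinion_vectors(events):
    """Sum the signed actions into one score per (user, proposal).

    events: iterable of (user, action, proposal) tuples. For argument
    actions, `proposal` is the proposal the argument is attached to.
    """
    opinions = defaultdict(lambda: defaultdict(int))
    for user, action, proposal in events:
        opinions[user][proposal] += ACTION_SIGN[action]
    # Convert to plain dicts so missing keys are not created on lookup.
    return {user: dict(scores) for user, scores in opinions.items()}
```

A positive score then means the user supports a proposal and a negative score means the user rejects it, which is the raw material for finding groups.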

We want to find the five sets of proposals which belong to the five most distinguishable groups of users. So the first step would be to find the five groups; once they are found, we match the proposals to the groups and select the most relevant proposal in each set.

I propose doing some statistics. Since we are searching for diversity, we want to group users who follow some proposals while explicitly unfollowing others. If there are also users who do the reverse (following what the others unfollow and unfollowing what they follow), we have found proposals which polarize. We can use such findings to calculate a diversity score for a group by summing up its diversity with respect to every other group.

Since we can trivially start by putting every user who interacted in that slot into their own group, we can generate a list of groups, each with a diversity score. Groups with low diversity are not interesting, so in every step we eliminate the group with the lowest diversity. When eliminating a group, we have to find, for every user in that group, a new group which fits their opinion best, which means a group with low diversity relative to that user compared to the other groups. We add the user to that low-diversity group and update the group's score, because the actions of this user now contribute to the group's diversity definition. We repeat this until only five groups remain (maybe we could introduce a threshold so that the group count can be reduced further if a consensus is reached).
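The elimination loop could be sketched as follows. This is a rough illustration under several assumptions: each user's opinion is a dict mapping proposals to a signed score (positive for support, negative for rejection), the diversity between two groups is simply the number of proposals on which they point in opposite directions, and ties between candidate groups are broken by agreement. None of these definitions are fixed by the proposal, and all function names are hypothetical.

```python
def sign(x):
    return (x > 0) - (x < 0)


def pair_diversity(a, b):
    """Count proposals on which two opinion vectors have opposite signs."""
    return sum(1 for p in a.keys() & b.keys() if sign(a[p]) * sign(b[p]) < 0)


def agreement(a, b):
    """Count proposals on which two opinion vectors have the same sign."""
    return sum(1 for p in a.keys() & b.keys() if sign(a[p]) * sign(b[p]) > 0)


def merge_vectors(a, b):
    """Add the scores of b onto a copy of a."""
    merged = dict(a)
    for proposal, score in b.items():
        merged[proposal] = merged.get(proposal, 0) + score
    return merged


def cluster_users(opinions, max_groups=5):
    """Greedily reduce one-user groups to at most `max_groups` groups."""
    if max_groups < 1:
        raise ValueError("max_groups must be at least 1")
    # Start trivially: every user is alone in a group.
    groups = [({user}, dict(vec)) for user, vec in sorted(opinions.items())]
    while len(groups) > max_groups:
        # Diversity score of a group: sum of its diversity to every other group.
        scores = [
            sum(pair_diversity(vec, other)
                for j, (_, other) in enumerate(groups) if j != i)
            for i, (_, vec) in enumerate(groups)
        ]
        # Eliminate the least diverse group (first one on ties).
        members, _ = groups.pop(scores.index(min(scores)))
        for user in sorted(members):
            vec = opinions[user]
            # Best fit: lowest diversity to the user, then highest agreement
            # (the agreement tie-break is an assumption).
            best = min(
                range(len(groups)),
                key=lambda g: (pair_diversity(vec, groups[g][1]),
                               -agreement(vec, groups[g][1])),
            )
            groups[best] = (groups[best][0] | {user},
                            merge_vectors(groups[best][1], vec))
    return [members for members, _ in groups]
```

The sketch recomputes all pairwise scores in every step, which is fine for the number of users in a single slot but would need caching for larger inputs. A consensus threshold, as suggested above, could be added as a second stop condition on the minimum score.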

After we have clustered the users, we can calculate the sets by determining which group likes a proposal the most (counting follows, pro arguments, and follows of pro arguments). After that step we have no more than five sets of proposals.
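A minimal sketch of this matching step, assuming `groups` is a list of user sets and `opinions` maps each user to a dict of per-proposal scores in which positive values stand for likes (follows, pro arguments, pro argument follows). The function name and both data shapes are hypothetical.

```python
from collections import defaultdict


def proposal_sets(groups, opinions):
    """Assign every liked proposal to the group that likes it the most."""
    # Sum up only the positive (liking) scores per group and proposal.
    likes = [defaultdict(int) for _ in groups]
    for i, members in enumerate(groups):
        for user in members:
            for proposal, score in opinions.get(user, {}).items():
                if score > 0:
                    likes[i][proposal] += score

    sets = [set() for _ in groups]
    proposals = set().union(*(group_likes.keys() for group_likes in likes))
    for proposal in proposals:
        # On ties, the group with the lowest index wins (an arbitrary choice).
        best = max(range(len(groups)), key=lambda i: likes[i].get(proposal, 0))
        sets[best].add(proposal)
    return sets
```

Proposals that nobody likes end up in no set, so some of the returned sets may be empty.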

In each set of proposals we have to find the most relevant one. This is the proposal with the most follows, which can be weighted by the age of the proposal to keep the system from favoring conservatism.
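The age weighting is not specified further. One possible reading is an exponential decay of the follow count, sketched below; the half-life of 30 days and the name `most_relevant` are assumptions chosen purely for illustration.

```python
def most_relevant(proposals, half_life_days=30.0):
    """Pick the proposal with the highest age-weighted follow count.

    proposals: dict mapping a proposal to a (follows, age_days) tuple.
    The weight halves every `half_life_days` days (assumed decay model).
    """
    def relevance(item):
        follows, age_days = item[1]
        return follows * 0.5 ** (age_days / half_life_days)

    return max(proposals.items(), key=relevance)[0]
```

With this weighting, a 60-day-old proposal with 10 follows scores 2.5 and loses to a brand-new proposal with 4 follows, so newer revisions get a chance to surface.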

After that we have no more than five nodes in the slot, which can be used to gather the related nodes.

Maybe there are better ways to cluster the users and the proposals by using machine learning, but that's something @Qwlouse has to answer.