GatorIncubator / gatorgrouper

:busts_in_silhouette: Automated Group Formation Tool Enabling Effective Team Work
GNU General Public License v3.0
20 stars 20 forks source link

Represent grouping as a Graph Partition Problem and implement a Kernighan-Lin heuristic #190

Closed aubreypc closed 5 years ago

aubreypc commented 5 years ago

Background Info

In the graph partition problem, we must partition the vertex set V of a graph into k subsets such that that the edge-cut (total weight of all edges which go between different subsets) is minimized. In our case, each vertex in the graph would represent a particular student, and for any two vertices u and v we create an edge e between them if and only if those two students are allowed to be in the same group. Then we assign a weight to e representing the compatibility of the two students. With this setup, the edge-cut represents the total "unused compatibility" of the grouping, i.e. the sum of all compatibility scores for students who aren't in the same group -- we want to minimize this.

While graph partitioning is NP-hard, it has some fairly fast heuristics. Kernighan-Lin is one such option which applies in the case of bisecting the graph into 2 groups (k = 2). By recursively applying Kernighan-Lin, we can partition into k = 2^n groups for any n >= 1 in roughly O(log2(k) * |V|^2) time. However, other heuristics are available which are similar to Kernighan-Lin while generalizing to a k-way partition rather than a bisection, and perhaps without the logarithmic increase in complexity.

Prototype implementation

We should begin by getting recursive K-L on one objective working before moving on to multi-objective optimization. This will involve:

Alex-Yarkosky commented 5 years ago

It should be kept in mind that if non-negotiable conflicts are included by having situations where there are not edges between two individuals, it should first be checked that groups can be made to the user's specifications before any algorithm is utilized. If they cannot be made, the user should be asked to remove some or at least one of theses conflicts and could instead assign a low compatability score to the individuals. This would allow for the user to always get groups that fit the specifications they gave to our tool.

Michionlion commented 5 years ago

Additionally, there is the problem of calculating these compatibility scores -- if they are given by the user, the user must do a difficult calculation, and it is nearly impossible to get a compatibility score from some given characteristics of skill, since the compatibility score is dependent on the entire group, not just two students. I think this approach may be fundamentally limited by these problems.

aubreypc commented 5 years ago

@Alex-Yarkosky Good point! To ensure a grouping exists, I think it should suffice to check if each vertex u in the graph has degree(u) >= k, where k is the number of students per group, but I'm not certain about this -- would need to prove / find a proof.

@Michionlion Even if the compatibility scores are local to each pair of students, Kernighan-Lin should still result in a grouping which is reasonably good globally since it minimizes the number of highly compatible pairs of people which are not grouped together.

Michionlion commented 5 years ago

@aubreypc the problem is actually generating those compatibility scores, however. It's very difficult to imagine how a professor might choose them, and they are not really possible to calculate from other information easily. I may not have your insight, however.

Alex-Yarkosky commented 5 years ago

@Michionlion @aubreypc I think the best way to think about compatibility scores, assuming higher is better, is that everyone starts off at 5, assuming a 1 to 5 scale, meaning that the professor would have no reason to believe those individuals would have any problem working together. If the professor had any reason to believe students would have a problem or difficulty working together, then they could change to any other number on the scale. It may also be better to think of the compatability score in terms of conflict resolution or in other words, 5 means the students have no conflict and 1 could mean the students have conflicts but can work together with the rest in between being other various degrees of problematics between the two individuals. Hope this helps when thinking about the compatability scores!

aubreypc commented 5 years ago

@Michionlion Yeah, it's definitely hard to find good measures of compatibility. A couple ideas:

@gkapfham Do you have any suggestions for how compatibility between pairs of students could be measured?

aubreypc commented 5 years ago

Before #229 can be merged, it needs test cases -- @finneyj2 has offered to work on these.