Topic Change Detection - Githubissues

vdpappu commented 4 years ago

Let's discuss the potential next steps for improving topic change detection and update the activities here.

reaganrewop commented 4 years ago

The ideal goal of topic change detection is to slice the meeting into multiple partitions, where each partition carries enough information to stand on its own as a discussion.

The following needs to be addressed to achieve this:

Cosine similarity as the sole edge weight.
A mixture of topics in a single segment (not an ideal case).
What are the factors for grouping segments? (Currently it is the order in which the segments were spoken.)
Pruning of the edges. How do we prune the irrelevant edges?
Filler sentences in a segment causing overlapping groups.

Staying with our current implementation, I made a few extra changes to try to fix the last issue.

Handling spillover sentences was rather important because they caused many overlapping groups. I am currently handling this by checking for duplicate segments across the communities; if a segment appears in more than one community, I remove it from a community when the majority of its sentences are placed in a different community.
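The spillover handling described above could be sketched roughly as follows. This is only an illustration, not the code in the repository: the function name `resolve_spillover`, the `(segment_id, sentence_id)` representation, and the tie-breaking rule are all assumptions, and "majority" is approximated here by keeping each segment in the community that holds the most of its sentences.

```python
from collections import Counter


def resolve_spillover(communities):
    """Keep each segment only in the community holding most of its sentences.

    communities: list of lists of (segment_id, sentence_id) pairs
    (a hypothetical representation chosen for this sketch).
    """
    # Count how many sentences of each segment land in each community.
    counts = {}
    for idx, members in enumerate(communities):
        for segment_id, _ in members:
            counts.setdefault(segment_id, Counter())[idx] += 1

    # Pick a "home" community per segment; ties go to the earliest
    # community (an assumption, the thread does not specify tie handling).
    home = {
        segment_id: max(c, key=lambda i: (c[i], -i))
        for segment_id, c in counts.items()
    }

    # Drop a segment's sentences from every community except its home.
    return [
        [(seg, sent) for seg, sent in members if home[seg] == idx]
        for idx, members in enumerate(communities)
    ]
```

With this rule a segment never appears in two communities, which is what removes the overlapping groups.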

Doing this improved the accuracy of the communities considerably, and overlapping groups are no longer formed.

reaganrewop commented 4 years ago

To improve the current communities approach (the one on staging), or more precisely, to understand what works best for communities, I went through a few papers and methods to see how effective they can be. Based on that, I made a few changes to the algorithm.

Instead of a fully connected network, we connect two sentences only if they are from the same segment or from the next one. This helps to reduce cosine similarity noise.
Graph normalization is now slightly different: we compute a local normalization score for each node, and for an edge shared by two nodes we average the two scores.
The community approach relies on self-loops, so those are added as well.
Based on this paper, https://arxiv.org/pdf/0812.1770.pdf, we add another resolution parameter t, which helps to control the stability of the network.
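A minimal sketch of the graph construction changes listed above, under several assumptions: the name `build_graph`, the `(segment_index, embedding)` input format, the self-loop weight of 1.0, clamping negative similarities to zero, and the reading of "local normalization" as dividing an edge by the strongest edge at each endpoint and averaging the two results are all guesses for illustration. The community detection step and the resolution parameter t are left out.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_graph(sentences):
    """Build a sparse sentence graph as {(i, j): weight} with i <= j.

    sentences: list of (segment_index, embedding) in spoken order
    (a hypothetical input format for this sketch).
    """
    raw = {}
    for i, (seg_i, emb_i) in enumerate(sentences):
        raw[(i, i)] = 1.0  # self-loop; the weight 1.0 is an assumption
        for j in range(i + 1, len(sentences)):
            seg_j, emb_j = sentences[j]
            # Connect only sentences from the same or an adjacent segment.
            if abs(seg_j - seg_i) <= 1:
                raw[(i, j)] = max(0.0, cosine(emb_i, emb_j))

    # Strongest non-loop edge at each node, used as the local scale.
    strongest = {}
    for (i, j), w in raw.items():
        if i != j:
            strongest[i] = max(strongest.get(i, 0.0), w)
            strongest[j] = max(strongest.get(j, 0.0), w)

    graph = {}
    for (i, j), w in raw.items():
        if i == j:
            graph[(i, j)] = w
            continue
        # Normalize from each endpoint's point of view, then average.
        score_i = w / strongest[i] if strongest[i] > 0 else 0.0
        score_j = w / strongest[j] if strongest[j] > 0 else 0.0
        graph[(i, j)] = (score_i + score_j) / 2
    return graph
```

Restricting edges to neighbouring segments keeps the graph sparse, so a chance high similarity between sentences far apart in the meeting cannot pull them into the same community.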

On the validation set, the accuracy increased from 47 percent to 79 percent.

etherlabsio / ai-engine

Topic Change Detection #134