When a message introduces a topic that closely resembles one already tracked in the channel, a similarity threshold can decide whether the new topic is merged into the existing one or created as a separate topic, which avoids duplicates. Here’s a proposed approach:
Proposed Steps
Compute Topic Similarity:
When BERT identifies a new topic in a message, compare its semantic_vector with the semantic_vector of each existing topic linked to the channel through an ASSOCIATED_WITH relationship.
Use a similarity metric, such as cosine similarity, between the new topic’s vector and each existing topic’s vector.
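As a rough illustration of this comparison step, here is a minimal sketch in plain Python. It assumes each semantic_vector is available as a list of floats of equal length; the helper name similarities_to_existing and the choice to treat a zero vector as similar to nothing are assumptions for the example, not part of the original design.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: a zero vector carries no direction, so score it as 0.
        return 0.0
    return dot / (norm_a * norm_b)


def similarities_to_existing(
    new_vector: Sequence[float],
    existing: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Map each existing topic name to its similarity with the new topic.

    `existing` is a hypothetical in-memory stand-in for the topics reached
    through the channel's ASSOCIATED_WITH relationships.
    """
    return {
        name: cosine_similarity(new_vector, vector)
        for name, vector in existing.items()
    }
```

In a real system the vectors would come from the graph rather than a dictionary, and a vectorized library would likely be faster for large channels.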
Set a Similarity Threshold:
Define a similarity threshold, e.g., 0.8, above which the new topic is considered “similar enough” to an existing topic. This threshold can be adjusted based on testing.
Merge or Create Logic:
If Similarity is Above Threshold:
Merge the new topic with the existing topic that has the highest similarity score.
Update the existing topic’s overall_score using the amplify_score function based on the relevance of the new topic in the message.
If Similarity is Below Threshold for All Existing Topics:
Treat the new topic as distinct, create a new Topic node, and establish the ASSOCIATED_WITH relationship for tracking in this channel.
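The merge-or-create decision could be sketched as follows. This is only an illustration under stated assumptions: the similarity scores are already computed, topic scores live in a plain dictionary rather than on Topic nodes, amplify_score is passed in as a callable with an assumed (current_score, relevance) signature since its real signature is not shown here, and a newly created topic starts with the message relevance as its score.

```python
from typing import Callable, Dict, Optional

SIMILARITY_THRESHOLD = 0.8  # example value from the text; tune through testing


def best_match(
    similarities: Dict[str, float],
    threshold: float = SIMILARITY_THRESHOLD,
) -> Optional[str]:
    """Return the most similar existing topic if it is above the threshold."""
    if not similarities:
        return None
    name = max(similarities, key=similarities.get)
    # "Above" is read strictly here; use >= if ties should merge.
    return name if similarities[name] > threshold else None


def merge_or_create(
    new_topic: str,
    relevance: float,
    similarities: Dict[str, float],
    scores: Dict[str, float],
    amplify_score: Callable[[float, float], float],
    threshold: float = SIMILARITY_THRESHOLD,
) -> str:
    """Update `scores` in place and return the topic that ends up tracked."""
    match = best_match(similarities, threshold)
    if match is not None:
        # Merge: boost the existing topic instead of adding a duplicate.
        scores[match] = amplify_score(scores[match], relevance)
        return match
    # Create: no existing topic is close enough.
    # Assumption: the initial overall_score is the message relevance.
    scores[new_topic] = relevance
    return new_topic
```

With a stand-in such as `lambda score, relevance: score + relevance` for amplify_score, a new topic that scores 0.9 against an existing one is folded into it, while one that scores 0.1 becomes its own entry.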
Optional: Store Relatedness Data:
For transparency and future adjustments, record similarity data in the RELATED_TO relationship between topics. This way, if similar topics keep emerging, you can track these relationships for potential reorganization or clustering later.
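One possible shape for this relatedness data is sketched below. The RelatedTo record, the related_edges helper, and the min_similarity floor (used so that near-zero similarities are not stored) are all hypothetical; the original only says that similarity data is recorded on the RELATED_TO relationship.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RelatedTo:
    """Hypothetical in-memory stand-in for a RELATED_TO relationship."""
    source: str
    target: str
    similarity: float


def related_edges(
    new_topic: str,
    similarities: Dict[str, float],
    min_similarity: float = 0.5,  # assumed floor, not from the original
) -> List[RelatedTo]:
    """Build one edge per existing topic that is at least loosely similar."""
    return [
        RelatedTo(new_topic, name, round(score, 4))
        for name, score in sorted(similarities.items())
        if score >= min_similarity
    ]
```

Keeping the score on each edge means the threshold can be revisited later without recomputing every pairwise comparison.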
Example Flow:
Analyze New Message:
A new topic appears in the message with a semantic_vector.
Similarity Comparison:
Compute cosine similarity between this new topic’s semantic_vector and the semantic_vector of each existing topic in the channel.
Apply Threshold Decision:
Above Threshold (e.g., 0.8): Update the most similar existing topic’s score using amplify_score.
Below Threshold: Create a new topic entry and start tracking it as a distinct topic.