BoredLabsHQ / Concord

Concord is an open-source AI plugin designed to connect community members to relevant conversations
GNU General Public License v3.0

[DB] Update topic schema #35

Closed: sajz closed this issue 2 weeks ago

sajz commented 2 weeks ago

Key changes: add topic_embedding; move overall_score into a separate relationship table; make the topic model non-channel-specific, with channel-specific mappings kept separately; and give each keyword in topic_keywords a weight.

Here is the revised schema, including weights for topic_keywords. Topics are not tied to a channel; a separate relationship table records the channels in which each topic is present.

Database Format for Topics and Embeddings

  1. Topics Table

    • topic_id (Primary Key): Unique identifier for each topic.
    • topic_name: Descriptive name or label of the topic (e.g., "AI and Machine Learning").
    • topic_keywords: Array of representative keywords with their associated weights (e.g., [{"term": "AI", "weight": 0.35}, {"term": "neural networks", "weight": 0.28}, {"term": "deep learning", "weight": 0.22}]).
    • topic_embedding: Numeric vector stored as an array of floats (e.g., [0.12, -0.34, 0.56, ..., 0.78]).
    • created_at: Timestamp indicating when the topic was created.
    • updated_at: Timestamp indicating the last update to the topic.

    Example Record:

    {
     "topic_id": 1,
     "topic_name": "AI and Machine Learning",
     "topic_keywords": [
       {"term": "AI", "weight": 0.35},
       {"term": "neural networks", "weight": 0.28},
       {"term": "deep learning", "weight": 0.22}
     ],
     "topic_embedding": [0.12, -0.34, 0.56, 0.78],
     "created_at": "2024-11-07T10:30:00Z",
     "updated_at": "2024-11-07T10:30:00Z"
    }

  2. Channel-Topic Relationships Table

    • relationship_id (Primary Key): Unique identifier for the relationship.
    • channel_id: Identifier for the channel (foreign key referencing a Channels table).
    • topic_id: Identifier for the topic (foreign key referencing the Topics table).
    • topic_score: Overall score indicating the strength of the relationship between the topic and the channel.
    • last_updated: Timestamp for when the score or relationship was last updated.

    Example Record:

    {
     "relationship_id": 1001,
     "channel_id": "channel_abc123",
     "topic_id": 1,
     "topic_score": 0.85,
     "last_updated": "2024-11-07T10:45:00Z"
    }
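The two tables above could be sketched as follows. This is a minimal illustration, not the project's actual migration: the issue does not name a database engine, so SQLite (via Python's standard sqlite3 module) is an assumption, as are the table names `topics`, `channel_topics`, and the one-column `channels` table. Storing topic_keywords and topic_embedding as JSON text and adding a UNIQUE constraint on (channel_id, topic_id) are also assumptions made for the example. The helper functions `open_db`, `load_example`, and `topics_for_channel` are hypothetical.

```python
import json
import sqlite3

# Assumed schema: table names and column types are illustrative only.
SCHEMA = """
CREATE TABLE channels (
    channel_id TEXT PRIMARY KEY
);
CREATE TABLE topics (
    topic_id        INTEGER PRIMARY KEY,
    topic_name      TEXT NOT NULL,
    topic_keywords  TEXT NOT NULL,  -- JSON array of {"term", "weight"} objects
    topic_embedding TEXT NOT NULL,  -- JSON array of floats
    created_at      TEXT NOT NULL,
    updated_at      TEXT NOT NULL
);
CREATE TABLE channel_topics (
    relationship_id INTEGER PRIMARY KEY,
    channel_id      TEXT NOT NULL REFERENCES channels(channel_id),
    topic_id        INTEGER NOT NULL REFERENCES topics(topic_id),
    topic_score     REAL NOT NULL,
    last_updated    TEXT NOT NULL,
    UNIQUE (channel_id, topic_id)  -- assumption: one score per channel/topic pair
);
"""


def open_db():
    """Create an in-memory database with foreign keys enforced."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def load_example(conn):
    """Insert the example records from this issue."""
    keywords = [
        {"term": "AI", "weight": 0.35},
        {"term": "neural networks", "weight": 0.28},
        {"term": "deep learning", "weight": 0.22},
    ]
    conn.execute("INSERT INTO channels VALUES (?)", ("channel_abc123",))
    conn.execute(
        "INSERT INTO topics VALUES (?, ?, ?, ?, ?, ?)",
        (
            1,
            "AI and Machine Learning",
            json.dumps(keywords),
            json.dumps([0.12, -0.34, 0.56, 0.78]),
            "2024-11-07T10:30:00Z",
            "2024-11-07T10:30:00Z",
        ),
    )
    conn.execute(
        "INSERT INTO channel_topics VALUES (?, ?, ?, ?, ?)",
        (1001, "channel_abc123", 1, 0.85, "2024-11-07T10:45:00Z"),
    )
    conn.commit()


def topics_for_channel(conn, channel_id):
    """Return (topic_name, topic_score) pairs for a channel, strongest first."""
    return conn.execute(
        """
        SELECT t.topic_name, ct.topic_score
        FROM channel_topics AS ct
        JOIN topics AS t ON t.topic_id = ct.topic_id
        WHERE ct.channel_id = ?
        ORDER BY ct.topic_score DESC
        """,
        (channel_id,),
    ).fetchall()
```

Because the score lives in the relationship table, the same topic row can be linked to any number of channels, each with its own topic_score, without duplicating the keywords or the embedding.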

Explanation:

Benefits: