Aggregate-Intellect / sherpa

https://sherpa-ai.readthedocs.io/

Retaining information from key insights in the blogger demo #391

Open andytai7 opened 5 months ago

andytai7 commented 5 months ago

PROBLEM In the blogger demo, insight extraction (transcript2insights) appears effective, but the LLM call in the create_blueprint function appears to lose information. Two solutions are proposed: the first removes all LLM calls from blueprint creation, while the second retains the first and last LLM calls.

SOLUTION We propose overhauling the LLM integration by:

Removing all LLM calls from transcript2insights and create_blueprint.
Retaining the LLM call solely for the blog generation phase.

Detailed Steps

Clustering and Blueprint Generation
    Cluster the key insights.
    Create individual blueprints for each cluster of key insights.
    Generate a main blueprint integrating all insights.

Validation
    Validate the new clustering model to ensure no loss of information.
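The steps above do not name a clustering algorithm. As a minimal sketch, assuming plain word-overlap (Jaccard) similarity is good enough to group insights, the following clusters insights greedily, builds a per-cluster blueprint and a main blueprint without any LLM call, and validates that no insight is lost. All names (`cluster_insights`, `main_blueprint`, `validate_no_loss`, the `threshold` value, the stopword list) are hypothetical and are not part of the sherpa codebase.

```python
import re
from collections import Counter

# Illustrative stopword list (assumption); a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "for", "is", "are", "on", "with"}


def tokens(text: str) -> set[str]:
    """Lowercased content words of one key insight (naive tokenizer)."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def jaccard(a: set[str], b: set[str]) -> float:
    """Word-overlap similarity between two token sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_insights(insights: list[str], threshold: float = 0.2) -> list[list[str]]:
    """Greedy single-pass clustering: join the first cluster whose vocabulary
    overlaps enough with the insight, otherwise start a new cluster."""
    clusters: list[list[str]] = []
    vocab: list[set[str]] = []
    for insight in insights:
        toks = tokens(insight)
        for i, words in enumerate(vocab):
            if jaccard(toks, words) >= threshold:
                clusters[i].append(insight)
                words |= toks  # grow this cluster's vocabulary in place
                break
        else:  # no sufficiently similar cluster found
            clusters.append([insight])
            vocab.append(set(toks))
    return clusters


def cluster_blueprint(cluster: list[str], index: int) -> dict:
    """Blueprint for one cluster: a title from its most frequent words, plus
    every insight copied verbatim so nothing is paraphrased away."""
    counts = Counter(w for insight in cluster for w in tokens(insight))
    top = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:3]
    return {"section": index, "title": " ".join(w for w, _ in top), "key_insights": list(cluster)}


def main_blueprint(clusters: list[list[str]]) -> dict:
    """Main blueprint integrating all cluster blueprints in order."""
    return {"sections": [cluster_blueprint(c, i + 1) for i, c in enumerate(clusters)]}


def validate_no_loss(insights: list[str], blueprint: dict) -> bool:
    """True if the blueprint contains exactly the input insights,
    with none dropped or duplicated."""
    kept = [ki for section in blueprint["sections"] for ki in section["key_insights"]]
    return sorted(kept) == sorted(insights)
```

Because the insights are copied verbatim rather than rewritten, validation reduces to a multiset comparison; the only LLM call left would be the blog-generation step, which would receive the main blueprint as input. An embedding-based clusterer could replace `jaccard` without changing the rest of the sketch.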

ALTERNATIVES An alternative approach is to modify only the create_blueprint process rather than both functions:

Transcript2Insights Execution
    Run transcript2insights normally with the first LLM call intact.

Rule-Based Clustering (replacing create_blueprint)
    Cluster the key insights into thematic clusters.

Multiple LLM Calls for Blueprint Creation
    Use multiple LLM calls to:
        Create detailed blueprints for each cluster.
        Develop a comprehensive main blueprint.

Validation
    Confirm the effectiveness and information retention of the revised clustering model.
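For the alternative, a rough sketch follows, assuming rule-based clustering means a keyword-to-theme mapping and that the LLM can be treated as any function from prompt to text. `THEME_RULES`, `rule_based_clusters`, `blueprint_with_llm`, and the prompt wording are all hypothetical; no specific LLM client or sherpa API is assumed.

```python
import re
from typing import Callable

# Hypothetical theme rules (theme -> trigger words); real rules would be
# curated for the transcript domain.
THEME_RULES: dict[str, set[str]] = {
    "retrieval": {"vector", "embedding", "embeddings", "search"},
    "prompting": {"prompt", "prompts", "instruction", "instructions"},
}


def rule_based_clusters(insights: list[str], rules: dict[str, set[str]] = THEME_RULES) -> dict[str, list[str]]:
    """Assign each insight to the first theme whose trigger words it contains;
    unmatched insights go to an 'other' cluster so none are dropped."""
    clusters: dict[str, list[str]] = {theme: [] for theme in rules}
    clusters["other"] = []
    for insight in insights:
        words = set(re.findall(r"[a-z0-9]+", insight.lower()))
        theme = next((t for t, triggers in rules.items() if words & triggers), "other")
        clusters[theme].append(insight)
    return {theme: items for theme, items in clusters.items() if items}


def blueprint_with_llm(clusters: dict[str, list[str]], llm: Callable[[str], str]) -> dict:
    """One LLM call per cluster for a detailed section blueprint, then one more
    call to merge them into the main blueprint."""
    sections: dict[str, str] = {}
    for theme, items in clusters.items():
        prompt = (
            f"Write a detailed blog-section outline on '{theme}'. "
            "Include every key insight below:\n" + "\n".join(f"- {i}" for i in items)
        )
        sections[theme] = llm(prompt)
    main_prompt = (
        "Merge these section outlines into one blog blueprint, keeping all sections:\n\n"
        + "\n\n".join(sections.values())
    )
    return {"sections": sections, "main": llm(main_prompt)}
```

This makes one call per non-empty cluster plus one merge call, so the number of LLM calls grows with the number of themes. Passing `llm` as a parameter also lets the pipeline be tested with a stub before wiring in a real model.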


andytai7 commented 5 months ago

Title: Proposal for Comparing Two Methodologies in LLM Integration

Description: We propose to evaluate two distinct methodologies to enhance our LLM integration. The goal is to determine which method better supports our system's efficiency and output quality.
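The comparison needs a shared measure of output quality. One possible proxy, assumed here rather than specified in the proposal, is the fraction of key insights whose content words mostly survive in each method's blueprint. `retention_score`, `compare_methods`, and the `min_coverage` cutoff are hypothetical names and values.

```python
import re

# Illustrative stopword list (assumption).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "for", "is", "are", "on", "with"}


def content_words(text: str) -> set[str]:
    """Lowercased words of `text`, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def retention_score(insights: list[str], blueprint_text: str, min_coverage: float = 0.5) -> float:
    """Fraction of key insights with at least `min_coverage` of their content
    words present in the blueprint. A crude proxy for information retention."""
    if not insights:
        return 1.0  # nothing to lose
    bp_words = content_words(blueprint_text)
    retained = 0
    for insight in insights:
        words = content_words(insight)
        if words and len(words & bp_words) / len(words) >= min_coverage:
            retained += 1
    return retained / len(insights)


def compare_methods(insights: list[str], outputs: dict[str, str]) -> dict[str, float]:
    """Score each method's blueprint text against the same insight set."""
    return {method: retention_score(insights, text) for method, text in outputs.items()}
```

This covers only the retention side of output quality; efficiency would need separate measurements such as the number of LLM calls or latency per run.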

Method 1: Update and Retain Key Insights

Led by: Andy Tai and Kulwant Yadav
Objective: Replace existing LLM calls to improve information retention and compatibility with our blog generation framework.
Tasks:
    Replace the create_blueprint LLM call with a GPT-4 call.
    Develop a new prompt that better retains key insights.
    Create a tree-based model to capture key insights that might be lost with the new prompt.
    Modify and test the blog generator to assess compatibility with the new blueprint format.
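The proposal does not define the tree-based model. One plausible reading, sketched here as an assumption, is a tree of root, topics, and individual insights that acts as a safety net: after the GPT-4 call produces a blueprint, walk the tree, find leaf insights missing from the output, and re-attach them under their topic. `InsightNode`, `lost_insights`, and `patch_blueprint` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class InsightNode:
    """Node in a hypothetical insight tree: root -> topics -> key insights."""
    text: str
    children: list["InsightNode"] = field(default_factory=list)


def lost_insights(node: InsightNode, blueprint: str, path: tuple[str, ...] = ()) -> list[tuple[str, ...]]:
    """Return the root-to-leaf path of every leaf insight whose text does not
    appear (case-insensitively) in the generated blueprint."""
    here = path + (node.text,)
    if not node.children:
        return [] if node.text.lower() in blueprint.lower() else [here]
    lost: list[tuple[str, ...]] = []
    for child in node.children:
        lost.extend(lost_insights(child, blueprint, here))
    return lost


def patch_blueprint(blueprint: str, lost: list[tuple[str, ...]]) -> str:
    """Append lost insights under their topic so the blog generator still sees them."""
    if not lost:
        return blueprint
    lines = [f"- {' / '.join(p[1:-1]) or 'general'}: {p[-1]}" for p in lost]
    return blueprint + "\n\nRecovered insights:\n" + "\n".join(lines)
```

Exact substring matching is deliberately strict: an insight that the new prompt paraphrases is reported as lost. A looser word-overlap check could be substituted if that produces too many false alarms.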

Method 2: Overhaul LLM Integration

Led by: Amirabbas Tabatabaei
Objective: Streamline LLM usage to focus on blog generation, enhancing clarity and reducing redundancy.
Tasks:
    Remove all LLM calls within transcript2insights and create_blueprint.
    Retain LLM usage exclusively for the blog generation phase.
    Implement detailed steps for:
        Clustering and Blueprint Generation:
            Cluster key insights for detailed analysis.
            Create individual blueprints for each insight cluster.
            Generate a main blueprint integrating all insights.
        Validation:
            Validate the new clustering model to ensure there is no loss of critical information.