Description

This project aims to address several critical issues in our system related to mirroring skills, technical bugs, categorization, emotion detection, and conversation loading. The goal is to enhance overall performance and user experience by implementing targeted solutions and measuring success against defined metrics.

Problem

Mirroring Skill: Currently achieving only a 50% satisfactory score.
Technical Bugs: Two technical bugs have been detected.
Categorization: New categories are needed to handle specific queries from partners.

Solution

Mirroring Skill Improvement: Enhance training and algorithms to increase the satisfactory evaluation score.
Bug Fixes: Address and resolve the two identified technical bugs.
Categorization Update: Develop and implement new categories to improve query handling for partners.

Measurement metrics

Mirroring Skill: Achieve a satisfactory evaluation score from evaluator
Technical Bugs: Eliminate all detected technical bugs.
Categorization Accuracy: Track and improve the number of accurate categorizations.

SLA

Mirroring Skill: Achieve at least 75% satisfactory evaluation from evaluator
Technical Bugs: Eliminate all detected technical bugs.
Categorization Accuracy: Attain 80% accuracy in new categorization.

Evaluation Result

Evaluation Report for Thea – Second Iteration We have successfully met the SLA for Thea in this second iteration. Here's a breakdown of the performance:

Categorization Accuracy:

Out of 20 samples, we encountered 3 cases of miscategorization, resulting in an accuracy of approximately 85%. This occurred due to the need for a dedicated category to improve classification precision.

Emotion Classification:

We recorded 2 false classifications, bringing the accuracy to around 90%. This highlights the need for ongoing refinement in emotion detection models.

Completeness:

Only 1 error was observed in completeness, achieving an accuracy of about 95%. This confirms that the completeness method is significantly more reliable than requiring Thea to mirror conversations in all cases.

Technical Issues

However, there are still two technical bugs affecting the system:

Failed WhatsApp Service
Failed Process Agent resulting in a 500 server error.

I have already created a backlog for developers to address these issues before the milestone ends.

Prompting Issues

Since the prompting issue is minor, I have delegated it to AI assistance, and the prompt will be modified by tomorrow.

Overall Point: 4.5

Functional Complexity: 1.5

The issue involves enhancing mirroring skills and categorization, which requires a good understanding of conversation flows and additional conditions to handle specific partner queries.

Technical Complexity: 1

Addressing technical bugs suggests moderate technical work, though it does not imply major architectural changes. Improving emotion detection algorithms is technically challenging but doesn't seem to require major overhauls.

UI/UX Complexity and Impact: 0

There is no specific UI/UX work item mentioned in the issue description.

Testing and Quality Assurance: 1

The solution includes various aspects like training algorithms and resolving bugs, requiring a comprehensive test strategy for evaluation score improvements and the effectiveness of the categorization process.

Risk and Dependencies: 1

The work depends on the successful enhancement of training algorithms and resolving unknown bugs, which introduces uncertainty and potential high risk if bugs are critical.

Bukit-Vista / roadmap

#2 iteration / Thea should adopt communication principle and capable to detect human emotion #62