Bukit-Vista / roadmap


#2 iteration / Thea should adopt the communication principle and be capable of detecting human emotion #62

Closed krisnaBukitVista closed 1 month ago

krisnaBukitVista commented 2 months ago

Description

This project aims to address several critical issues in our system related to mirroring skills, technical bugs, categorization, emotion detection, and conversation loading. The goal is to enhance overall performance and user experience by implementing targeted solutions and measuring success against defined metrics.

Problem

Solution

Measurement metrics

SLA

Evaluation Result

Evaluation Report for Thea – Second Iteration

We have successfully met the SLA for Thea in this second iteration. Here's a breakdown of the performance:

Categorization Accuracy:

Out of 20 samples, 3 were miscategorized, giving an accuracy of 85% (17 of 20). The errors trace back to a missing dedicated category, which is needed to improve classification precision.

Emotion Classification:

We recorded 2 misclassifications, bringing the accuracy to around 90%. This highlights the need for ongoing refinement of the emotion detection models.

Completeness:

Only 1 completeness error was observed, for an accuracy of about 95%. This confirms that the completeness method is significantly more reliable than requiring Thea to mirror conversations in all cases.
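For reference, the three accuracy figures above follow from a simple error count. Here is a minimal sketch, assuming all three metrics were scored on the same 20-sample set (the sample size is only stated for categorization, but it matches the reported percentages). The helper name `accuracy` is hypothetical, not part of Thea's codebase.

```python
def accuracy(total_samples: int, errors: int) -> float:
    """Return the fraction of samples that were handled correctly."""
    if total_samples <= 0:
        raise ValueError("total_samples must be positive")
    if not 0 <= errors <= total_samples:
        raise ValueError("errors must be between 0 and total_samples")
    return (total_samples - errors) / total_samples


# Assumption: every metric was evaluated on the same 20 samples.
SAMPLES = 20

# Error counts as reported in this evaluation.
reported_errors = {
    "categorization": 3,
    "emotion classification": 2,
    "completeness": 1,
}

for metric, errors in reported_errors.items():
    print(f"{metric}: {accuracy(SAMPLES, errors):.0%}")
```

With a sample this small, each single error moves the score by 5 percentage points, so the figures are best read as rough indicators.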

Technical Issues

However, there are still two technical bugs affecting the system:

I have already created a backlog for developers to address these issues before the milestone ends.

Prompting Issues

Since the prompting issue is minor, I have delegated it to AI assistance, and the prompt will be modified by tomorrow.

Vidiskiu commented 1 month ago
Overall Point: 4.5

Functional Complexity: 1.5

The issue involves enhancing mirroring skills and categorization, which requires a good understanding of conversation flows and additional conditions to handle specific partner queries.

Technical Complexity: 1

Addressing technical bugs suggests moderate technical work, though it does not imply major architectural changes. Improving emotion detection algorithms is technically challenging but doesn't seem to require major overhauls.

UI/UX Complexity and Impact: 0

There is no specific UI/UX work item mentioned in the issue description.

Testing and Quality Assurance: 1

The solution includes various aspects like training algorithms and resolving bugs, requiring a comprehensive test strategy for evaluation score improvements and the effectiveness of the categorization process.

Risk and Dependencies: 1

The work depends on successfully enhancing the training algorithms and resolving bugs whose causes are still unknown, which introduces uncertainty and potentially high risk if the bugs turn out to be critical.
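The overall point appears to be the plain sum of the five dimension scores above. A quick sketch to check the arithmetic, assuming an unweighted sum (the scoring formula is not stated in this thread, and the dictionary keys are just labels copied from this comment):

```python
# Dimension scores as given in this comment.
scores = {
    "Functional Complexity": 1.5,
    "Technical Complexity": 1,
    "UI/UX Complexity and Impact": 0,
    "Testing and Quality Assurance": 1,
    "Risk and Dependencies": 1,
}

# Assumption: the overall point is an unweighted sum of the dimensions.
overall = sum(scores.values())
print(f"Overall Point: {overall}")
```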