Shaulbm / moovNowMVP

Chat Data Analytics and Extraction #735

Open Shaulbm opened 1 month ago

Shaulbm commented 1 month ago

Check these: https://www.youtube.com/watch?v=a8hMgIcUEnE https://www.youtube.com/watch?v=UtwDAge75Ag

We want to analyze the conversations our users have with Claire.

For that we need to anonymize the data and classify it. The extracted data should then be analyzed.
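One possible shape for the anonymization step is sketched below: known names are swapped for role placeholders before the text is classified. The `NAME_TO_ROLE` mapping, the names in it, and the `anonymize` helper are all made up for illustration; in practice the mapping would presumably be derived from the organizational data, and plain name matching will miss nicknames and misspellings.

```python
import re

# Hypothetical mapping from a person's name to their role relative to the user.
# The names are invented; real entries would come from the organizational data.
NAME_TO_ROLE = {
    "Dana Levi": "manager",
    "Dana": "manager",
    "Tom": "subordinate",
}


def anonymize(text: str, name_to_role: dict[str, str]) -> str:
    """Replace each known name with a role placeholder such as [manager]."""
    if not name_to_role:
        return text
    # Try longer names first so "Dana Levi" is matched before "Dana".
    names = sorted(name_to_role, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in names) + r")\b",
        flags=re.IGNORECASE,
    )
    lookup = {name.lower(): role for name, role in name_to_role.items()}
    return pattern.sub(lambda m: f"[{lookup[m.group(0).lower()]}]", text)
```

Calling `anonymize("Dana Levi told Tom to redo it", NAME_TO_ROLE)` leaves only roles in the text, which matches the "no names, only roles" requirement for the chat-level output.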

Offline extraction: For each relevant user - Classify full chats and messages. Pass on the given prompt, and the organizational data (without Claro's model).

For whole chats (chatId) - What we want to understand - No names should be here - only roles:

  1. What was the chat about - classification - if it was on several different issues, have a list of issues.
  2. Who it was about - self, manager, subordinate, peer - if there were different people, list it as roles.
  3. overall user satisfaction
  4. user sentiment

For each user message (chatId, messageId) in the chat:

  1. What was the message about - classification
  2. Who is it about (peer, manager, subordinate, self) - can be multiple (2 subordinates, 2 peers etc.)
  3. user sentiment

For each llm message (chatId, messageId):

  1. If possible, estimate from the following messages whether the user was happy with the response
  2. Think about what else to capture.
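To judge an LLM message by what the user said next, each assistant message first has to be paired with the following user message. The sketch below shows only that pairing step. The function name is hypothetical, and it assumes the messages of one chat are already sorted by timestamp and carry the `messageType` and `messageId` fields from this issue. The actual satisfaction estimate would still be made by the classifier.

```python
from typing import Optional


def pair_with_next_user_reply(
    messages: list[dict],
) -> list[tuple[dict, Optional[dict]]]:
    """Pair every assistant message with the next user message in the chat.

    `messages` must belong to a single chat and be ordered by timestamp.
    The reply is None when the user never answered (e.g. the chat ended).
    """
    pairs = []
    for index, message in enumerate(messages):
        if message["messageType"] != "assistant":
            continue
        reply = next(
            (m for m in messages[index + 1:] if m["messageType"] == "user"),
            None,
        )
        pairs.append((message, reply))
    return pairs
```

If two assistant messages arrive back to back, both are paired with the same user reply. Whether that is the right behavior is an open design choice.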

Save data in a specific MongoDB Chat Messages

Save to a specific DB - two different collections - chats and messages

user message:

  - timestamp
  - messageType (user / assistant)
  - messageId
  - chatId
  - userId
  - orgId
  - wordCount
  - charCount
  - promptVersion (Future)
  - sentiment (negative, positive, neutral)
  - Is it a response to what Claire said? We want to check continuity (yes / no)
  - What is the response from the user (more data - how to / different approach / refine / reject the given solution):
      - request for more data - how to - the user wants to get more data from Claire
      - request for different approach or solution

  - essence (summary of the user message without any identifying details)
  - essence_type (possible types for the message - TBD - must contain "Other" to allow for new types - if "Other", let the LLM create a new type and add it to the list of possible types)
  - target (self, manager, subordinate, peer, organization, other)
  - language
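The "Other" rule for essence_type can be sketched as a small helper that extends the list of possible types when the LLM proposes a new one. The function name, the `proposed_type` argument, and the sample type names in the usage note are assumptions, since the real type list is still TBD.

```python
from typing import Optional


def resolve_essence_type(
    label: str,
    proposed_type: Optional[str],
    known_types: list[str],
) -> str:
    """Return the essence_type to store, extending known_types if needed.

    label          -- the type chosen by the LLM from known_types
    proposed_type  -- the new type name the LLM suggested when label is "Other"
    known_types    -- mutable list of possible types; must contain "Other"
    """
    if label != "Other":
        if label not in known_types:
            raise ValueError(f"unknown essence type: {label!r}")
        return label

    new_type = (proposed_type or "").strip()
    if not new_type:
        # The LLM picked "Other" without suggesting anything more specific.
        return "Other"

    # Reuse an existing type when only the letter case differs.
    for existing in known_types:
        if existing.lower() == new_type.lower():
            return existing

    known_types.append(new_type)
    return new_type
```

For example, with `known_types = ["feedback", "Other"]`, a call with label "Other" and proposed type "Promotion" returns "Promotion" and appends it, so later messages can be classified with it directly.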

bot message:

  - timestamp
  - messageType (user / assistant)
  - messageId
  - chatId
  - userId
  - orgId
  - wordCount
  - charCount
  - promptVersion (Future)
  - llmVersion (?)
  - essence (?)
  - essenceType (?)
  - respondingAccordingToThePrompt (? - need to think about it)
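User and bot messages share most of their fields, so one builder could produce the common part of both documents. This is a minimal sketch: the function name is hypothetical, counting words by splitting on whitespace is an assumption, and the LLM-derived fields (sentiment, essence and so on) are left out.

```python
from datetime import datetime, timezone
from typing import Optional


def build_message_doc(
    text: str,
    message_type: str,
    message_id: str,
    chat_id: str,
    user_id: str,
    org_id: str,
    timestamp: Optional[datetime] = None,
) -> dict:
    """Build the fields shared by user and assistant message documents.

    The raw text is used only to derive the counts and is not stored.
    """
    if message_type not in ("user", "assistant"):
        raise ValueError(f"unexpected messageType: {message_type!r}")
    return {
        "timestamp": timestamp or datetime.now(timezone.utc),
        "messageType": message_type,
        "messageId": message_id,
        "chatId": chat_id,
        "userId": user_id,
        "orgId": org_id,
        # Assumption: words are whitespace-separated tokens.
        "wordCount": len(text.split()),
        "charCount": len(text),
    }
```

The returned dict could then be extended with the classification fields before being written to the messages collection.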

chat:

  - firstMessageTimestamp
  - lastMessageTimestamp
  - chatId
  - userId
  - orgId
  - messageCount
  - wordCount (?)
  - sentiment (positive / negative / neutral)
  - engagementLevel (how much the user engaged in the conversation: asked for knowledge, replied to questions) - (very low, low, medium, high, very high)
  - frustrationLevel (the user is arguing about the provided data, trying to correct the assistant, resentful of the proposed solutions) - (very low, low, medium, high, very high)
  - acceptanceLevel - how much the user accepted the offered solutions (N/A, low-high)
  - essence - summary of the chat
  - essence_type - multiple selection - (possible types for the message - TBD - must contain "Other" to allow for new types - if "Other", let the LLM create a new type and add it to the list of possible types)
  - targets - multiple selection - (self, manager, subordinate, peer, organization, other)
  - language - multiple selection ()
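The non-LLM chat fields (timestamps and counts) can be derived directly from the message documents of a chat. The sketch below assumes each message dict already carries `timestamp`, `chatId`, `userId`, `orgId` and `wordCount`. The function name and the sample data are made up for illustration.

```python
from datetime import datetime


def build_chat_doc(message_docs: list[dict]) -> dict:
    """Aggregate the message documents of one chat into chat-level fields."""
    if not message_docs:
        raise ValueError("a chat needs at least one message")
    if len({m["chatId"] for m in message_docs}) != 1:
        raise ValueError("all messages must belong to the same chat")

    timestamps = [m["timestamp"] for m in message_docs]
    first = message_docs[0]
    return {
        "firstMessageTimestamp": min(timestamps),
        "lastMessageTimestamp": max(timestamps),
        "chatId": first["chatId"],
        "userId": first["userId"],
        "orgId": first["orgId"],
        "messageCount": len(message_docs),
        "wordCount": sum(m["wordCount"] for m in message_docs),
    }


# Made-up sample data for illustration only.
example_messages = [
    {"timestamp": datetime(2024, 1, 1, 9, 0), "chatId": "c1",
     "userId": "u1", "orgId": "o1", "wordCount": 5},
    {"timestamp": datetime(2024, 1, 1, 9, 1), "chatId": "c1",
     "userId": "u1", "orgId": "o1", "wordCount": 40},
]
example_chat = build_chat_doc(example_messages)
```

The LLM-derived fields (sentiment, engagementLevel, frustrationLevel and the rest) would be added to this dict after the classification pass.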

NoamRivlin commented 2 weeks ago

Watched this video. It seems very relevant for labeling/tagging from natural language, and it uses pydantic and BaseModel for validation, with code examples etc. Right up our alley I think, and it's the most up-to-date video from what I've seen.
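To illustrate the kind of validation meant here, below is a dependency-free sketch that checks an LLM's JSON labels against the allowed values from this issue. It uses the standard library's `dataclass` and `Enum` as a stand-in and does not reproduce the video's code. A pydantic `BaseModel` could play the same role. The class names and the `from_llm_json` helper are hypothetical.

```python
import json
from dataclasses import dataclass
from enum import Enum


class Sentiment(str, Enum):
    NEGATIVE = "negative"
    POSITIVE = "positive"
    NEUTRAL = "neutral"


class Target(str, Enum):
    SELF = "self"
    MANAGER = "manager"
    SUBORDINATE = "subordinate"
    PEER = "peer"
    ORGANIZATION = "organization"
    OTHER = "other"


@dataclass
class MessageLabels:
    sentiment: Sentiment
    target: Target
    essence: str

    @classmethod
    def from_llm_json(cls, raw: str) -> "MessageLabels":
        """Parse and validate the JSON labels returned by the LLM.

        Raises ValueError for malformed JSON or a label outside the allowed
        set, and KeyError when a field is missing.
        """
        data = json.loads(raw)
        return cls(
            sentiment=Sentiment(data["sentiment"]),
            target=Target(data["target"]),
            essence=str(data["essence"]),
        )
```

Rejecting unexpected labels at parse time keeps values such as "mixed" or "boss" from silently entering the messages collection.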