Closed by ghost 9 months ago
TASKS: This first sprint we will focus on:
TIMELINE: Timeline for 3 tasks:
Acceptance Criteria: Does it return a category? Does it return results consistent with the testing parameters? (Given some level of accuracy on our test set, I should look for an accuracy of more than 50% on the first model.)
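The accuracy criterion above can be written down as a small check. This is only a sketch: the function names `accuracy` and `meets_acceptance` are hypothetical, and it assumes the test set is available as two parallel lists of labels (predicted and expected), which the thread does not specify.

```python
def accuracy(predicted, expected):
    """Fraction of predictions that match the expected labels."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


def meets_acceptance(predicted, expected, threshold=0.5):
    """True only when accuracy is strictly above the threshold (more than 50%)."""
    return accuracy(predicted, expected) > threshold
```

Note that exactly 50% does not pass, since the criterion asks for more than 50%.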
This is the associated ML repo: JARVIS Email Sorter Training
This is the new branch name. It seems I can't update it under the Development tab.
I will expand on the task "Put in ML sorter, depending on computational cost, either a server connection or locally on the device: By 2/19". Due to the computational requirements, I have opted for a server connection. I will make another associated JARVIS server repo, and the server-side code will live there. Basically, this works by having an API endpoint on the server and an API endpoint in the app.
Here is the link to the server code: JARVIS-server
I have completely implemented this, but the environment variables security feature is not merged in yet (waiting on confirmation from @haohuazheng3), so I have stored the key in a secure location. I can verify via tests that the API connection works when the API key is filled in.
Noted two issues:
1. You can't send too many requests to Hugging Face within an hour (the limit is <100).
2. There is a token limit for DistilBERT (512), so inputs can be at most 512 tokens, which is usually fewer than 512 words.
2a. A 'cheap' way to handle this would be to truncate all inputs to 512 at most.
2b. Alternatively, split longer emails into pieces that don't exceed 512, then average the results to get the correct category.
To simply have something by the deadline I will do 2a, and run the tests on 10 emails rather than 100. In the future I can pay Hugging Face 9 dollars to get more API calls.
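Options 2a and 2b can be sketched as follows. This is an illustration under stated assumptions, not the code in the repo: the helper names are hypothetical, the email is assumed to be already tokenized into a list, and `classify_chunk` stands in for whatever call returns a label-to-score mapping for one chunk. With the real DistilBERT tokenizer, special tokens also count toward the 512 limit, so the usable chunk size would be slightly smaller.

```python
def truncate(tokens, max_len=512):
    """Option 2a: keep only the first max_len tokens."""
    return tokens[:max_len]


def split_into_chunks(tokens, max_len=512):
    """Option 2b, step 1: consecutive pieces of at most max_len tokens."""
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]


def average_scores(per_chunk_scores):
    """Option 2b, step 2: average each label's score across chunks.

    A label missing from a chunk's result counts as 0 for that chunk.
    """
    totals = {}
    for scores in per_chunk_scores:
        for label, score in scores.items():
            totals[label] = totals.get(label, 0.0) + score
    count = len(per_chunk_scores)
    return {label: total / count for label, total in totals.items()}


def classify_long_email(tokens, classify_chunk, max_len=512):
    """Classify every chunk, average the scores, return the best label.

    classify_chunk is a hypothetical callable: list of tokens -> {label: score}.
    """
    chunks = split_into_chunks(tokens, max_len)
    if not chunks:
        raise ValueError("email has no tokens")
    averaged = average_scores([classify_chunk(chunk) for chunk in chunks])
    return max(averaged, key=averaged.get)
```

A plain average weights a short final chunk the same as a full one; weighting by chunk length would be a possible refinement. Option 2b also costs one API call per chunk, which matters given the hourly request limit.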
This PBI has been merged into main. There are some modifications that should be made in the future:
1> more sophisticated handling of large emails (>512 tokens)
2> further refinement of the sorting model
3> better responses when hitting the API 503 error (too many API calls within 1 hour)
4> integration with "Collect Emails" and "Summary"
5> encryption and decryption of emails: we encrypt the email before sending it, and it gets decrypted at the server. We don't need to worry about encrypting and decrypting what comes FROM the server, because it only sends back which categories are most likely.
Example server response:
[
  [
    {
      "label": "LABEL_0",
      "score": 0.9175021052360535
    },
    {
      "label": "LABEL_5",
      "score": 0.03506850823760033
    },
    {
      "label": "LABEL_3",
      "score": 0.0159841850399971
    },
    {
      "label": "LABEL_2",
      "score": 0.01139953825622797
    },
    {
      "label": "LABEL_7",
      "score": 0.006237142253667116
    },
    {
      "label": "LABEL_4",
      "score": 0.0050201755948364735
    },
    {
      "label": "LABEL_1",
      "score": 0.0047601149417459965
    },
    {
      "label": "LABEL_6",
      "score": 0.0040282695554196835
    }
  ]
]
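A response shaped like the one above (a list containing one list of label/score objects) could be reduced to a single category roughly as follows. This is a sketch: `top_category` is a hypothetical helper, and the optional `label_names` mapping is an assumption, since the thread does not say which `LABEL_n` corresponds to which category.

```python
import json


def top_category(response_text, label_names=None):
    """Return (label, score) for the highest-scoring entry in the response.

    label_names is an optional, hypothetical mapping from raw labels such as
    "LABEL_0" to human-readable category names; unmapped labels are returned as-is.
    """
    data = json.loads(response_text)
    candidates = data[0]  # the outer list wraps the results for one input
    best = max(candidates, key=lambda item: item["score"])
    label = best["label"]
    if label_names:
        label = label_names.get(label, label)
    return label, best["score"]
```

Taking the maximum explicitly avoids relying on the server returning the entries sorted by score.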
User Story
As Cathie Spino, a research professor and director overwhelmed with managing a heavy load of emails daily, I would like an automated way to sort my emails locally on my device before determining which ones are high priority and need immediate attention. This feature should help me distinguish between critical emails requiring action/response and less important ones, enabling me to focus on urgent matters without the need to manually sift through approximately 250 emails every day. By automatically categorizing emails into 'Informational', 'Requires Action/Response', and 'Less Important', I can quickly attend to those that are most pertinent to my roles as a director and principal investigator, while also efficiently managing communications related to university and potential new projects.
Persona: Cathie Spino, a research professor of biostatistics and director of a unit supporting clinical trials, who is challenged by the task of managing a large volume of mixed-priority emails in her professional and personal life.
Feature: Automated sorting of emails into predefined categories ('Informational', 'Requires Action/Response', and 'Less Important') based on content analysis and user-defined rules, with the capability to learn and adapt to Cathie's preferences over time for more personalized management. [ I WILL ELABORATE ON THIS BELOW ]
Business Value: This feature aims to significantly reduce the time Cathie spends on email management daily, ensuring she does not miss out on critical communications and can make more efficient use of her time. By providing a more effective strategy than current manual sorting methods or the basic sorting features of Outlook, this feature addresses Cathie's need for better control over her email inbox, potentially eliminating the need to hire an expensive secretary. Ultimately, it allows Cathie to focus more on her important work and less on email management.