PBI 1: Sort Emails - Githubissues

COSC481W-2024Winter / JARVIS

J.A.R.V.I.S


PBI 1: Sort Emails #11

Closed ghost closed 9 months ago

ghost commented 10 months ago

User Story

As Cathie Spino, a research professor and director overwhelmed with managing a heavy load of emails daily, I would like an automated way to sort my emails locally on my device before determining which ones are high priority and need immediate attention. This feature should help me distinguish between critical emails requiring action/response and less important ones, enabling me to focus on urgent matters without the need to manually sift through approximately 250 emails every day. By automatically categorizing emails into 'Informational', 'Requires Action/Response', and 'Less Important', I can quickly attend to those that are most pertinent to my roles as a director and principal investigator, while also efficiently managing communications related to university and potential new projects.

Persona: Cathie Spino, a research professor of biostatistics and director of a unit supporting clinical trials, who is challenged by the task of managing a large volume of mixed-priority emails in her professional and personal life.
Feature: Automated sorting of emails into predefined categories ('Informational', 'Requires Action/Response', and 'Less Important') based on content analysis and user-defined rules, with the capability to learn and adapt to Cathie's preferences over time for more personalized management. [ I WILL ELABORATE ON THIS BELOW ]
Business Value: This feature aims to significantly reduce the time Cathie spends on email management daily, ensuring she does not miss out on critical communications and can make more efficient use of her time. By providing a more effective strategy than current manual sorting methods or the basic sorting features of Outlook, this feature addresses Cathie's need for better control over her email inbox, potentially eliminating the need to hire an expensive secretary. Ultimately, it allows Cathie to focus more on her important work and less on email management.

ghost commented 9 months ago

TASKS: This first sprint we will focus on:

[x] Dummy model that randomly assigns one of these categories to each email: 1. 'Company Business/Strategy' 2. 'Purely Personal' 3. 'Personal but in a professional context' 4. 'Logistic Arrangements' 5. 'Employment arrangements' 6. 'Document editing/checking/collaboration' 7. 'Empty message (due to missing attachment)'. This will be a stub for others to work against. They can SEND TO and RECEIVE FROM it.
[x] Create a separate repo under JARVIS that handles the training of our email sorter. We will begin crafting a 'good' sorter based on the Enron emails. Here
[x] Place the first edition of the sorter in. Depending on the power draw of the sorter, it will either be a remote source or local to the device. If it is remote, we may need to expand out into a further PBI.
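
A minimal sketch of what the dummy stub from the first task could look like. The function name `dummy_sort` and the optional `rng` parameter are hypothetical and not taken from the JARVIS repo; only the seven category names come from the task list above.

```python
import random

# The seven categories named in the task list above.
CATEGORIES = [
    "Company Business/Strategy",
    "Purely Personal",
    "Personal but in a professional context",
    "Logistic Arrangements",
    "Employment arrangements",
    "Document editing/checking/collaboration",
    "Empty message (due to missing attachment)",
]


def dummy_sort(email_text, rng=None):
    """Ignore the email content and return a random category (stub only).

    `rng` is a hypothetical hook so tests can pass a seeded random.Random.
    """
    if rng is None:
        rng = random
    return rng.choice(CATEGORIES)
```

Because the stub has the same shape a real sorter would have (text in, category out), other tasks can be built against it before the ML model exists.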

TIMELINE: Timeline for 3 tasks:

[x] Set up code to facilitate the 7 categories with a dummy random sort: By 2/10
[x] Set up tests: By 2/11
[x] Set up ML repo : By 2/12
[x] Finish first edition of ML model: By 2/16
[x] Put in ML sorter, depending on computational cost, either a server connection or locally on the device: By 2/19

Acceptance Criteria: Does it return a category? Does it return results consistent with testing parameters? (Given some level of accuracy on our test set, I should look for an accuracy of more than 50% on the first model.)
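
The accuracy criterion could be checked along these lines. This is only a sketch: the helper names `accuracy` and `meets_acceptance` are hypothetical, and the 0.5 threshold is the "more than 50%" figure from the acceptance criteria.

```python
def accuracy(predicted, expected):
    """Fraction of predictions that match the expected categories."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


def meets_acceptance(predicted, expected, threshold=0.5):
    """True only if accuracy is strictly above the threshold (more than 50%)."""
    return accuracy(predicted, expected) > threshold
```

Note that with seven categories a purely random sorter would land around 14% on a balanced test set, so the 50% bar does separate a trained model from the dummy stub.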

ghost commented 9 months ago

This is the associated ML repo: JARVIS Email Sorter Training

ghost commented 9 months ago

This is the new branch name. It seems I can't update it under the development tab.

emarron/sort-emails

ghost commented 9 months ago

I will expand on "Put in ML sorter, depending on computational cost, either a server connection or locally on the device: By 2/19". Due to computational requirements, I have opted for a server connection. I will make another associated JARVIS server repo, and the server-side code will live there. Basically, this works by having an API endpoint on the server and an API endpoint in the app.
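
As a rough illustration of the app side of that connection, the sketch below builds (but does not send) a POST request carrying the email text. Everything specific here is an assumption: the function name `build_classify_request`, the URL, the bearer-token header, and the `{"inputs": ...}` payload shape are placeholders, not the actual JARVIS-server API.

```python
import json
import urllib.request


def build_classify_request(email_text, server_url, api_key):
    """Build a POST request for a hypothetical classification endpoint.

    The payload shape and auth header are assumptions for illustration.
    """
    body = json.dumps({"inputs": email_text}).encode("utf-8")
    return urllib.request.Request(
        server_url,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it would then be a matter of passing the request to `urllib.request.urlopen` and decoding the JSON body of the reply.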

ghost commented 9 months ago

Here is the link to the server code: JARVIS-server

ghost commented 9 months ago

I have completely implemented this, but the environment variables security feature is not merged in yet (waiting on confirmation from @haohuazheng3), so I have stored the key in a secure location. I can verify that the API connection works via tests when the API key is filled out.

ghost commented 9 months ago

Noted two issues. 1: You can't send that many requests to Hugging Face within an hour (<100). 2: There is a token limit on DistilBERT (512), so inputs can only be 512 tokens long. 2a: A 'cheap' way to handle this would be to truncate all inputs to 512 tokens at most. 2b: Split longer emails into pieces that don't exceed 512 tokens, then average the results to get the correct category. Simply to have something by the deadline, I will do 2a and run the tests on 10 emails rather than 100. In the future I can toss 9 dollars to Hugging Face to get more API calls.
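
Options 2a and 2b can be sketched as below. This assumes the email has already been tokenized into a list, and `classify_chunk` is a hypothetical stand-in for the model call that returns a dict of label to score. The sketch also ignores the special tokens that BERT-style tokenizers add, so a real limit per chunk would be slightly under 512.

```python
MAX_TOKENS = 512  # DistilBERT input limit mentioned above


def truncate(tokens, limit=MAX_TOKENS):
    """Option 2a: keep only the first `limit` tokens."""
    return tokens[:limit]


def split_into_chunks(tokens, limit=MAX_TOKENS):
    """Option 2b, step 1: split into consecutive chunks of at most `limit`."""
    return [tokens[i:i + limit] for i in range(0, len(tokens), limit)]


def average_scores(chunk_scores):
    """Option 2b, step 2: average per-label scores across chunks."""
    if not chunk_scores:
        raise ValueError("need at least one chunk to average")
    totals = {}
    for scores in chunk_scores:
        for label, score in scores.items():
            totals[label] = totals.get(label, 0.0) + score
    return {label: total / len(chunk_scores) for label, total in totals.items()}


def classify_long(tokens, classify_chunk, limit=MAX_TOKENS):
    """Classify every chunk, average the scores, return the top label."""
    chunks = split_into_chunks(tokens, limit)
    averaged = average_scores([classify_chunk(chunk) for chunk in chunks])
    return max(averaged, key=averaged.get)
```

The average here is unweighted, so a short final chunk counts as much as a full one. Weighting by chunk length would be a reasonable refinement. Note also that 2b costs one API call per chunk, which interacts with the hourly request limit in issue 1.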

ghost commented 9 months ago

This PBI has been merged into main. There are some modifications that should be made in the future: 1) more sophisticated handling of large emails (>512 tokens), 2) further refinement of the sorting model, 3) better responses when hitting an API 503 error (too many API calls within 1 hour), 4) integration with "Collect Emails" and "Summary", 5) encryption and decryption of emails, so that the email is encrypted before we send it and decrypted at the server. We don't need to worry about encrypting and decrypting what comes FROM the server, because it only sends back which categories are most likely.

e.g.: server response

[
    [
        {
            "label": "LABEL_0",
            "score": 0.9175021052360535
        },
        {
            "label": "LABEL_5",
            "score": 0.03506850823760033
        },
        {
            "label": "LABEL_3",
            "score": 0.0159841850399971
        },
        {
            "label": "LABEL_2",
            "score": 0.01139953825622797
        },
        {
            "label": "LABEL_7",
            "score": 0.006237142253667116
        },
        {
            "label": "LABEL_4",
            "score": 0.0050201755948364735
        },
        {
            "label": "LABEL_1",
            "score": 0.0047601149417459965
        },
        {
            "label": "LABEL_6",
            "score": 0.0040282695554196835
        }
    ]
]
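
A minimal sketch of how the app could pick the most likely category out of a response shaped like the one above. The function name `top_category` is hypothetical. The mapping from `LABEL_n` to a category name is not stated in this thread, so the sketch returns the raw label rather than guessing at one.

```python
import json


def top_category(response_text):
    """Return (label, score) for the highest-scoring entry.

    Expects the nested-list shape shown above: a list holding one list
    of {"label": ..., "score": ...} objects.
    """
    payload = json.loads(response_text)
    candidates = payload[0]
    best = max(candidates, key=lambda item: item["score"])
    return best["label"], best["score"]
```

Using `max` on the score rather than taking the first entry means the result does not depend on the server returning the list already sorted.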