COSC481W-2024Winter / JARVIS

J.A.R.V.I.S

PBI 1: Sort Emails #11

Closed: ghost closed 9 months ago

ghost commented 10 months ago

User Story

As Cathie Spino, a research professor and director overwhelmed with managing a heavy load of emails daily, I would like an automated way to sort my emails locally on my device before determining which ones are high priority and need immediate attention. This feature should help me distinguish between critical emails requiring action/response and less important ones, enabling me to focus on urgent matters without the need to manually sift through approximately 250 emails every day. By automatically categorizing emails into 'Informational', 'Requires Action/Response', and 'Less Important', I can quickly attend to those that are most pertinent to my roles as a director and principal investigator, while also efficiently managing communications related to university and potential new projects.

ghost commented 9 months ago

TASKS: This first sprint we will focus on:

TIMELINE: Timeline for 3 tasks:

Acceptance Criteria: Does it return a category? Does it return results consistent with the testing parameters? Given some level of accuracy on our test set, I should look for an accuracy of more than 50% on the first model.
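The two acceptance checks above could be expressed as a small test helper. This is only a sketch: the function names and the example labels are hypothetical, and the 50% threshold is taken from the criteria as stated.

```python
def accuracy(predicted, expected):
    """Fraction of predictions that match the expected categories."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("the test set must not be empty")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


def meets_acceptance(predicted, expected, threshold=0.5):
    """True when every email got a category and accuracy exceeds the threshold."""
    every_email_has_category = all(p is not None for p in predicted)
    return every_email_has_category and accuracy(predicted, expected) > threshold
```

Note that the comparison is strict, so a model that scores exactly 50% would not pass.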

ghost commented 9 months ago

This is the associated ML repo: JARVIS Email Sorter Training

ghost commented 9 months ago

This is the new branch name. It seems I can't update it under the development tab.

emarron/sort-emails

ghost commented 9 months ago

Due to computational requirements, I will expand on "Put in ML sorter, depending on computational cost, either a server connection or locally on the device: By 2/19". I have opted for a server connection. I will make another associated JARVIS server repo, and the server-side code will live there. Basically, this works by having an API endpoint on the server and an API endpoint in the app.
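As a rough illustration of the server side of that arrangement, the sketch below shows a framework-independent request handler. Everything here is an assumption for illustration: the `handle_sort_request` name, the `{"text": ...}` request shape, and the injected `classify` callable are hypothetical and are not taken from the JARVIS-server code.

```python
import json


def handle_sort_request(body, classify):
    """Handle one POST from the app and return (status_code, response_body).

    body: raw request body, expected to be JSON like {"text": "<email text>"}.
    classify: callable that takes the email text and returns label scores
              (hypothetical; in practice it would call the hosted model).
    """
    try:
        text = json.loads(body)["text"]
    except (ValueError, KeyError, TypeError):
        return 400, json.dumps({"error": "expected JSON with a 'text' field"})
    if not isinstance(text, str) or not text.strip():
        return 400, json.dumps({"error": "'text' must be a non-empty string"})
    return 200, json.dumps(classify(text))
```

Keeping the handler free of any web framework makes it easy to test without a running server; it could then be wired into whichever framework the server repo uses.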

ghost commented 9 months ago

Here is the link to the server code: JARVIS-server

ghost commented 9 months ago

I have completely implemented this. However, the environment variables security feature is not merged in yet (waiting on confirmation from @haohuazheng3), so I have stored the key in a secure location. I can verify that the API connection works via tests when the API key is filled in.
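Once the key can come from the environment, loading it might look roughly like the sketch below. The variable name `HF_API_KEY` and both function names are hypothetical; the Bearer-token header is the standard way Hugging Face API tokens are sent.

```python
import os


def load_api_key(env=os.environ, name="HF_API_KEY"):
    """Read the API key from the environment (variable name is an assumption)."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; cannot call the inference API")
    return key


def auth_headers(api_key):
    """Build the Authorization header sent with each inference request."""
    return {"Authorization": f"Bearer {api_key}"}
```

Failing loudly when the variable is missing keeps the key out of the source tree and makes a misconfigured deployment obvious at startup.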

ghost commented 9 months ago

Noted two issues. 1: You can't send too many requests to Hugging Face within an hour; the limit is under 100. 2: DistilBERT has a token limit of 512, so inputs can be at most 512 tokens. 2a: A 'cheap' way to handle this would be to truncate every input to 512 tokens at most. 2b: Alternatively, split longer emails into pieces that don't exceed 512 tokens, then average the results to get the correct category. To simply have something by the deadline, I will do 2a and run the tests on 10 emails rather than 100. In the future I can toss 9 dollars to Hugging Face to get more API calls.
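Options 2a and 2b can be sketched as follows. This is a simplified illustration, not the merged implementation: `tokenize` is a whitespace stand-in for the real DistilBERT tokenizer (which produces subword tokens and reserves room for special tokens, so real counts will differ), and all function names are hypothetical.

```python
from collections import defaultdict

MAX_TOKENS = 512  # DistilBERT input limit mentioned above


def tokenize(text):
    # Assumption: whitespace split as a stand-in for the model's tokenizer.
    return text.split()


def truncate(text, max_tokens=MAX_TOKENS):
    """Option 2a: keep only the first max_tokens tokens."""
    return " ".join(tokenize(text)[:max_tokens])


def split_into_chunks(text, max_tokens=MAX_TOKENS):
    """Option 2b, step 1: split a long email into pieces within the limit."""
    tokens = tokenize(text)
    return [
        " ".join(tokens[i:i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]


def average_scores(per_chunk_scores):
    """Option 2b, step 2: average each label's score across all chunks.

    per_chunk_scores: one list of {"label": ..., "score": ...} dicts per chunk.
    Returns the averaged scores sorted from most to least likely.
    """
    totals = defaultdict(float)
    for scores in per_chunk_scores:
        for item in scores:
            totals[item["label"]] += item["score"]
    count = len(per_chunk_scores)
    averaged = [
        {"label": label, "score": total / count}
        for label, total in totals.items()
    ]
    return sorted(averaged, key=lambda item: item["score"], reverse=True)
```

Option 2b costs one API call per chunk, which matters given the hourly request limit; a plain average also weights a short final chunk the same as a full one.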

ghost commented 9 months ago

This PBI has been merged into main. Some modifications should be made in the future: 1) more sophisticated handling of large emails (>512 tokens); 2) further refinement of the sorting model; 3) better responses when hitting the API 503 error (too many API calls within 1 hour); 4) integration with "Collect Emails" and "Summary"; 5) encryption and decryption of emails: encrypt the email before sending it, then decrypt it at the server. We don't need to worry about encrypting and decrypting what comes FROM the server, because it only sends back which categories are most likely.

Example server response:

[
    [
        {
            "label": "LABEL_0",
            "score": 0.9175021052360535
        },
        {
            "label": "LABEL_5",
            "score": 0.03506850823760033
        },
        {
            "label": "LABEL_3",
            "score": 0.0159841850399971
        },
        {
            "label": "LABEL_2",
            "score": 0.01139953825622797
        },
        {
            "label": "LABEL_7",
            "score": 0.006237142253667116
        },
        {
            "label": "LABEL_4",
            "score": 0.0050201755948364735
        },
        {
            "label": "LABEL_1",
            "score": 0.0047601149417459965
        },
        {
            "label": "LABEL_6",
            "score": 0.0040282695554196835
        }
    ]
]
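A response of this shape could be reduced to a single predicted label along the lines of the sketch below. The function name is hypothetical, and the mapping from `LABEL_n` to the named categories is not given in this thread, so the sketch returns the raw label only.

```python
import json


def top_label(response_text):
    """Return (label, score) for the highest-scoring entry in the response.

    Assumes the nested shape shown above: a list containing one list of
    {"label": ..., "score": ...} dicts for the single email that was sent.
    """
    scores = json.loads(response_text)[0]
    best = max(scores, key=lambda item: item["score"])
    return best["label"], best["score"]
```

Selecting by maximum score rather than taking the first element avoids relying on the server returning the entries already sorted.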