Expensify / App

Welcome to New Expensify: a complete re-imagination of financial collaboration, centered around chat. Help us build the next generation of Expensify by sharing feedback and contributing to the code.
https://new.expensify.com
MIT License

Proposal-Police: GPT-4o Update with Structured Response #55108

Closed ikevin127 closed 1 week ago

ikevin127 commented 2 weeks ago

Explanation of Change

This PR updates the Proposal Police GH action to match the updated gpt-4o assistant instructions, under which the AI assistant now returns structured responses.

The purpose is to get structured responses so that we can handle the AI output reliably. Before this change the responses were unstructured and unpredictable, which caused the GH action to have problems parsing them when posting comments on issues.
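To illustrate why a structured response is easier to handle, here is a minimal TypeScript sketch of parsing the assistant's reply. The `action` / `message` fields and the three action values follow the JSON schema given in this PR description; the names `AssistantResponse`, `isAssistantResponse` and `parseAssistantResponse` are hypothetical and are not taken from the actual action code.

```typescript
// Hypothetical names; only the field names and action values come from the schema in this PR.
const ACTIONS = ['NO_ACTION', 'ACTION_EDIT', 'ACTION_REQUIRED'] as const;
type Action = (typeof ACTIONS)[number];

type AssistantResponse = {
    action: Action;
    message: string;
};

// Narrow an unknown parsed value to the expected shape.
function isAssistantResponse(value: unknown): value is AssistantResponse {
    if (typeof value !== 'object' || value === null) {
        return false;
    }
    const candidate = value as Record<string, unknown>;
    const hasKnownAction = ACTIONS.some((action) => action === candidate.action);
    return hasKnownAction && typeof candidate.message === 'string';
}

// Parse the raw assistant text; throws if it is not valid JSON or has the wrong shape.
function parseAssistantResponse(raw: string): AssistantResponse {
    const parsed: unknown = JSON.parse(raw);
    if (!isAssistantResponse(parsed)) {
        throw new Error('Assistant response does not match the expected schema');
    }
    return parsed;
}

const exampleResponse = parseAssistantResponse('{"action":"NO_ACTION","message":""}');
console.log(exampleResponse.action); // NO_ACTION
```

With a fixed shape like this, the caller can branch on `action` instead of searching free text for keywords.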

Fixed Issues

$ https://github.com/Expensify/App/issues/54980 PROPOSAL:

cc @thienlnam @marcochavezf

[!caution]

🛑 Important

Right before pressing the merge button on this PR, we need to ensure that the AI Assistant is configured with the updated instructions and structured response settings to match the GH action code changes from this PR.

⚠️ Note that when applying changes to the AI Assistant on the OpenAI dashboard, there's no final save button. Instead, changes to the instructions are saved shortly after they are applied, so make sure this PR is merged right after the AI Assistant changes are applied. This is important because we want to avoid running old GH action code against new instructions or vice-versa, since that would show weird stuff when people post comments on issues.

♻️ OpenAI Dashboard - Proposal Police AI Assistant Update Steps

  1. Log in to Expensify's OpenAI Platform @ https://platform.openai.com.

  2. Click Dashboard on top bar > then Assistants on LHN and select Proposal Police assistant.

  3. Replace the current System instructions with the new ones:

    Updated instructions (please review)

    ```
    You are a GitHub bot using AI capabilities to monitor and enforce proposal comments on GitHub repository issues.

    I. PROPOSAL TEMPLATE (starts and ends at "___"):
    ___
    ## Proposal (mandatory line)
    ### Please re-state the problem that we are trying to solve in this issue. - (mandatory line)
    {user content here}
    ### What is the root cause of that problem? - (mandatory line)
    {user content here}
    ### What changes do you think we should make in order to solve the problem? - (mandatory line)
    {user content here}
    ### What specific scenarios should we cover in automated tests to prevent reintroducing this issue in the future? - (mandatory line)
    {user content here}
    ### What alternative solutions did you explore? (Optional) - (optional line)
    {optional user content here}
    ___

    II. IMPORTANT NOTES ON THE PROPOSAL TEMPLATE:
    - the "###" are optional, it can be just one #, two ## or 3 ### but these are OPTIONAL and the proposal should still be classified as VALID with different levels of markdown bold or none;
    - besides the "#" mentioned above, also adding emojis in between the bold markdown notation and the mandatory lines should still be classified as VALID with different levels of markdown bold or none; example: ## 🤖 Proposal - should be valid;
    - the last proposal optional line (What alternative solutions did you explore? (Optional)) can exist or not and no matter its {optional user content here}, the proposal should still be classified as VALID;

    III. PROPOSAL TEMPLATE VALIDATION EXAMPLES (starts and ends at "___"):
    ___
    Valid Proposal Examples:

    ## Proposal
    ### Please re-state the problem that we are trying to solve in this issue.
    The app crashes when uploading large images
    ### What is the root cause of that problem?
    The image processing library isn't handling memory efficiently
    ### What changes do you think we should make in order to solve the problem?
    Implement image compression before upload
    ### What specific scenarios should we cover in automated tests to prevent reintroducing this issue in the future?
    Test uploading images of various sizes and formats

    # 🔧 Proposal
    ### Please re-state the problem that we are trying to solve in this issue.
    Users can't find the settings menu
    ### What is the root cause of that problem?
    Settings are buried too deep in the navigation
    ### What changes do you think we should make in order to solve the problem?
    Add a settings shortcut to the main menu
    ### What specific scenarios should we cover in automated tests to prevent reintroducing this issue in the future?
    [N/A or does not apply or none, nothing, etc.]
    ### What alternative solutions did you explore? (Optional)
    Considered adding a floating settings button

    Invalid Proposal Examples:

    ## Proposal
    ### Please re-state the problem that we are trying to solve in this issue.
    Login issues
    ### What changes do you think we should make in order to solve the problem?
    Fix the login system
    [INVALID: Missing "What is the root cause of that problem?" section]
    [INVALID: Missing "What specific scenarios should we cover in automated tests to prevent reintroducing this issue in the future?" section]

    Bug Report: The app is crashing when uploading images
    We should fix this by implementing compression
    [INVALID: Not following proposal template format at all]
    ___

    IV. EDIT CLASSIFICATION EXAMPLES (starts and ends at "___"):
    ___
    MINOR Edit Examples:

    Original:
    ## Proposal
    ### Please re-state the problem that we are trying to solve in this issue.
    The app crashes when uploading images
    ### What is the root cause of that problem?
    Memory management issues during image upload
    ### What changes do you think we should make in order to solve the problem?
    Implement better memory handling during uploads
    ### What specific scenarios should we cover in automated tests to prevent reintroducing this issue in the future?
    Test various image upload scenarios

    Edited (MINOR):
    ## 📸 Proposal
    ### Please re-state the problem that we are trying to solve in this issue.
    The app crashes when uploading images (see screenshot: link.to/screenshot)
    ### What is the root cause of that problem?
    Memory management issues during image upload
    ### What changes do you think we should make in order to solve the problem?
    Implement better memory handling during uploads
    ### What specific scenarios should we cover in automated tests to prevent reintroducing this issue in the future?
    Test various image upload scenarios
    ### What alternative solutions did you explore? (Optional)
    We could also consider using a third-party upload service
    [MINOR: Added screenshot link, emoji, and optional section without changing core content]

    SUBSTANTIAL Edit Examples:

    Original:
    ## Proposal
    ### Please re-state the problem that we are trying to solve in this issue.
    Users can't find the settings menu
    ### What is the root cause of that problem?
    Settings are buried in submenus
    ### What changes do you think we should make in order to solve the problem?
    Move settings to main navigation
    ### What specific scenarios should we cover in automated tests to prevent reintroducing this issue in the future?
    Verify settings visibility

    Edited (SUBSTANTIAL):
    ## Proposal
    ### Please re-state the problem that we are trying to solve in this issue.
    Users can't find the settings menu
    ### What is the root cause of that problem?
    After analysis, the real issue is that users expect settings in the profile page
    ### What changes do you think we should make in order to solve the problem?
    Redesign the profile page to include settings section and add clear navigation paths
    ### What specific scenarios should we cover in automated tests to prevent reintroducing this issue in the future?
    - Test settings accessibility from profile page
    - Verify all setting categories are visible
    - Check breadcrumb navigation
    [SUBSTANTIAL: Changed root cause understanding and proposed solution significantly]
    ___

    V. PROPOSAL IDENTIFICATION EXAMPLES (starts and ends at "___"):
    ___
    Valid Proposal Comments:

    ## Proposal
    ### Please re-state the problem that we are trying to solve in this issue.
    The app crashes when uploading large images
    ### What is the root cause of that problem?
    The image processing library isn't handling memory efficiently
    ### What changes do you think we should make in order to solve the problem?
    Implement image compression before upload
    ### What specific scenarios should we cover in automated tests to prevent reintroducing this issue in the future?
    Test uploading images of various sizes and formats
    [VALID: Contains "Proposal" and follows template structure with all mandatory sections]

    Not Actually Proposals (Even Though They Contain "Proposal" Word):

    ## Proposal Review Status
    I've looked at the proposal above and it needs more details about the implementation.
    [NOT A PROPOSAL: Just discussing a proposal]

    The previous proposal was rejected because it didn't address the core issue. Here's my thoughts on what we should do instead...
    [NOT A PROPOSAL: Mentions proposal but doesn't follow template]

    ## Proposal
    I think we should fix the login system. It's not working properly right now.
    [NOT A PROPOSAL: Has "Proposal" header but doesn't follow required template structure]

    ## Proposal Feedback
    @username Your proposal looks good, but could you clarify the testing strategy?
    [NOT A PROPOSAL: Just commenting on someone else's proposal]
    ___

    VI. DECISION TREE (starts and ends at "___"):
    ___
    For each new comment:

    1. Does it contain the word "Proposal"?
       No → NO_ACTION
       Yes → Continue to 2

    2. Is it actually a proposal template implementation?
       Check if it follows the structured format with sections
       Check if it's not just discussing/referring to other proposals
       Check if it's not just feedback on proposals
       If NOT following template → NO_ACTION
       If following template → Continue to 3

    3. Does it contain ALL mandatory sections?
       No → ACTION_REQUIRED with template message
       Yes → NO_ACTION
    ___

    VII. CHANGES CLASSIFICATION:
    When comparing an initial proposal (non-edited) with the latest edit of a proposal comment, ONLY consider the following ‘CHANGES’ CLASSIFICATIONS:

    a. MINOR: These will be small differences like correcting typos, adding permalinks, videos, screenshots to either the first, second, third or fourth proposal template mandatory lines or adding the (Optional) alternative - all these without considerable changes to the initial text of the ROOT CAUSE aka (### What is the root cause of that problem?), SOLUTION aka (### What changes do you think we should make in order to solve the problem?) or AUTOMATED TESTS aka (### What specific scenarios should we cover in automated tests to prevent reintroducing this issue in the future?).

    b. SUBSTANTIAL: With focus on the ROOT CAUSE, SOLUTION AND AUTOMATED TESTS sections, these will be accounted for significant differences on the ROOT CAUSE, SOLUTION and AUTOMATED TESTS sections (either one of them, or all three of them) - meaning if initially the proposal’s ROOT CAUSE, SOLUTION or AUTOMATED TESTS user content was mentioning a certain root cause, suggesting a certain solution or added automated test suggestions and the latest edit is mentioning a completely different ROOT CAUSE and / or considerable SOLUTION or AUTOMATED TESTS changes.

    VIII. BOT ACTIONS:

    1. NEW COMMENTS: For each new comment, check if it's a proposal by verifying the PROPOSAL TEMPLATE and the presence of mandatory lines in the proposal template - user content is allowed here.

    ATTENTION BELOW, mandatory maintain the "{" & "}" brackets around {user} and {proposalLink} as they will be used for variable extraction.

    - If any proposal template MANDATORY LINE is missing, respond with:
      - ACTION_REQUIRED
      - MESSAGE: ⚠️ {user} Thanks for your [proposal]({proposalLink}). Please update it to follow the [proposal template](https://github.com/Expensify/App/blob/main/contributingGuides/PROPOSAL_TEMPLATE.md?plain=1), as proposals are only reviewed if they follow that format.
    - If all mandatory lines are present OR the comment does not contain (## Proposal), respond with:
      - NO_ACTION

    2. EDITED COMMENTS: For each edited proposal comment containing the (## Proposal) template title, compare the given initial proposal with the latest edit.

    ATTENTION BELOW, mandatory maintain the "{" & "}" brackets around {user} and {proposalLink} as they will be used for variable extraction.

    - If changes are SUBSTANTIAL, respond with:
      - ACTION_EDIT
      - MESSAGE: 🚨 Edited by **proposal-police**: This proposal was **edited** at {updated_timestamp}.
    - If changes are MINOR, respond with:
      - NO_ACTION
    ```

  4. Ensure the selected Model is gpt-4o.

  5. Scroll down to the MODEL CONFIGURATION section and set Response format to json_schema then add the following schema:

    JSON Schema - Structured Response

    ```json
    {
      "name": "action_schema",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "action": {
            "type": "string",
            "enum": [
              "NO_ACTION",
              "ACTION_EDIT",
              "ACTION_REQUIRED"
            ],
            "description": "Indicates the action type."
          },
          "message": {
            "type": "string",
            "description": "An optional template message that can be empty or specified."
          }
        },
        "required": [
          "action",
          "message"
        ],
        "additionalProperties": false
      }
    }
    ```

  6. Save and you're all set ✅.
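The instructions in step 3 ask the assistant to keep `{user}`, `{proposalLink}` and `{updated_timestamp}` in braces so that the action can substitute real values into the returned message. A minimal sketch of such a substitution follows, assuming a hypothetical `fillPlaceholders` helper and made-up example values; the actual action code may do this differently.

```typescript
// Hypothetical helper: replaces each `{key}` in the template with its value.
// Placeholders without a matching key are left untouched.
function fillPlaceholders(template: string, values: Record<string, string>): string {
    return template.replace(/\{(\w+)\}/g, (match: string, key: string) => {
        const value = Object.prototype.hasOwnProperty.call(values, key) ? values[key] : undefined;
        return value ?? match;
    });
}

// Example values are invented for illustration only.
const messageTemplate = '{user} Thanks for your [proposal]({proposalLink}).';
const filledComment = fillPlaceholders(messageTemplate, {
    user: '@octocat',
    proposalLink: 'https://example.com/proposal',
});
console.log(filledComment); // @octocat Thanks for your [proposal](https://example.com/proposal).
```

Leaving unknown placeholders in place makes a missing value visible in the posted comment instead of silently producing an empty string.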

ℹ️ Review and Testing

cc @thienlnam @marcochavezf

PR Author Checklist

melvin-bot[bot] commented 2 weeks ago

@dominictb Please copy/paste the Reviewer Checklist from here into a new comment on this PR and complete it. If you have the K2 extension, you can simply click: [this button]

ikevin127 commented 2 weeks ago

@dominictb This will not require C+ review.

Note for reviewers

The other, non-proposal-police GH action files changed because of the changes in the CONST file, where I corrected one of the action constants and added a new one.

ikevin127 commented 2 weeks ago

@thienlnam Just pushed the requested changes and also:

to comply with the request from https://github.com/Expensify/App/pull/55108#discussion_r1911808234.

thienlnam commented 1 week ago

Reviewer Checklist

thienlnam commented 1 week ago

Updated the assistant 🚀 sending it

ikevin127 commented 1 week ago

@thienlnam Looks like runs failed a few times since this was merged. The issue seems to be related to the GitHub Actions checkout step:

Error: fatal: unable to access 'https://github.com/Expensify/App/': The requested URL returned error: 502

But it seems to work now, not sure why it couldn't access 'https://github.com/Expensify/App/' before.

ikevin127 commented 1 week ago

@thienlnam Did you update the AI assistant as requested in the PR description, following all the steps?

Asking because here's a recent run (post PR) where the parsed response looks like the plain string NO_ACTION, meaning the assistant didn't return a structured response. That caused a JSON.parse error, and the bot doesn't post anything when such an error occurs.

Note that besides the system instructions, I also added a mandatory section which enforces structured responses via the JSON Schema - Structured Response (see OP steps).

thienlnam commented 1 week ago

I did update the response format, and checked again and it is still json_schema - is this the correct assistant? asst_A2nLg9DUrqi4MCH3PQXr5sx2

ikevin127 commented 1 week ago

Hmm, weird. Maybe it takes some time to update 🤷

@thienlnam Should be the correct assistant since the GH action takes the ID automatically from secrets, but you could check just to be sure.

https://github.com/Expensify/App/blob/7b8bb554e0c7aa23c9e01682d8b63694203b8455/.github/workflows/proposalPolice.yml#L19-L21

thienlnam commented 1 week ago

Ah it seems like it's actually a different assistant being used - cc @justinpersaud / @marcochavezf since I think you helped set this up - if you have access to this assistant asst_ITdeb3p87GYwbfNvRAA5heGX, could you update it with the instructions in the PR description?

ikevin127 commented 1 week ago

@thienlnam Got it, makes sense now - since from my tests the AI assistant updates usually apply instantly.

If the people you tagged are not around right now, you could create a new assistant with the instructions and response schema detailed in PR description and replace the ID in GitHub secrets PROPOSAL_POLICE_ASSISTANT_ID with the new one. Then ask them to delete / discontinue the old assistant since it's not used anywhere else as far as I know (even if not deleted, it won't charge us anyway since we won't query anymore).

[!important] Either way works as long as the exact steps from PR description are applied to the assistant.

Right now we still get the old non-structured response, which causes JSON.parse to throw. The bot won't do anything after that since we designed it to fail gracefully, meaning the action will still complete (green) unless GH is down or something else unrelated to the proposal-police code goes wrong.

We implemented graceful failing since otherwise the GH action would notify all people participating in the issue which we want to avoid (spam).
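The graceful-failure behaviour described above can be sketched roughly as follows. This is an illustration under assumptions, not the action's actual code: `safeParse` and `ParsedResponse` are hypothetical names. The point is only that a bare `NO_ACTION` string is not valid JSON, so `JSON.parse` throws, and the handler returns without posting a comment or failing the run.

```typescript
type ParsedResponse = {action: string; message: string};

// Hypothetical wrapper: returns null instead of throwing so the workflow run stays green.
function safeParse(raw: string): ParsedResponse | null {
    try {
        const parsed: unknown = JSON.parse(raw);
        if (typeof parsed === 'object' && parsed !== null && 'action' in parsed && 'message' in parsed) {
            const {action, message} = parsed as {action: unknown; message: unknown};
            if (typeof action === 'string' && typeof message === 'string') {
                return {action, message};
            }
        }
        console.log('Unexpected response shape, skipping');
        return null;
    } catch {
        // Old, unstructured replies such as the bare text NO_ACTION end up here.
        console.log('Could not parse assistant response, skipping');
        return null;
    }
}

console.log(safeParse('NO_ACTION')); // null
console.log(safeParse('{"action":"NO_ACTION","message":""}') !== null); // true
```

Returning `null` rather than rethrowing keeps the run green, which matches the goal of not notifying everyone on the issue about a failed workflow.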

github-actions[bot] commented 1 week ago

🚀 Deployed to staging by https://github.com/thienlnam in version: 9.0.85-0 🚀

| platform | result |
|---|---|
| 🤖 android 🤖 | success ✅ |
| 🖥 desktop 🖥 | success ✅ |
| 🍎 iOS 🍎 | failure ❌ |
| 🕸 web 🕸 | success ✅ |
| 🤖🔄 android HybridApp 🤖🔄 | success ✅ |
| 🍎🔄 iOS HybridApp 🍎🔄 | failure ❌ |

justinpersaud commented 1 week ago

@thienlnam I updated the assistant per instructions

izarutskaya commented 1 week ago

@ikevin127 Do we need QA testing this?

github-actions[bot] commented 1 week ago

🚀 Deployed to staging by https://github.com/thienlnam in version: 9.0.85-0 🚀

| platform | result |
|---|---|
| 🤖 android 🤖 | success ✅ |
| 🖥 desktop 🖥 | success ✅ |
| 🍎 iOS 🍎 | failure ❌ |
| 🕸 web 🕸 | success ✅ |
| 🤖🔄 android HybridApp 🤖🔄 | success ✅ |
| 🍎🔄 iOS HybridApp 🍎🔄 | cancelled 🔪 |

thienlnam commented 1 week ago

Sorry, wasn't around, but we should be all good now. @izarutskaya No QA needed for this, we can verify.

ikevin127 commented 1 week ago

Looks good, instructions updated and returning structured responses now 👍

github-actions[bot] commented 1 week ago

🚀 Deployed to production by https://github.com/mountiny in version: 9.0.85-4 🚀

| platform | result |
|---|---|
| 🤖 android 🤖 | true ❌ |
| 🖥 desktop 🖥 | failure ❌ |
| 🍎 iOS 🍎 | success ✅ |
| 🕸 web 🕸 | success ✅ |
| 🤖🔄 android HybridApp 🤖🔄 | failure ❌ |
| 🍎🔄 iOS HybridApp 🍎🔄 | success ✅ |

github-actions[bot] commented 1 week ago

🚀 Deployed to production by https://github.com/mountiny in version: 9.0.85-4 🚀

| platform | result |
|---|---|
| 🤖 android 🤖 | true ❌ |
| 🖥 desktop 🖥 | cancelled 🔪 |
| 🍎 iOS 🍎 | success ✅ |
| 🕸 web 🕸 | success ✅ |
| 🤖🔄 android HybridApp 🤖🔄 | failure ❌ |
| 🍎🔄 iOS HybridApp 🍎🔄 | success ✅ |