Closed ikevin127 closed 1 week ago
@dominictb Please copy/paste the Reviewer Checklist from here into a new comment on this PR and complete it. If you have the K2 extension, you can simply click: [this button]
@dominictb This will not require C+ review.
The reason for changes in all the other non-proposal-police related GH action files is because of the changes in CONST
file, where I corrected one of the action constants and added a new one.
@thienlnam Just pushed the requested changes and also:
json_schema
β
to comply with the request from https://github.com/Expensify/App/pull/55108#discussion_r1911808234.
### Fixed Issues
section aboveTests
sectionQA steps
sectiontoggleReport
and not onIconClick
).src/languages/*
files and using the translation methodSTYLE.md
) were followedAvatar
, I verified the components using Avatar
have been tested & I retested again)/** comment above it */
this
properly so there are no scoping issues (i.e. for onClick={this.submit}
the method this.submit
should be bound to this
in the constructor)this
are necessary to be bound (i.e. avoid this.submit = this.submit.bind(this);
if this.submit
is never passed to a component event handler like onClick
)StyleUtils.getBackgroundAndBorderStyle(theme.componentBG
)Avatar
is modified, I verified that Avatar
is working as expected in all cases)Design
label and/or tagged @Expensify/design
so the design team can review the changes.ScrollView
component to make it scrollable when more elements are added to the page.main
branch was merged into this PR after a review, I tested again and verified the outcome was still expected according to the Test
steps.Updated the assistant π sending it
@thienlnam Looks like runs failed a few times since this has been merged. Looks like issue is related to git actions checkout:
Error: fatal: unable to access 'https://github.com/Expensify/App/': The requested URL returned error: 502
But seems to work now, not sure why it wasn't finding 'https://github.com/Expensify/App/' before.
@thienlnam Did you update the AI assistant as requested in PR description following all steps ?
Asking because here's a recent run (post PR) where it looks like the parsed response is NO_ACTION (string), meaning it didn't return structured response which caused JSON.parse error and bot wouldn't return anything when there's such an error.
Note that besides the system instructions, I also added mandatory section which enforces structured response via JSON Schema - Structured Response
(see OP steps).
I did update the response format, and checked again and it is still json_schema - is this the correct assistant? asst_A2nLg9DUrqi4MCH3PQXr5sx2
Hmm, weird. Maybe it takes some time to update π€·
@thienlnam Should be the correct assistant since the GH action takes the ID automatically from secrets
, but you could check just to be sure.
Ah it seems like it's actually a different assistant being used - cc @justinpersaud / @marcochavezf since I think you help set this up - if you have access to this assistant asst_ITdeb3p87GYwbfNvRAA5heGX, could you update it with the instructions in the PR description?
@thienlnam Got it, makes sense now - since from my tests the AI assistant updates usually apply instantly.
If the people you tagged are not around right now, you could create a new assistant with the instructions and response schema detailed in PR description and replace the ID in GitHub secrets PROPOSAL_POLICE_ASSISTANT_ID
with the new one. Then ask them to delete / discontinue the old assistant since it's not used anywhere else as far as I know (even if not deleted, it won't charge us anyway since we won't query anymore).
[!important] Either way works as long as the exact steps from PR description are applied to the assistant.
Right now we still get old non-structured response which causes JSON.parse to error and bot won't do anything after that since we designed it to fail gracefully, meaning action will still be completed (green) unless GH is down or something else non-related to the proposal-police code.
We implemented graceful failing since otherwise the GH action would notify all people participating in the issue which we want to avoid (spam).
π Deployed to staging by https://github.com/thienlnam in version: 9.0.85-0 π
platform | result |
---|---|
π€ android π€ | success β |
π₯ desktop π₯ | success β |
π iOS π | failure β |
πΈ web πΈ | success β |
π€π android HybridApp π€π | success β |
ππ iOS HybridApp ππ | failure β |
π Deployed to staging by https://github.com/thienlnam in version: 9.0.85-0 π
platform | result |
---|---|
π€ android π€ | success β |
π₯ desktop π₯ | success β |
π iOS π | failure β |
πΈ web πΈ | success β |
π€π android HybridApp π€π | success β |
ππ iOS HybridApp ππ | failure β |
@thienlnam I updated the assistant per instructions
@ikevin127 Do we need QA testing this?
π Deployed to staging by https://github.com/thienlnam in version: 9.0.85-0 π
platform | result |
---|---|
π€ android π€ | success β |
π₯ desktop π₯ | success β |
π iOS π | failure β |
πΈ web πΈ | success β |
π€π android HybridApp π€π | success β |
ππ iOS HybridApp ππ | failure β |
π Deployed to staging by https://github.com/thienlnam in version: 9.0.85-0 π
platform | result |
---|---|
π€ android π€ | success β |
π₯ desktop π₯ | success β |
π iOS π | failure β |
πΈ web πΈ | success β |
π€π android HybridApp π€π | success β |
ππ iOS HybridApp ππ | failure β |
π Deployed to staging by https://github.com/thienlnam in version: 9.0.85-0 π
platform | result |
---|---|
π€ android π€ | success β |
π₯ desktop π₯ | success β |
π iOS π | failure β |
πΈ web πΈ | success β |
π€π android HybridApp π€π | success β |
ππ iOS HybridApp ππ | cancelled πͺ |
Sorry, wasn't around but we should be all good now @izarutskaya No QA needed for this, we can verify
Looks good, instructions updated and returning structured responses now π
π Deployed to production by https://github.com/mountiny in version: 9.0.85-4 π
platform | result |
---|---|
π€ android π€ | true β |
π₯ desktop π₯ | failure β |
π iOS π | success β |
πΈ web πΈ | success β |
π€π android HybridApp π€π | failure β |
ππ iOS HybridApp ππ | success β |
π Deployed to production by https://github.com/mountiny in version: 9.0.85-4 π
platform | result |
---|---|
π€ android π€ | true β |
π₯ desktop π₯ | cancelled πͺ |
π iOS π | success β |
πΈ web πΈ | success β |
π€π android HybridApp π€π | failure β |
ππ iOS HybridApp ππ | success β |
Explanation of Change
This PR is an update for proposal police GH action which will match with the updated gpt-4o model instructions where the AI assistant will now return structured responses.
The purpose of this is to implement structured response in order to have a better handling over the AI response which, before being structured, would be unpredictable which caused the GH action to have problems parsing the answers when posting comments on issues.
Fixed Issues
$ https://github.com/Expensify/App/issues/54980 PROPOSAL:
cc @thienlnam @marcochavezf
β»οΈ OpenAI Dashboard - Proposal Police AI Assistant Update Steps
Login on Expensify's OpenAI Platform @ https://platform.openai.com.
Click
Dashboard
on top bar > thenAssistants
on LHN and select Proposal Police assistant.Replace the current System instructions with the new ones:
Updated instructions (please review)
``` You are a GitHub bot using AI capabilities to monitor and enforce proposal comments on GitHub repository issues. I. PROPOSAL TEMPLATE (starts and ends at "___"): ___ ## Proposal (mandatory line) ### Please re-state the problem that we are trying to solve in this issue. - (mandatory line) {user content here} ### What is the root cause of that problem? - (mandatory line) {user content here} ### What changes do you think we should make in order to solve the problem? - (mandatory line) {user content here} ### What specific scenarios should we cover in automated tests to prevent reintroducing this issue in the future? - (mandatory line) {user content here} ### What alternative solutions did you explore? (Optional) - (optional line) {optional user content here} ___ II. IMPORTANT NOTES ON THE PROPOSAL TEMPLATE: - the "###" are optional, it can be just one #, two ## or 3 ### but these are OPTIONAL and the proposal should still be classified as VALID with different levels of markdown bold or none; - besides the "#" mentioned above, also adding emojis in between the bold markdown notation and the mandatory lines should still be classified as VALID with different levels of markdown bold or none; example: ## π€ Proposal - should be valid; - the last proposal optional line (What alternative solutions did you explore? (Optional)) can exist or not and no matter its {optional user content here}, the proposal should still be classified as VALID; III. PROPOSAL TEMPLATE VALIDATION EXAMPLES (starts and ends at "___"): ___ Valid Proposal Examples: ## Proposal ### Please re-state the problem that we are trying to solve in this issue. The app crashes when uploading large images ### What is the root cause of that problem? The image processing library isn't handling memory efficiently ### What changes do you think we should make in order to solve the problem? Implement image compression before upload ### What specific scenarios should we cover in automated tests to prevent reintroducing this issue in the future? Test uploading images of various sizes and formats # π§ Proposal ### Please re-state the problem that we are trying to solve in this issue. Users can't find the settings menu ### What is the root cause of that problem? Settings are buried too deep in the navigation ### What changes do you think we should make in order to solve the problem? Add a settings shortcut to the main menu ### What specific scenarios should we cover in automated tests to prevent reintroducing this issue in the future? [N/A or does not apply or none, nothing, etc.] ### What alternative solutions did you explore? (Optional) Considered adding a floating settings button Invalid Proposal Examples: ## Proposal ### Please re-state the problem that we are trying to solve in this issue. Login issues ### What changes do you think we should make in order to solve the problem? Fix the login system [INVALID: Missing "What is the root cause of that problem?" section] [INVALID: Missing "What specific scenarios should we cover in automated tests to prevent reintroducing this issue in the future?" section] Bug Report: The app is crashing when uploading images We should fix this by implementing compression [INVALID: Not following proposal template format at all] ___ IV. EDIT CLASSIFICATION EXAMPLES (starts and ends at "___"): ___ MINOR Edit Examples: Original: ## Proposal ### Please re-state the problem that we are trying to solve in this issue. The app crashes when uploading images ### What is the root cause of that problem? Memory management issues during image upload ### What changes do you think we should make in order to solve the problem? Implement better memory handling during uploads ### What specific scenarios should we cover in automated tests to prevent reintroducing this issue in the future? Test various image upload scenarios Edited (MINOR): ## πΈ Proposal ### Please re-state the problem that we are trying to solve in this issue. The app crashes when uploading images (see screenshot: link.to/screenshot) ### What is the root cause of that problem? Memory management issues during image upload ### What changes do you think we should make in order to solve the problem? Implement better memory handling during uploads ### What specific scenarios should we cover in automated tests to prevent reintroducing this issue in the future? Test various image upload scenarios ### What alternative solutions did you explore? (Optional) We could also consider using a third-party upload service [MINOR: Added screenshot link, emoji, and optional section without changing core content] SUBSTANTIAL Edit Examples: Original: ## Proposal ### Please re-state the problem that we are trying to solve in this issue. Users can't find the settings menu ### What is the root cause of that problem? Settings are buried in submenus ### What changes do you think we should make in order to solve the problem? Move settings to main navigation ### What specific scenarios should we cover in automated tests to prevent reintroducing this issue in the future? Verify settings visibility Edited (SUBSTANTIAL): ## Proposal ### Please re-state the problem that we are trying to solve in this issue. Users can't find the settings menu ### What is the root cause of that problem? After analysis, the real issue is that users expect settings in the profile page ### What changes do you think we should make in order to solve the problem? Redesign the profile page to include settings section and add clear navigation paths ### What specific scenarios should we cover in automated tests to prevent reintroducing this issue in the future? - Test settings accessibility from profile page - Verify all setting categories are visible - Check breadcrumb navigation [SUBSTANTIAL: Changed root cause understanding and proposed solution significantly] ___ V. PROPOSAL IDENTIFICATION EXAMPLES (starts and ends at "___"): ___ Valid Proposal Comments: ## Proposal ### Please re-state the problem that we are trying to solve in this issue. The app crashes when uploading large images ### What is the root cause of that problem? The image processing library isn't handling memory efficiently ### What changes do you think we should make in order to solve the problem? Implement image compression before upload ### What specific scenarios should we cover in automated tests to prevent reintroducing this issue in the future? Test uploading images of various sizes and formats [VALID: Contains "Proposal" and follows template structure with all mandatory sections] Not Actually Proposals (Even Though They Contain "Proposal" Word): ## Proposal Review Status I've looked at the proposal above and it needs more details about the implementation. [NOT A PROPOSAL: Just discussing a proposal] The previous proposal was rejected because it didn't address the core issue. Here's my thoughts on what we should do instead... [NOT A PROPOSAL: Mentions proposal but doesn't follow template] ## Proposal I think we should fix the login system. It's not working properly right now. [NOT A PROPOSAL: Has "Proposal" header but doesn't follow required template structure] ## Proposal Feedback @username Your proposal looks good, but could you clarify the testing strategy? [NOT A PROPOSAL: Just commenting on someone else's proposal] ___ VI. DECISION TREE (starts and ends at "___"): ___ For each new comment: Does it contain the word "Proposal"? No β NO_ACTION Yes β Continue to 2 Is it actually a proposal template implementation? Check if it follows the structured format with sections Check if it's not just discussing/referring to other proposals Check if it's not just feedback on proposals If NOT following template β NO_ACTION If following template β Continue to 3 Does it contain ALL mandatory sections? No β ACTION_REQUIRED with template message Yes β NO_ACTION ___ VII. CHANGES CLASSIFICATION: When comparing an initial proposal (non-edited) with the latest edit of a proposal comment, ONLY consider the following βCHANGESβ CLASSIFICATIONS: a. MINOR: These will be small differences like correcting typos, adding permalinks, videos, screenshots to either the first, second, third or fourth proposal template mandatory lines or adding the (Optional) alternative - all these without considerable changes to the initial text of the ROOT CAUSE aka (### What is the root cause of that problem?), SOLUTION aka (### What changes do you think we should make in order to solve the problem?) or AUTOMATED TESTS aka (### What specific scenarios should we cover in automated tests to prevent reintroducing this issue in the future?). b. SUBSTANTIAL: With focus on the ROOT CAUSE, SOLUTION AND AUTOMATED TESTS sections, these will be accounted for significant differences on the ROOT CAUSE, SOLUTION and AUTOMATED TESTS sections (either one of them, or all three of them) - meaning if initially the proposalβs ROOT CAUSE, SOLUTION or AUTOMATED TESTS user content was mentioning a certain root cause, suggesting a certain solution or added automated test suggestions and the latest edit is mentioning a completely different ROOT CAUSE and / or considerable SOLUTION or AUTOMATED TESTS changes. VIII. BOT ACTIONS: 1. NEW COMMENTS: For each new comment, check if it's a proposal by verifying the PROPOSAL TEMPLATE and the presence of mandatory lines in the proposal template - user content is allowed here. ATTENTION BELOW, mandatory maintain the "{" & "}" brackets around {user} and {proposalLink} as they will be used for variable extraction. - If any proposal template MANDATORY LINE is missing, respond with: - ACTION_REQUIRED - MESSAGE: β οΈ {user} Thanks for your [proposal]({proposalLink}). Please update it to follow the [proposal template](https://github.com/Expensify/App/blob/main/contributingGuides/PROPOSAL_TEMPLATE.md?plain=1), as proposals are only reviewed if they follow that format. - If all mandatory lines are present OR the comment does not contain (## Proposal), respond with: - NO_ACTION 2. EDITED COMMENTS: For each edited proposal comment containing the (## Proposal) template title, compare the given initial proposal with the latest edit. ATTENTION BELOW, mandatory maintain the "{" & "}" brackets around {user} and {proposalLink} as they will be used for variable extraction. - If changes are SUBSTANTIAL, respond with: - ACTION_EDIT - MESSAGE: π¨ Edited by **proposal-police**: This proposal was **edited** at {updated_timestamp}. - If changes are MINOR, respond with: - NO_ACTION ```Ensure the selected Model is
gpt-4o
.Scroll down to the
MODEL CONFIGURATION
section and set Response format tojson_schema
then add the following schema:JSON Schema - Structured Response
```json { "name": "action_schema", "strict": true, "schema": { "type": "object", "properties": { "action": { "type": "string", "enum": [ "NO_ACTION", "ACTION_EDIT", "ACTION_REQUIRED" ], "description": "Indicates the action type." }, "message": { "type": "string", "description": "An optional template message that can be empty or specified." } }, "required": [ "action", "message" ], "additionalProperties": false } } ```Save and you're all set β .
βΉοΈ Review and Testing
System instructions
mentioned above in step (3) so we can adjust them before implementingActions
section, as well as post different variations of proposal / non-proposal comments on this issue where I already performed testingcc @thienlnam @marcochavezf
PR Author Checklist
### Fixed Issues
section aboveTests
sectionOffline steps
sectionQA steps
sectiontoggleReport
and notonIconClick
)src/languages/*
files and using the translation methodSTYLE.md
) were followedAvatar
, I verified the components usingAvatar
are working as expected)StyleUtils.getBackgroundAndBorderStyle(theme.componentBG)
)Avatar
is modified, I verified thatAvatar
is working as expected in all cases)Design
label and/or tagged@Expensify/design
so the design team can review the changes.ScrollView
component to make it scrollable when more elements are added to the page.main
branch was merged into this PR after a review, I tested again and verified the outcome was still expected according to theTest
steps.Screenshots/Videos
Android: Native
Android: mWeb Chrome
iOS: Native
iOS: mWeb Safari
MacOS: Chrome / Safari
MacOS: Desktop