Closed: roryabraham closed this issue 1 month ago.
Current assignee @mallenexpensify is eligible for the NewFeature assigner, not assigning anyone new.
Credit to @ikevin127 for the proposal. Comment for assignment 😉
Happy to assist with the review, co-assigning myself
Comment? Cheers!
Edited by proposal-police: This proposal was added at 2024-01-24 13:15:24 UTC
@roryabraham @marcochavezf what are your thoughts about starting with an MVP that adds the above to a comment that's been edited? Then we (all) can review how it's working, then consider additional improvements.
@mallenexpensify are you basically suggesting that for the MVP we remove this piece?
Enforce the use of the proposal template by scanning new issue comments. If a proposal does not follow the template, the bot will comment, tagging the submitter and providing a link to the proposal template to encourage standardization
I don't want the edited by proposal-police to appear above any comment that's edited because that will cause false positives and discourage people from fixing typos or adding minor clarifications. That's where ChatGPT comes in to help enforce Rule #2 - only substantive edits need to be called out.
(POC) Bot source: https://github.com/ikevin127/proposal-police
For reference these are the current (POC) assistant instructions:
Feel free to take apart the paragraphs where needed in order to change the AI's behaviour / response.
Currently, on the AI side, we return either "NO_ACTION" (when the proposal template was followed or the edit changes are minor) or templates like "{user} Your proposal will be dismissed because you did not follow the proposal template." These templates include variables that we replace in the bot's code with values from the GitHub context, depending on the triggered webhook (new comment, edited comment, etc.).
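That flow could be sketched roughly like this (the function and field names are illustrative, not the actual bot source):

```javascript
// Sketch: turn the assistant's templated response into a bot comment.
// "NO_ACTION" means the bot posts nothing; otherwise placeholders like
// {user} are replaced with values taken from the GitHub webhook context.
const NO_ACTION = 'NO_ACTION';

function buildBotComment(assistantResponse, githubContext) {
    // Template followed or edit was minor: the bot stays silent.
    if (assistantResponse === NO_ACTION) {
        return null;
    }
    // Substitute context variables into the templated response.
    return assistantResponse.replace('{user}', `@${githubContext.commentAuthor}`);
}
```

A `null` return would simply mean skipping the comment-creation API call.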
I don't want the edited by proposal-police to appear above any comment that's edited because that will cause false positives and discourage people from fixing typos or adding minor clarifications
How does the ChatGPT inclusion keep the edited by proposal-police notice from appearing above the comment? I might be missing or not understanding a step or detail.
Hey y'all 👋
I created the expensify-proposal-testing repo for testing purposes.
Sent you all collaborator invitations to the repo so you can create issues and play around. Already created an issue here where I posted a few comments to test it out - feel free to use it or just create your own 👈
Note: make sure to add the Help Wanted label to new issues, otherwise the proposal-police won't act.
Currently, for when a proposal was edited substantially, we're using this format: Edited by proposal-police: This proposal was added at 2024-01-24 13:15:24 UTC
Thanks @ikevin127 ,
Edited by proposal-police: This proposal was added at 2024-01-29 12:58:07 UTC.
Does it make sense to have the above read
Edited by proposal-police: This proposal was edited at 2024-01-29 12:58:07 UTC.
The main problem we're wanting to solve is making the timestamp for edits easily noticeable for everyone, right?
True - for those proposals where the previous vs current edit has substantial changes in the RCA and solution sections 🫣
I updated the bot according to https://github.com/Expensify/App/issues/35108#issuecomment-1917948422 requirements 🚀
This is how it looks after the update - basically, when proposal-police edits the comment, it will include the updated_at timestamp of when the last significant edit was posted by the contributor.
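Producing that notice from the webhook payload's `updated_at` field might look like this (an illustrative sketch, not the actual bot source):

```javascript
// Sketch: format the comment's updated_at value into the edit notice
// prepended by proposal-police.
function formatEditNotice(updatedAt) {
    // e.g. '2024-01-29T12:58:07Z' -> '2024-01-29 12:58:07 UTC'
    const timestamp = new Date(updatedAt)
        .toISOString()
        .replace('T', ' ')
        .replace(/\.\d{3}Z$/, ' UTC');
    return `Edited by proposal-police: This proposal was edited at ${timestamp}.`;
}
```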
cc @mallenexpensify
Tested here but I suppose the GH bot is not running, correct? Also, what do you suggest as the next steps to implement it in the App repo?
@marcochavezf Hey 👋
It is running, and has been for the past few days - what happened (I checked the logs) is that for all three of your comments, the GPT assistant responded with NO_ACTION 😅
You can check out some of my past comments on that issue and try different scenarios like:
Obviously, for those you tried where no action was taken, we'll need more fine-tuning if we all agree that it should've done something 😁
As for the next steps to implement it within the App repo:
[!NOTE] To be taken with a grain of 🧂. Feel free to add / change / subtract ♻️
Hi @ikevin127, apologies for the late reply. The plan sounds good, thanks! A few thoughts:
create the assistant on Expensify owned OpenAI account
I think we can do it programmatically and just pass the API key as ENV param from the GH action, no?
decide which bot capabilities we want to go with on first release: only go with the 'wrong proposal template check', or also include the 'edit post in case of substantial changes' even though it might require some extra fine-tuning
It would be great to go with both. What would you need to fine-tune the solution?
decide the conditions in which the bot should run, for example: the issue is open and has the Help Wanted label, or any other condition
I think when the Help Wanted label is added and when someone posts in the GH issue
agree on initial testing period, making sure we have a plan in case things go south like disabling the bot
One idea is to add a Beta ProposalPolice label, so the bot can only run if that label is also added.
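The label gate discussed here could be sketched like this (the label names come from this thread; the check itself is a hypothetical illustration):

```javascript
// Sketch: only let the bot act when the issue is open and carries every
// required label (e.g. 'Help Wanted' plus an opt-in beta label).
function shouldBotRun(issue, requiredLabels = ['Help Wanted', 'Beta ProposalPolice']) {
    if (issue.state !== 'open') {
        return false;
    }
    const labelNames = issue.labels.map((label) => label.name);
    return requiredLabels.every((name) => labelNames.includes(name));
}
```

Requiring all labels (rather than any) keeps the beta rollout opt-in per issue.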
@marcochavezf No worries 😁 Even though this is a Weekly issue, I wasn't expecting a reply every week.
I think we can do it programmatically and just pass the API key as ENV param from the GH action, no?
Unfortunately, OpenAI only allows using assistants within the same org (account), therefore we will need to:
pass these two (the assistant ID and API key) to the GH bot via env variables, and we're done with the setup
It would be great to go with both. What would you need to fine-tune the solution?
Both sounds good then 🎉
About fine-tuning: I was talking more in terms of small, general adjustments to the assistant's instructions regarding what we consider minor / major proposal edits, such that the bot triggers the right action / response.
But since we can create a new label like Beta ProposalPolice, this is more than enough to safely fine-tune the assistant while in beta.
One idea is to add a Beta ProposalPolice label, so the bot can only run if that label is also added.
I think this would be great, this way we can safely test it out and fine-tune it live!
This being said, once we have the setup completed, installed the bot in the E/App repo, and added the new Beta ProposalPolice label - we're good to go! 🚀
Sounds good, thanks @ikevin127! Besides the assistantID and the OpenAI API key, do you think you can create a PR with the bot? Or would we need to set up something else first?
@marcochavezf Sure, here's the proposal-police repo, it's public. How do you think I should use the existing repo to create a PR, and where?
@ikevin127 can you take a look at these two issues to see if Proposal Police would have been helpful?
Sure, I have an idea as to how to test proposal-police on the mentioned issues.
Not entirely sure proposal-police would actually help resolve the conflict; rather, it would just note whether any proposal had significant changes compared to initial / previous edits.
From there, the C+ would still have to actually review the proposals, accounting for proposal-police's notes on the severity of changes in the proposals' edit timelines - then take it from there.
My idea would be to create 2 identical mock issues here and simply post both proposals and their updates in chronological order and see how the proposal-police will respond.
I'll get back with results in 1-2 hours, excited to see the outcome 🤓
Here are the [MOCK] issues created in the repo where proposal-police is active:
- [MOCK] BA - On the phone number page, user can not enter data as the placeholder shows
- [MOCK] Web - Emoji picker - Holding down on scroll bar and sliding down selects emoji instead of scrolling
In chronological order, following the original issues' timelines mentioned in https://github.com/Expensify/App/issues/35108#issuecomment-1983671726, here's the breakdown of proposal-police actions:
Edited by proposal-police: This proposal was edited at 2024-03-07 16:31:55 UTC.
Edited by proposal-police: This proposal was edited at 2024-03-07 16:41:05 UTC.
So for issue (1), proposal-police intervened due to significant proposal changes, while for issue (2) there was no intervention.
cc @mallenexpensify
Thanks @ikevin127, happy to see it in action. For the first mockup, I'm trying to think what the process would be once Edited by proposal-police: This proposal was edited at 2024-03-07 16:31:55 UTC.
is surfaced. Maybe it's just that the C+ notes it and, if there are any potential discrepancies between two proposals, they dive in to get more details.
Another option would be to tag the contributor who made the changes and ask them to include all the details that they changed.
For BZ's, as non-engineers, it's often really tough for them to know what code is/isn't important.
Sounds like a good procedure:
We can for sure work on ideas when it comes to (2.) as to how we want to go about it.
For BZ's, as non-engineers, it's often really tough for them to know what code is/isn't important.
➕1️⃣
Even the AI's minor / significant change detection can be subjectively interpreted by the C / C+. As per the [MOCK] examples from https://github.com/Expensify/App/issues/35108#issuecomment-1983799697, some contributors might argue that their changes were not significant - but since this is handled by the AI, I guess nobody can argue that it was unfair only for them.
Moreee training data for ProposalPolice - https://github.com/Expensify/App/issues/37254#issuecomment-1984940555
I, human, cannot decide what to do here.
Moreee training data for ProposalPolice
[!note] These [MOCK] issues don't actually help train the AI, as the assistant acts based on these instructions. The training is already done; the model used is OpenAI's GPT-4-Turbo.
I created a [MOCK] issue with Krishna2323's and dukenv0307's proposals and their edits in chronological order, check it out: [MOCK] Workspace - If title is changed offline, the description is no longer greyed out
[!important] I started with Krishna2323's proposal edit from 4 days ago (RIGHT BEFORE dukenv0307's initial proposal) since those are the relevant parts here, as this is a conflict between the 2 contributors and their edit timelines.
To sum up the results in chronological order according to original issue:
Edited by proposal-police: This proposal was edited at 4 days ago (edit).
Thanks for all the help here @ikevin127. It does make me think that there might be a real problem here with contributors quick-posting, other contributors also posting quickly, then multiple updates being made that cloud the C+'s judgement on who best to hire.
[!tip] The expensify-proposal-testing repository is public, and if in need of a second opinion from the proposal-police, you can always create [MOCK] issues yourself in the issues section - until we actually incorporate the bot within E/App.
@rushatgabhane @mallenexpensify 👋 Sorry guys, I really don't have the capacity to create [MOCK] issues right now - as you can imagine it takes quite some time to carefully copy / paste proposals in chronological order and come back here with a comment that summarizes the results.
[!note] Notes:
- I already mocked 3 issues so you have a model of how to do it
- make sure to add the Help Wanted label after you create the issue (otherwise proposal-police will skip it)
- proposal-police usually responds in 3-7 seconds after a comment is posted / edited
@ikevin127 sorry, I didn't know that. I thought you just had to feed in the issue number and it would do the work 😅
Same, apologies @ikevin127 , wasn't intending to add a bunch of work.
until we actually incorporate the bot within E/App.
Is there a way we can incorporate a v1 where it doesn't auto-post to each issue? i.e. maybe it's a hidden comment, or a quick comment where the results are hidden behind a ▶️ like we do in the OP of issues?
Or... a way to create a tool where we can paste in the link to an issue then have it spit out the results?
Thx
Same, apologies @ikevin127 , wasn't intending to add a bunch of work.
No worries! 🙂
Is there a way we can incorporate a v1 where it doesn't auto-post to each issue ?
Actually, @marcochavezf came up with a great idea here where we would be adding a new label within E/App, something like Beta ProposalPolice; this would allow us to only test proposal-police on selected issues, given we assign the label soon after the issue is created (ideally before proposals start coming in).
That could be one way to move towards slowly incorporating proposal-police within E/App.
Or... a way to create a tool where we can paste in the link to an issue then have it spit out the results?
I will be looking into this when I get some time, will use AI to see if I can come up with a script like:
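A starting point for such a script (hypothetical helper, no such tool exists yet) could be parsing the pasted issue link, so the script can then fetch the issue's comments via the GitHub REST API and replay them through the assistant:

```javascript
// Sketch: extract owner, repo, and issue number from a pasted GitHub
// issue URL, e.g. https://github.com/Expensify/App/issues/35108
function parseIssueUrl(url) {
    const match = url.match(/github\.com\/([^/]+)\/([^/]+)\/issues\/(\d+)/);
    if (!match) {
        throw new Error(`Not a GitHub issue URL: ${url}`);
    }
    return {owner: match[1], repo: match[2], issueNumber: Number(match[3])};
}
```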
only test proposal-police on selected issues, given we assign the label soon after the issue is created (ideally before proposals start coming in).
My concern here is that we don't know ahead of time which issues are the ones we want to check. More often, we encounter issues like the ones above and then want to review the data. I do like the idea of v1 being either unobtrusive or behind the scenes; I'd like to test out everything, then create processes and docs so we can let internal and external folks know.
that helps to enforce the documented best practices for proposals, and aids the reviewers in maintaining a clearer timeline of proposals.
I'm not totally sure if we are clear in the documentation about what we consider best practices. In other words, I think that what we observe (e.g. speedrun proposals) is intuitively bad, but not obviously breaking the Contributing Guidelines (maybe only in the aspect of missing "Proposal Updated" disclaimers).
I think we could first update the Contributing Guidelines to make it clear that the behaviors we intuitively consider wrong are documented to be disallowed or discouraged.
I created a plan on Slack regarding this.
we would be adding a new label within E/App something like Beta ProposalPolice, this would allow us to only test proposal-police on selected issues
I'd vote to keep it simple and just turn it on everywhere and see how it goes. It seems like the downsides of it not working perfectly are not that bad
Hi @ikevin127! There's more and more enthusiasm for ProposalPolice, which is awesome! While I was looking at the mock issues, I think eventually it would be great to fine-tune an OpenAI model to have more accurate results, but that would be for a v2
For now, I'm trying to wrap my head around what we need to do to integrate the current ProposalPolice into the E/App repo. I understand this is a GH App built with Probot (node.js), so it seems we'd need to host the app somewhere, run it, and then install it into the E/App repo (create a workflow in .github), correct? (Sorry, I'm not familiar with GH Apps.)
I think eventually it would be great to fine-tune an OpenAI model to have more accurate results, but that would be for a v2
For sure there's room for improvement in terms of assistant instructions in order to get better results.
I understand this is a GH App built with Probot (node.js), so it seems we'd need to host the app somewhere, run it, and then install it into the E/App repo (create a workflow in .github), correct? (Sorry, I'm not familiar with GH Apps.)
You got it right when it comes to the GH app!
Indeed, I used Probot (node.js) to build the bot, which needs to be hosted and run somewhere.
Once the app is running, the GH app (bot) can be added to any repository, and it will work whenever comments are posted within one of that repo's issues, given that the issue has the Help Wanted label.
(create a workflow in .github)
Not sure about this part; the way I use the bot is simply adding it to a repository (as described above).
[!important] There's a step before deploying the node.js bot app, and that is setting up the bot's AI assistant on Expensify's OpenAI account, because currently the bot charges my personal OpenAI account whenever the AI assistant is prompted.
To do this:
- Log in to Expensify's OpenAI account.
- Go to Assistants > [+ Create].
- Name the assistant, then paste the latest assistant Instructions.
- Select the gpt-4-1106-preview model and create the assistant.
Once the AI assistant is created, we can populate the bot's OPENAI_ASSISTANT_ID and OPENAI_API_KEY env variables; given all other variables are correctly populated and the bot is running, it can be added to any repository and will work as described.
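A minimal sketch of that env wiring (the helper name is illustrative; only the two variable names come from this thread):

```javascript
// Sketch: read the two required secrets, failing fast if either is
// missing. Both must come from the same OpenAI org that owns the
// assistant, since assistants can't be used across orgs.
function loadBotConfig(env = process.env) {
    const apiKey = env.OPENAI_API_KEY;
    const assistantId = env.OPENAI_ASSISTANT_ID;
    if (!apiKey || !assistantId) {
        throw new Error('Both OPENAI_API_KEY and OPENAI_ASSISTANT_ID must be set');
    }
    return {apiKey, assistantId};
}
```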
cc @marcochavezf
Removing NewFeature because we're adding all bugs and feature requests to VIP and Wave projects, and this doesn't directly align with any; it helps all.
Hi @ikevin127, we've been discussing the options to host and install the app, and currently we can't host, run, and maintain a Probot (node.js) app atm. Is there a possibility of migrating ProposalPolice to a GH action (since that's what we currently use for different workflows)?
@marcochavezf Yes, that should work!
I'll look into it and come back with something by EOW, possibly something that works exactly like the app but directly from GH actions.
I managed to port the bot to GH actions! ♻️
Here's the workflow script that triggers on issue comment [created, edited] actions -> calling the script's JS functions to poll the OpenAI assistant API, exactly like the Probot app did.
Moving this workflow to E/App should be pretty straightforward:
- add the OPENAI_API_KEY and OPENAI_ASSISTANT_ID secrets as repository env variables
- if needed, change the labelNames argument from the current ['Help Wanted'] to whatever other labels; the action will only run if the issue has all labels mentioned in the array:
@param {*} labelNames - String array of label names to check for, ex. ['Help Wanted', "External"]
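The trigger and secret wiring described here might look roughly like the following workflow sketch (the file layout, script path, and step names are assumptions, not the actual PR contents):

```yaml
# Hypothetical GitHub Actions workflow sketch for ProposalPolice.
name: ProposalPolice
on:
  issue_comment:
    types: [created, edited]
jobs:
  proposal-police:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Assumed script path; the real PR may organize this differently.
      - run: node .github/scripts/proposalPolice.js
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          OPENAI_ASSISTANT_ID: ${{ secrets.OPENAI_ASSISTANT_ID }}
```

The `issue_comment` event fires for both new and edited comments, matching the two webhook cases the Probot app handled.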
removing the Planning label since planning is done and we're moving into implementation / deployment
Nice! Thank you, looks great, I will request to add the secrets as env variables for the repository. Meanwhile, can you create a PR with the changes to the App repo?
Meanwhile, can you create a PR with the changes to the App repo?
PR https://github.com/Expensify/App/pull/41038 open!
Note: Automation auto-assigned a C+ for review, which we probably don't need?
cc @marcochavezf @roryabraham
This issue has not been updated in over 15 days. @marcochavezf, @mallenexpensify, @ikevin127 eroding to Monthly issue.
P.S. Is everyone reading this sure this is really a near-term priority? Be brave: if you disagree, go ahead and close it out. If someone disagrees, they'll reopen it, and if they don't: one less thing to do!
Working on the PR, updates are almost weekly there 😄
Thanks @ikevin127, is the PR awaiting actions from you? It seems to be steadily moving along.
@mallenexpensify Yes, I have to look into the latest comments and push new changes which I plan on doing by EOW.
This issue has not been updated in over 15 days. @marcochavezf, @mallenexpensify, @ikevin127 eroding to Monthly issue.
P.S. Is everyone reading this sure this is really a near-term priority? Be brave: if you disagree, go ahead and close it out. If someone disagrees, they'll reopen it, and if they don't: one less thing to do!
We're working on the PR - feels like we're close to merge. PR is ready for review again!
PR was ✅ Approved by Rory, now we're waiting on the AI assistant setup as detailed in https://github.com/Expensify/App/issues/35108#issuecomment-2020354859 (Important) in order to have all necessary puzzle pieces to deploy and launch ProposalPolice™ 👮🏼.
Pinged @marcochavezf and @ikevin127 in NewDot to discuss rollout and documentation plan.
Slack context: https://expensify.slack.com/archives/C01GTK53T8Q/p1706120841703079
Proposal
Develop a GitHub bot 🤖 using GPT-4 capabilities to standardize and monitor proposals.
Strategy
The purpose of the Expensify Open-Source Community is to provide a global community of all the best React Native developers in the world streamlined access to contribute to the Expensify product. In order to attract and retain talented developers, the Expensify Open-Source Community must be inclusive, fair, and provide equitable opportunity to all community members, including newcomers.
Problem
There is a tendency among the open-source community to perform what I call speedrun proposals – a contributor will post an incomplete or otherwise low-quality proposal very quickly in order to be the first proposal, then subsequently edit it multiple times. Sometimes these edits significantly alter the RCA or solution and/or derive from later proposals without clear acknowledgement. When the C+ or Expensify engineer goes to review those proposals, they will tend to choose the first proposal of sufficient quality that solves the problem – and in the case of edited proposals, it takes some extra sleuthing to understand the true timeline of proposals. In short, this practice of speedrun proposals inhibits open collaboration amongst contributors and makes it harder for reviewers to fairly evaluate proposals against each other.
Solution
Create an open-source GitHub bot (tentatively named the proposal-police™ 👮🏼🚨) that helps to enforce the documented best practices for proposals, and aids the reviewers in maintaining a clearer timeline of proposals.
More specifically, the proposal-police™ would leverage GPT-4 to enforce the proposal template on new comments and to append a Last edited: timestamp line to substantially edited proposals, making the timeline clearer to proposal reviewers.
In conclusion, the proposal-police™ will help promote higher-quality proposals and a more informed, trust-based review process :handshake:, improving baseline fairness and allowing community members and Expensify employees to better focus on getting shit :done:!
Issue Owner
Current Issue Owner: @ikevin127