Initial plugin design - Githubissues

simonw commented 7 months ago

The goal of this plugin is to provide an interface for admin users of a Datasette instance to set secrets such as API keys, which can then be used by other plugins.

Plugins that already need this:

datasette-extract for OpenAI
datasette-enrichments-opencage for OpenCage Geocoder
datasette-query-assistant for Anthropic Claude 3

Datasette Cloud needs this, and I imagine it will quickly become useful for Datasette Desktop and https://github.com/datasette/studio as well.

Tasks:

[x] datasette secrets generate-encryption-key command
[x] Implement database schema
[x] Implement UI for editing secrets - should differ slightly for add vs. update in terms of copy and placeholder text
[x] Implement encryption-at-rest
[x] Implement manage-secrets permission for managing secrets
[x] Implement register_secrets() hook - secrets should have an optional description which gets shown on the edit secret page
[x] Implement API for plugins to retrieve secrets that they need
[x] https://github.com/datasette/datasette-secrets/issues/2
[x] https://github.com/datasette/datasette-secrets/issues/3
[x] Documentation (including for plugin authors)

simonw commented 7 months ago

Classic challenge here is that I don't want to store the secrets unencrypted at rest, but I still need the Datasette application to be able to decrypt them so that it can use them.

A simple starting approach: use https://pypi.org/project/cryptography/ and symmetric encryption based on a key in an environment variable:

>>> from cryptography.fernet import Fernet
>>> # Put this somewhere safe!
>>> key = Fernet.generate_key()
>>> f = Fernet(key)
>>> token = f.encrypt(b"A really secret message. Not for prying eyes.")
>>> token
b'...'
>>> f.decrypt(token)
b'A really secret message. Not for prying eyes.'

The secret key itself sits in an environment variable, and is read on startup.

This means that a leaked or accidentally shared SQLite database file will not contain unencrypted secrets - the attacker would need to get hold of that environment variable too.

simonw commented 7 months ago

This plugin can provide its own plugin hook, register_secrets(), for other plugins to let it know that e.g. they need a secret called anthropic - that can then be added to the UI managed by this plugin for people to add secrets.

I'm not going to allow end users to access the decrypted secrets even if they are admins - secrets will be write-only in the UI, like they are for GitHub Actions.

simonw commented 7 months ago

The plugin will store secrets in _internal by default, but users can configure it to store them elsewhere.

simonw commented 7 months ago

Some level of auditing could be useful. I can store the time each secret was last "used", where used is defined as a plugin making a call to the await datasette_secrets.get_secret(datasette, name_of_secret) function.

I could even record a stack trace showing who that caller was? Not sure about that, it's a bit weird.

Or... how about that method has an optional user_id= parameter which then records which user ID requested the secret? There could be an option to log those permanently, or a default where there's a capped collection of log rows.
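
A capped log could be sketched with plain sqlite3 along these lines. The secrets_log table, its columns, the log_access() helper and the max_rows cap are all assumptions made for illustration, not the plugin's actual implementation:

```python
import sqlite3
from datetime import datetime, timezone


def log_access(conn, secret_name, user_id=None, max_rows=1000):
    """Record one use of a secret, keeping only the newest max_rows rows."""
    conn.execute(
        "insert into secrets_log (secret_name, user_id, used_at) values (?, ?, ?)",
        (secret_name, user_id, datetime.now(timezone.utc).isoformat()),
    )
    # Trim the oldest rows so the log never grows past the cap
    conn.execute(
        "delete from secrets_log where id not in "
        "(select id from secrets_log order by id desc limit ?)",
        (max_rows,),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table secrets_log ("
    "id integer primary key, secret_name text, user_id text, used_at text)"
)
for i in range(5):
    log_access(conn, "ANTHROPIC_API_KEY", user_id="user-{}".format(i), max_rows=3)
remaining = [row[0] for row in conn.execute("select user_id from secrets_log order by id")]
```

Logging permanently would just mean skipping the delete step, so the cap could be a single configuration option.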

simonw commented 7 months ago

There could even be a mechanism by which access to secrets can differ based on the user asking for them - this would allow secrets to be permission-controlled, which might be useful but might also be confusing given Datasette's existing permission system.

simonw commented 7 months ago

Maybe the plugin adds its own permission, datasette-secret, which takes a named resource that's the name of the secret.

That would allow for permission checks using the rest of the existing permission system. I think that's a pretty good option.

simonw commented 7 months ago

This may mean there should be a can_access_secret() method separate from get_secret() - so that the auditing code doesn't record every time a permission check was made (e.g. to decide if a table action menu item should be shown) as opposed to when people actually used the secret for something.
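
Here's a rough in-memory sketch of that split. The SecretStore class, its constructor arguments and the synchronous signatures are hypothetical stand-ins (the real functions would be async and take a datasette instance); the point is only that the permission check leaves no audit trail while the actual fetch does:

```python
from datetime import datetime, timezone


class SecretStore:
    """Hypothetical in-memory stand-in for the plugin's storage and audit log."""

    def __init__(self, secrets, allowed):
        self._secrets = secrets  # {secret name: value}
        self._allowed = allowed  # {secret name: set of actor IDs}
        self.audit_log = []  # (secret name, actor ID, UTC timestamp) tuples

    def can_access_secret(self, actor_id, name):
        # Pure permission check: safe to call when rendering menus, never audited
        return actor_id in self._allowed.get(name, set())

    def get_secret(self, actor_id, name):
        # Actual use of the secret: this is the only call that gets audited
        if not self.can_access_secret(actor_id, name):
            return None
        self.audit_log.append((name, actor_id, datetime.now(timezone.utc).isoformat()))
        return self._secrets.get(name)


store = SecretStore({"anthropic": "sk-example"}, {"anthropic": {"root"}})
checks = [store.can_access_secret("root", "anthropic") for _ in range(3)]
value = store.get_secret("root", "anthropic")
```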

simonw commented 7 months ago

First hook, inspired by https://github.com/datasette/datasette-enrichments/blob/main/datasette_enrichments/hookspecs.py

@hookspec
def register_secrets(datasette):
    "Return a list of Secret instances, or an awaitable function returning that list"

I think a Secret will be a dataclass. Each secret needs a name, a description and maybe optional notes about how it should be redacted - so we can display a sensible redacted version like sk-...abc as seen in things like the OpenAI key management UI:

[Screenshot: redacted API keys in the OpenAI key management UI]
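
A minimal sketch of what such a dataclass might look like. The show_prefix and show_suffix fields and the redact() method are assumptions about how the redaction notes could be expressed, not a settled design:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Secret:
    name: str
    description: Optional[str] = None
    # Hypothetical redaction hints: how many characters to reveal at each end
    show_prefix: int = 3
    show_suffix: int = 3

    def redact(self, value: str) -> str:
        """Return a display-safe version of value, such as sk-...abc."""
        if len(value) <= self.show_prefix + self.show_suffix:
            # Too short to reveal anything safely
            return "..."
        suffix = value[-self.show_suffix:] if self.show_suffix else ""
        return value[: self.show_prefix] + "..." + suffix


secret = Secret("OPENAI_API_KEY", "OpenAI API key")
redacted = secret.redact("sk-1234567890abc")
```

The redacted string is what would be stored alongside the encrypted value, so the UI never needs to decrypt anything just to display the list.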

simonw commented 7 months ago

Schema design (in _internal or another configured database):

datasette_secrets:

name text primary key
encrypted blob
encryption_key text - the name of the key that was used to encrypt it, maybe an environment variable name?
redacted text - a redacted version of the key suitable for display to users, for keys that support that
created_at text - UTC created
updated_at text - UTC when it was last modified
last_used_at text - UTC last access date
created_by text - actor ID that created it
updated_by text - actor ID that last modified it (set a new value for it)

Should there be history of when secrets were deleted? If yes then I'll need to make name not a primary key, since there could be multiple rows for a secret.

simonw commented 7 months ago

I think I want records of who updated the secrets and when (and who deleted them) - so if something breaks it's easy to figure out why.

simonw commented 7 months ago

I'm going to add a version integer field to track the version of a secret and deleted_at and deleted_by fields too. The primary key will be on (name, version) so I can efficiently grab the most recent version.

simonw commented 7 months ago

Actually since I want a separate datasette_secrets_log table logging access (potentially capped) I should have a primary key that's a single column so I can easily foreign key reference it in a way that works in Datasette (which doesn't support foreign keys to compound keys in the UI yet).
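
Putting the last few comments together, the table might look something like this sketch. It uses only a subset of the columns listed earlier, and the save_secret() helper and the unique constraint on (name, version) are assumptions for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table datasette_secrets (
        id integer primary key,  -- single column, so a log table can reference it
        name text not null,
        version integer not null,
        encrypted blob,
        created_at text,
        created_by text,
        deleted_at text,
        deleted_by text,
        unique (name, version)
    );
    """
)


def save_secret(conn, name, encrypted, actor_id):
    """Insert a new version of a secret rather than overwriting the old one."""
    next_version = conn.execute(
        "select coalesce(max(version), 0) + 1 from datasette_secrets where name = ?",
        (name,),
    ).fetchone()[0]
    conn.execute(
        "insert into datasette_secrets (name, version, encrypted, created_at, created_by) "
        "values (?, ?, ?, datetime('now'), ?)",
        (name, next_version, encrypted, actor_id),
    )
    return next_version


save_secret(conn, "OPENAI_API_KEY", b"ciphertext-1", "root")
save_secret(conn, "OPENAI_API_KEY", b"ciphertext-2", "root")
latest = conn.execute(
    "select version, encrypted from datasette_secrets "
    "where name = ? order by version desc limit 1",
    ("OPENAI_API_KEY",),
).fetchone()
```

The unique constraint still lets SQLite find the most recent version of a secret efficiently, while the integer id gives the log table something simple to point at.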

simonw commented 7 months ago

The GitHub Actions secrets UI looks like this:

[Screenshot: the GitHub Actions secrets list]

Then all the edit form lets you do is this:

[Screenshot: the GitHub Actions secret edit form]

I want a design with a bit more stuff on it, since we have audit logging features.

simonw commented 7 months ago

This plugin requires configuration, because we don't want people to accidentally start writing secrets to their _internal database that gets cleared on restart.

The documentation will strongly encourage a DATASETTE_SECRETS_ENCRYPTION_KEY environment variable for the Fernet key.

It will provide a command for generating a new one:

datasette secrets generate

simonw commented 7 months ago

Secrets should have a notes optional column too.

simonw commented 7 months ago

Actually datasette secrets generate is a bad name, it sounds like it's generating a new secret but it's actually creating the encryption key. I'll rename to this:

datasette secrets generate-encryption-key
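
For reference, a Fernet key is 32 random bytes encoded as URL-safe base64. This stdlib-only sketch produces a value in that format; the generate_encryption_key() function name is hypothetical, and the command itself may simply call Fernet.generate_key() instead:

```python
import base64
import os


def generate_encryption_key() -> str:
    """Return 32 random bytes encoded as URL-safe base64, the Fernet key format."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


key = generate_encryption_key()
print(key)
```

The printed value is what would go in the DATASETTE_SECRETS_ENCRYPTION_KEY environment variable.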

simonw commented 7 months ago

URL design:

/-/secrets - list of secrets
/-/secrets/add - add a secret
/-/secrets/update/SECRET_NAME - update a secret
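
Those URLs could be expressed as regular expressions along these lines. This is a stdlib-only sketch: the ROUTES table, the view names and the resolve() helper are hypothetical, and in the plugin these patterns would presumably be handed to Datasette's register_routes() hook instead:

```python
import re

# Hypothetical route table: compiled pattern -> view name
ROUTES = [
    (re.compile(r"^/-/secrets$"), "list"),
    (re.compile(r"^/-/secrets/add$"), "add"),
    (re.compile(r"^/-/secrets/update/(?P<secret_name>[^/]+)$"), "update"),
]


def resolve(path):
    """Return (view name, captured parameters) for path, or None if nothing matches."""
    for pattern, view_name in ROUTES:
        match = pattern.match(path)
        if match:
            return view_name, match.groupdict()
    return None
```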

simonw commented 7 months ago

[Screenshot of the work-in-progress UI]

Still needs tests and should redirect to the list of secrets.

Also do we even allow people to set the secret name? Maybe not, if plugins get to define what secrets can be set.

simonw commented 7 months ago

It's coming together. I'm running the local dev environment for testing like this:

datasette --create data.db -p 8003 \
  -s plugins.datasette-secrets.encryption-key 'kU8ZA4nqUsEH0KVkWaTx_i1xSe5L6RKAczdx1n6Mo8A=' \
  -s plugins.datasette-secrets.database data \
  -s permissions.manage-secrets.id root \
  --reload --root --secret 1

simonw commented 7 months ago

I'm going to let secret descriptions include HTML, so they can link to e.g. the OpenAI API key portal.

simonw commented 7 months ago

Documentation

Datasette plugins sometimes need access to secrets, such as API keys used to integrate with tools hosted outside of Datasette - things like geocoders or hosted AI language models.

This plugin provides ways to configure those secrets:

Secrets can be configured using environment variables, such as DATASETTE_SECRETS_OPENAI_API_KEY
Secrets can be stored, encrypted, in a SQLite database table which administrator users can then update through the Datasette web interface

simonw commented 7 months ago

This plugin depends on a Datasette 1.0 alpha release.

Using the internal database

While the secrets stored in the datasette_secrets table are encrypted, we still recommend hiding that table from view.

One way to do that is to keep the table in Datasette's internal database, which is invisible to all users, even users who are logged in.

By default, the internal database is an in-memory database that is reset when Datasette restarts. This is no good for persistent secret storage!

Instead, you should switch Datasette to using an on-disk internal database. You can do this by starting Datasette with the --internal option:

datasette data.db --internal internal.db

Your secrets will be stored in the datasette_secrets table in that database file.

simonw commented 7 months ago

Permissions

Only users with the manage-secrets permission will have access to manage secrets through the Datasette web interface.

You can grant that permission to the root user (or the user with an ID of your choice) by including this in your datasette.yml file:

permissions:
  manage-secrets:
    id: root

Then start Datasette like this (with --root to get a URL to log in as the root user):

datasette data.db --internal internal.db -c datasette.yml --root

Alternatively, use the -s option to apply that setting without creating a configuration file:

datasette data.db --internal internal.db \
  -s permissions.manage-secrets.id root \
  --root

simonw commented 7 months ago

Python API design:

from datasette_secrets import get_secret

secret = await get_secret(datasette, "ANTHROPIC_API_KEY")

Which returns a string or None.

simonw commented 7 months ago

Secrets will always be strings. If you want to store bytes (e.g. a weird key of some sort) you'll need to serialize that as a unicode string, because otherwise the edit interface won't be able to handle it.
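
One way to do that is base64. A minimal sketch, where both helper function names are hypothetical and nothing here is provided by the plugin itself:

```python
import base64


def bytes_to_secret_string(raw: bytes) -> str:
    """Encode arbitrary bytes as an ASCII string that a text form field can hold."""
    return base64.b64encode(raw).decode("ascii")


def secret_string_to_bytes(stored: str) -> bytes:
    """Reverse bytes_to_secret_string() after retrieving the secret."""
    return base64.b64decode(stored)


raw_key = bytes([0, 255, 16, 128])
as_text = bytes_to_secret_string(raw_key)
```

The plugin that owns the secret would call secret_string_to_bytes() on whatever get_secret() returns.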

simonw commented 7 months ago

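import os

from cryptography.fernet import Fernet

# get_secrets(), get_config() and get_database() are plugin helpers defined elsewhere
async def get_secret(datasette, secret_name):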
    secrets_by_name = {secret.name: secret for secret in await get_secrets(datasette)}
    if secret_name not in secrets_by_name:
        return None
    # Is it an environment secret?
    env_var = "DATASETTE_SECRETS_{}".format(secret_name)
    if os.environ.get(env_var):
        return os.environ[env_var]
    # Now look it up in the database
    config = get_config(datasette)
    db = get_database(datasette)
    encrypted = (
        await db.execute(
            "select encrypted from datasette_secrets where name = ? order by version desc limit 1",
            (secret_name,),
        )
    ).first()
    if not encrypted:
        return None
    key = Fernet(config["encryption_key"].encode("utf-8"))
    decrypted = key.decrypt(encrypted["encrypted"])
    return decrypted.decode("utf-8")

Needs a lot of tests.

Also it should take the actor_id and use that to update the last_used_by field.
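
The update itself could look something like this synchronous sqlite3 sketch. The record_use() helper and the cut-down table are assumptions for illustration; the real version would run through Datasette's async database API:

```python
import sqlite3


def record_use(conn, secret_name, actor_id):
    """Stamp the newest version of a secret with when and by whom it was used."""
    conn.execute(
        "update datasette_secrets set last_used_at = datetime('now'), last_used_by = ? "
        "where id = (select id from datasette_secrets where name = ? "
        "order by version desc limit 1)",
        (actor_id, secret_name),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table datasette_secrets (id integer primary key, name text, "
    "version integer, last_used_at text, last_used_by text)"
)
conn.executemany(
    "insert into datasette_secrets (name, version) values (?, ?)",
    [("DEMO_SECRET_ONE", 1), ("DEMO_SECRET_ONE", 2)],
)
record_use(conn, "DEMO_SECRET_ONE", "root")
rows = conn.execute(
    "select version, last_used_by from datasette_secrets order by version"
).fetchall()
```

Only the latest version gets stamped, so older versions keep their own last-used history.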

simonw commented 7 months ago

I built a tiny demo plugin: https://gist.github.com/simonw/d6c3500ea0c77499034df6b4e409e3e3

from datasette_secrets import Secret
from datasette import hookimpl

@hookimpl
def register_secrets():
    return [
        Secret(
            "DEMO_SECRET_ONE",
            "First demo secret",
        ),
        Secret(
            "DEMO_SECRET_TWO",
            "Second demo secret",
        ),
    ]

Install like this:

datasette install https://gist.github.com/simonw/d6c3500ea0c77499034df6b4e409e3e3/archive/2ae951bfbf2f945396c31cadadc437422a5df6f8.zip

datasette / datasette-secrets

Initial plugin design #1
