Closed simonw closed 6 months ago
Classic challenge here is that I don't want to store the secrets unencrypted at rest, but I still need the Datasette application to be able to decrypt them so that it can use them.
A simple starting approach: use https://pypi.org/project/cryptography/ and symmetric encryption based on a key in an environment variable:
>>> from cryptography.fernet import Fernet
>>> # Put this somewhere safe!
>>> key = Fernet.generate_key()
>>> f = Fernet(key)
>>> token = f.encrypt(b"A really secret message. Not for prying eyes.")
>>> token
b'...'
>>> f.decrypt(token)
b'A really secret message. Not for prying eyes.'
The secret key itself sits in an environment variable and is read on startup.
This means that a leaked or accidentally shared SQLite database file will not contain unencrypted secrets - the attacker would need to get hold of that environment variable too.
This plugin can provide its own plugin hook, register_secrets(), for other plugins to let it know that e.g. they need a secret called anthropic - that can then be added to the UI managed by this plugin for people to add secrets.
I'm not going to allow end users to access the decrypted secrets even if they are admins - secrets will be write-only in the UI, like they are for GitHub Actions.
The plugin will store secrets in _internal by default, but users can configure it to store them elsewhere.
Some level of auditing could be useful. I can store the time each secret was last "used", where used is defined as a plugin making a call to the await datasette_secrets.get_secret(datasette, name_of_secret) function.
I could even record a stack trace showing who that caller was? Not sure about that, it's a bit weird.
Or... how about that method has an optional user_id= parameter which then records which user ID requested the secret? There could be an option to log those permanently, or a default where there's a capped collection of log rows.
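A minimal sketch of what a capped log could look like, using the standard library's sqlite3 module. The datasette_secrets_log table name, its columns and the log_access() helper are assumptions for illustration, not the plugin's actual design:

```python
import sqlite3
from datetime import datetime, timezone

MAX_LOG_ROWS = 3  # deliberately tiny cap, just for demonstration

conn = sqlite3.connect(":memory:")
# Hypothetical table layout: column names are assumptions
conn.execute(
    "create table datasette_secrets_log ("
    "id integer primary key, secret_name text, user_id text, used_at text)"
)


def log_access(conn, secret_name, user_id=None, max_rows=MAX_LOG_ROWS):
    """Record one use of a secret, then trim the log to the newest max_rows rows."""
    conn.execute(
        "insert into datasette_secrets_log (secret_name, user_id, used_at) "
        "values (?, ?, ?)",
        (secret_name, user_id, datetime.now(timezone.utc).isoformat()),
    )
    # Delete everything except the most recent max_rows entries
    conn.execute(
        "delete from datasette_secrets_log where id not in ("
        "select id from datasette_secrets_log order by id desc limit ?)",
        (max_rows,),
    )


for i in range(5):
    log_access(conn, "anthropic", user_id="user-{}".format(i))
```

Logging permanently would then just mean skipping the delete step.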
There could even be a mechanism by which access to secrets can differ based on the user asking for them - this would allow secrets to be permission-controlled, which might be useful but might also be confusing given Datasette's existing permission system.
Maybe the plugin adds its own permission, datasette-secret, which takes a named resource that's the name of the secret.
That would allow for permission checks using the rest of the existing permission system. I think that's a pretty good option.
This may mean there should be a can_access_secret() method separate from get_secret() - so that the auditing code doesn't record every time a permission check was made (e.g. to decide if a table action menu item should be shown) as opposed to when people actually used the secret for something.
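To illustrate that split, here's a rough sketch using a plain in-memory stand-in. The SecretStore class and its method signatures are hypothetical; it doesn't use Datasette's permission system or the plugin's real storage, it only shows that the permission check leaves no audit trail while the actual fetch does:

```python
import asyncio


class SecretStore:
    """In-memory stand-in, purely illustrative (not the plugin's API)."""

    def __init__(self, values, allowed):
        self._values = values    # {secret_name: secret_value}
        self._allowed = allowed  # {secret_name: set of user IDs}
        self.audit_log = []      # (secret_name, user_id) for each real use

    async def can_access_secret(self, name, user_id):
        # Permission check only: safe to call often, never audited
        return user_id in self._allowed.get(name, set())

    async def get_secret(self, name, user_id=None):
        if not await self.can_access_secret(name, user_id):
            return None
        value = self._values.get(name)
        if value is not None:
            # Only an actual retrieval counts as the secret being "used"
            self.audit_log.append((name, user_id))
        return value


store = SecretStore({"anthropic": "sk-demo"}, {"anthropic": {"root"}})
```

Code that decides whether to show a menu item would call can_access_secret(); code that is about to make an API call would call get_secret().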
First hook, inspired by https://github.com/datasette/datasette-enrichments/blob/main/datasette_enrichments/hookspecs.py
@hookspec
def register_secrets(datasette):
    "Return a list of Secret instances, or an awaitable function returning that list"
I think a Secret will be a dataclass. Each secret needs a name, a description and maybe optional notes about how it should be redacted - so we can display a sensible redacted version like sk-...abc as seen in things like the OpenAI key management UI.
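As a sketch of that idea, here's one shape the dataclass could take. Only name and description come from the notes above; the redact_prefix and redact_suffix fields and the redacted() method are assumptions made up for this example:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Secret:
    name: str
    description: Optional[str] = None
    # Hypothetical redaction hints: how many characters to leave visible
    redact_prefix: int = 3
    redact_suffix: int = 3

    def redacted(self, value: str) -> str:
        """Return a display-safe version of value, such as 'sk-...abc'."""
        keep = self.redact_prefix + self.redact_suffix
        # Don't reveal a large share of a short value
        if len(value) < keep * 2:
            return "..."
        prefix = value[: self.redact_prefix]
        # value[-0:] would return the whole string, so guard against zero
        suffix = value[-self.redact_suffix:] if self.redact_suffix else ""
        return "{}...{}".format(prefix, suffix)


secret = Secret("OPENAI_API_KEY", "An OpenAI API key")
print(secret.redacted("sk-1234567890abc"))
```

The redacted string could be computed once when the secret is saved and stored alongside the encrypted value, so the UI never needs to decrypt anything just to display it.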
Schema design (in _internal or another configured database):
datasette_secrets:
- name - text primary key
- encrypted - blob
- encryption_key - text - the name of the key that was used to encrypt it, maybe an environment variable name?
- redacted - text - a redacted version of the key suitable for display to users, for keys that support that
- created_at - text - UTC created
- updated_at - text - UTC when it was last modified
- last_used_at - text - UTC last access date
- created_by - text - actor ID that created it
- updated_by - text - actor ID that last modified it (set a new value for it)
Should there be history of when secrets were deleted? If yes then I'll need to make name not a primary key, since there could be multiple rows for a secret.
I think I want records of who updated the secrets and when (and who deleted them) - so if something breaks it's easy to figure out why.
I'm going to add a version integer field to track the version of a secret and deleted_at and deleted_by fields too. The primary key will be on (name, version) so I can efficiently grab the most recent version.
Actually since I want a separate datasette_secrets_log table logging access (potentially capped) I should have a primary key that's a single column so I can easily foreign key reference it in a way that works in Datasette (which doesn't support foreign keys to compound keys in the UI yet).
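Pulling those schema notes together, a possible version of the table might look like the sketch below. The id column, the unique (name, version) constraint and the latest_version() helper are assumptions, as is the rule that a newest row with deleted_at set means the secret is gone:

```python
import sqlite3

SCHEMA = """
create table if not exists datasette_secrets (
    id integer primary key,  -- single column, easy to reference from a log table
    name text not null,
    version integer not null default 1,
    encrypted blob,
    encryption_key text,
    redacted text,
    created_at text,
    created_by text,
    updated_at text,
    updated_by text,
    deleted_at text,
    deleted_by text,
    last_used_at text,
    unique (name, version)  -- also gives an index for finding the newest version
);
"""


def latest_version(conn, name):
    """Return (id, version, encrypted) for the newest row, or None if missing or deleted."""
    row = conn.execute(
        "select id, version, encrypted, deleted_at from datasette_secrets "
        "where name = ? order by version desc limit 1",
        (name,),
    ).fetchone()
    if row is None or row[3] is not None:
        return None
    return row[:3]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
for version, blob in ((1, b"old-token"), (2, b"new-token")):
    conn.execute(
        "insert into datasette_secrets (name, version, encrypted) values (?, ?, ?)",
        ("EXAMPLE_SECRET", version, blob),
    )
```

Each update inserts a new row with version + 1, so older rows remain as history of who changed what and when.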
The GitHub Actions secrets UI looks like this:
Then all the edit form lets you do is this:
I want a design with a bit more stuff on it, since we have audit logging features.
This plugin requires configuration, because we don't want people to accidentally start writing secrets to their _internal database that gets cleared on restart.
The documentation will strongly encourage a DATASETTE_SECRETS_ENCRYPTION_KEY environment variable for the Fernet key.
It will provide a command for generating a new one:
datasette secrets generate
Secrets should have an optional notes column too.
Actually datasette secrets generate is a bad name, it sounds like it's generating a new secret but it's actually creating the encryption key. I'll rename to this:
datasette secrets generate-encryption-key
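For reference, a Fernet key is 32 random bytes encoded as URL-safe base64, so the value such a command prints can be sketched with just the standard library. The generate_encryption_key() function name is an assumption; a real implementation would more likely call Fernet.generate_key() directly:

```python
import base64
import os


def generate_encryption_key() -> str:
    # Fernet keys are 32 random bytes, URL-safe base64-encoded (44 characters)
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


print(generate_encryption_key())
```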
URL design:
- /-/secrets - list of secrets
- /-/secrets/add - add a secret
- /-/secrets/update/SECRET_NAME - update a secret
Still needs tests and should redirect to the list of secrets.
Also do we even allow people to set the secret name? Maybe not, if plugins get to define what secrets can be set.
It's coming together. I'm running the local dev environment for testing like this:
datasette --create data.db -p 8003 \
  -s plugins.datasette-secrets.encryption-key 'kU8ZA4nqUsEH0KVkWaTx_i1xSe5L6RKAczdx1n6Mo8A=' \
  -s plugins.datasette-secrets.database data \
  -s permissions.manage-secrets.id root \
  --reload --root --secret 1
I'm going to let secret descriptions include HTML, so they can link to e.g. the OpenAI API key portal.
Datasette plugins sometimes need access to secrets, such as API keys used to integrate with tools hosted outside of Datasette - things like geocoders or hosted AI language models.
This plugin provides ways to configure those secrets:
DATASETTE_SECRETS_OPENAI_API_KEY
This plugin depends on a Datasette 1.0 alpha release.
While the secrets stored in the datasette_secrets table are encrypted, we still recommend hiding that table from view.
One way to do that is to keep the table in Datasette's internal database, which is invisible to all users, even users who are logged in.
By default, the internal database is an in-memory database that is reset when Datasette restarts. This is no good for persistent secret storage!
Instead, you should switch Datasette to using an on-disk internal database. You can do this by starting Datasette with the --internal option:
datasette data.db --internal internal.db
Your secrets will be stored in the datasette_secrets table in that database file.
Only users with the manage-secrets permission will have access to manage secrets through the Datasette web interface.
You can grant that permission to the root user (or the user with an ID of your choice) by including this in your datasette.yml file:
permissions:
  manage-secrets:
    id: root
Then start Datasette like this (with --root to get a URL to login as the root user):
datasette data.db --internal internal.db -c datasette.yml --root
Alternatively, use the -s option to apply that setting without creating a configuration file:
datasette data.db --internal internal.db \
  -s permissions.manage-secrets.id root \
  --root
Python API design:
from datasette_secrets import get_secret
secret = await get_secret(datasette, "ANTHROPIC_API_KEY")
Which returns a string or None.
Secrets will always be strings. If you want to store bytes (e.g. a weird key of some sort) you'll need to serialize that as a unicode string, because otherwise the edit interface won't be able to handle it.
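One way to do that serialization is base64, sketched below. The two helper names are made up for this example and are not part of the plugin:

```python
import base64


def bytes_to_secret_string(raw: bytes) -> str:
    """Encode arbitrary bytes as an ASCII string that a text form field can hold."""
    return base64.b64encode(raw).decode("ascii")


def secret_string_to_bytes(value: str) -> bytes:
    """Reverse bytes_to_secret_string() after reading the secret back."""
    return base64.b64decode(value.encode("ascii"))


stored = bytes_to_secret_string(b"\x00\xff")
print(stored)
```

A plugin would paste the encoded string into the edit form and decode it after calling get_secret().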
import os

from cryptography.fernet import Fernet


async def get_secret(datasette, secret_name):
    # Only secrets registered by a plugin can be looked up
    secrets_by_name = {secret.name: secret for secret in await get_secrets(datasette)}
    if secret_name not in secrets_by_name:
        return None
    # Is it an environment secret?
    env_var = "DATASETTE_SECRETS_{}".format(secret_name)
    if os.environ.get(env_var):
        return os.environ[env_var]
    # Now look it up in the database
    config = get_config(datasette)
    db = get_database(datasette)
    encrypted = (
        await db.execute(
            "select encrypted from datasette_secrets where name = ? order by version desc limit 1",
            (secret_name,),
        )
    ).first()
    if not encrypted:
        return None
    # Decrypt the stored token using the configured Fernet key
    key = Fernet(config["encryption_key"].encode("utf-8"))
    decrypted = key.decrypt(encrypted["encrypted"])
    return decrypted.decode("utf-8")
Needs a lot of tests.
Also it should take the actor_id and use that to update the last_used_by field.
I built a tiny demo plugin: https://gist.github.com/simonw/d6c3500ea0c77499034df6b4e409e3e3
from datasette_secrets import Secret
from datasette import hookimpl


@hookimpl
def register_secrets():
    return [
        Secret(
            "DEMO_SECRET_ONE",
            "First demo secret",
        ),
        Secret(
            "DEMO_SECRET_TWO",
            "Second demo secret",
        ),
    ]
Install like this:
datasette install https://gist.github.com/simonw/d6c3500ea0c77499034df6b4e409e3e3/archive/2ae951bfbf2f945396c31cadadc437422a5df6f8.zip
The goal of this plugin is to provide an interface for admin users of a Datasette instance to set secrets such as API keys, which can then be used by other plugins.
Plugins that already need this:
Datasette Cloud needs this, and I imagine it will quickly become useful for Datasette Desktop and https://github.com/datasette/studio as well.
Tasks:
- datasette secrets generate-encryption-key command
- manage-secrets permission for managing secrets
- register_secrets() hook - secrets should have an optional description which gets shown on the edit secret page