Closed elprans closed 1 year ago
Well, I don't believe this format is a good one. And we're probably going to save that automatically sometimes. So JSON might be good if we're going to save them by the tool. And Hjson and TOML are better if it's likely that user edits the file. (mixing it, i.e. storing automatically into Hjson or TOML made by user a bit of a challenge to not discard user's comments and formatting).
Also I'm not sure that storing password in a text file is something we should do in the first place in 2020. There are precedents of integrating with platform password manager (I remember that back in SVN times).
(moved from edgedb/edgedb-cli, as this is a driver issue)
Well, I don't believe this format is a good one.
Using either JSON or simple table is OK. The key goal here is the ease of parsing, since we will be adding support for this to every driver and we don't want to be adding dependencies for esoteric formats or implementing parsing thereof just for this purpose.
Also I'm not sure that storing password in a text file is something we should do in the first place in 2020. There are precedents of integrating with platform password manager (I remember that back in SVN times).
The main intended use case here is not the developer machine, but a cloud server or a container, where secrets are normally provisioned as simple files. Also, the point of having to implement this in every driver applies here also.
The key goal here is the ease of parsing, since we will be adding support for this to every driver
This is what I'm clearly opposed of.
The main intended use case here is not the developer machine, but a cloud server or a container
os.environ['EDGEDB_SECRET']
in any languagehost:port
part every time host changes is ugly.connect(**json.load(open("some_file.json")))
. I don't think app developers need any help here.I don't know any tool that reads passwords from external file as a side effect of the API call, so I don't understand the intent here.
Most provisioning tools are fine with adding an environment variable (including CIs) It's easy enough to read
os.environ['EDGEDB_SECRET']
in any language
Yes, it's easy to add but it's also arguably less secure, because environment gets inherited and may leak unintentionally.
Database hosts change all the time in cloud environment, but not the credentials. I.e. rewriting host:port part every time host changes is ugly.
You can always put *
in host:port
if you don't want to hardcode them.
I don't know any tool that reads passwords from external file as a side effect of the API call, so I don't understand the intent here.
File-based secrets storage is a widespread practice, kubectl
, gcloud
, aws
CLI tool, and their corresponding underlying SDKs all support and promote it. libpq
is the most direct example of a database driver library.
I don't understand the intent here.
The intent is to enable secure and convenient way to specify credentials, including in scenarios where you use a third-party app/library that does not explicitly support file-based auth (i.e. the developer of that app did not bother to add the aforementioned connect(**json.load(open("some_file.json")))
)
Another point towards supporting file-based auth (and going with JSON) is that we might at some point add OAuth-based authentication flow for hosted instances, i.e. the edgedb
tool pops a browser window, gets tokens, saves them into a file that the drivers can then use to authenticate.
You can always put * in host:port if you don't want to hardcode them.
This works remarkably bad if you put the passwords file in $HOME. Having *
in prod and specific host in development config is counter-intuitive: you should be more specific in production config because you care about misconfiguration so much more.
The intent is to enable secure and convenient way to specify credentials, including in scenarios where you use a third-party app/library that does not explicitly support file-based auth
"We put your credential into a file ourselves, but you must put the file into secure storage on your own". So it's likely users which can't read JSON themselves would likely to commit that file into a git or a Docker image.
Another point towards supporting file-based auth (and going with JSON) is that we might at some point add OAuth-based authentication flow for hosted instances, i.e. the edgedb tool pops a browser window, gets tokens, saves them into a file that the drivers can then use to authenticate.
I don't understand scenario for that. Is it for local development, but remote database? (local-local can have no password, and deployment requires putting the file not in $HOME). Every other system just shows you a secret in browser and I don't think this needs improvement.
(Also I'm pretty sure that you can't "spawn a browser" and "get a result from it" in generic browser. Making an electron app which doesn't have browser's cookies and asking user authorize again in our app is both: user giving us credentials they don't want, and huge amount of effort on our side)
So generally:
So how about just:
connect(..., password_from_file="filename.txt")
connect(..., password_from_env="PASSWORD")
Where password_from_file
reads a file that contains just a single password (perhaps we should also strip newline at the end).
(yes for CLI we need an actual password list, but that's a different topic)
I'm actively against anything like this without an explicit opt-in
Actually I agree with Paul. I understand that conditionally adding a password_from_file=
argument might not be convenient for some users but it feels that automagically reading password files is just a bad archaic practice.
@elprans do you really think it's going to be a UX problem if we don't support pgpass-like functionality without an explicit command line parameter / API arg?
you should be more specific in production config because you care about misconfiguration so much more.
I see three scenarios:
You have a relatively static infrastructure that you manage by hand, i.e. the old-school way of running things. In this case the hostnames or IPs do not change frequently because otherwise you'd be reconfiguring everything every day and EdgeDB is not special here.
You have a highly automated infrastructure system based on Kubernetes or something similar. In this case the configuration files are consistently managed by the system. The same operator that is responsible in providing the address and the configuration of the database server can generate a password table. There's no room for misconfiguration here.
You only have a single database and you genuinely don't care, so you can put '*' in the password file to avoid touching it.
We put your credential into a file ourselves,
Ourselves? Quite the opposite, it's the user's or deployment system's responsibility.
Is it for local development, but remote database?
This is for the hosted version, including try.edgedb.com
, where you would use a remote instance for development/prototyping.
Also I'm pretty sure that you can't "spawn a browser"
Of course you can. The only thing you need is a very simple HTTP server for OAuth flow to call back into.
yes for CLI we need an actual password list, but that's a different topic
I don't think that inventing different auth methods for CLI and for API is a good idea. The CLI can be used in the same environment as the API, e.g. for scripting.
it feels that automagically reading password files is just a bad archaic practice.
We don't have to make it automagic, but I very much disagree that file-based secrets are archaic.
Of course you can. The only thing you need is a very simple HTTP server for OAuth flow to call back into.
Well, I haven't thought of this hack. But there still plenty of issues: most oauth providers require fixed URL (i.e. we can't ran on random port and finding universally unused is a problem). HTTPs. This might work if we're the (only) OAuth provider for this case, though.
The CLI can be used in the same environment as the API, e.g. for scripting.
Let's just make a small module:
from edgedb.scripting import connection
conn = connection()
This should also read environment variables just like CLI.
But other than API, you're right that this needs simple to read file.
But anyway, summary is:
Let's settle on other things:
EDGEDB_PASSFILE
env variable ?edgedb -H my.db
and get access to the database with whatever random port and user name generated by someone.Elaborating on (3), this is a bit tricky to do, as I don't want complex rules. Something like this should work:
{"host": "example.org",
"port": 1234,
"database": "tutorial",
"user": "user1",
"password": "xxxyyyzzz",
"default": true}
Where default: true
provides default value for port, user and database if they aren't wildcard and not specified specifically on the command-line.
Other thoughts:
port: null
as a wildcard instead of *
.host, port, database, user
Default location
Yes, using the platform's standard paths makes sense.
EDGEDB_PASSFILE
env variable
Yes.
Where
default: true
provides default value for port, user and database
If we are to support ssh-style substitution, I think we need to make it explicit on the API side, otherwise the behavior might be very surprising to the unsuspecting person who debugs the code.
most specific rule wins looks like easier to deal with than "first/last rule wins" for programmatic file editing
But it's also much harder to reason about for humans. If we decide to go this way, we'll probably need to add some way of showing the user which rule was matched for a given connection string.
Where
default: true
provides default value for port, user and databaseIf we are to support ssh-style substitution, I think we need to make it explicit on the API side, otherwise the behavior might be very surprising to the unsuspecting person who debugs the code.
most specific rule wins looks like easier to deal with than "first/last rule wins" for programmatic file editing
But it's also much harder to reason about for humans. If we decide to go this way, we'll probably need to add some way of showing the user which rule was matched for a given connection string.
Well, both of these are:
Also I'm not sure that "most specific rule" is harder to reason for humans:
We can probably make it a mapping, to make things clearer:
{
"user_specified_id_1": {"database": "mydb", "host": "localhost",...},
"hash:0xBEEF01": {"database": "mydb", "host": "localhost",...},
}
Perhaps using hashes or numeric indices in programmatically generated entries to refer to specific entry in error message. This should be easy enough to do in any language bindings. In Rust, we'll probably be able to show line numbers anyway.
As a user I'd prefer:
The connection string as a url, e.g. edgedb://username:password@host:port/database?option1=value&option2=value2
which would look something like Connection::new(Config::from_url("..."))
in code.
Every language will have a url parser included. The commonly used fields (username, password, host, post) have a concise and standardized way of being expressed. The query parameters enable for extensibility (in the extreme case storing json inside a query parameter).
Connection::new(Config::discover())
.I think this was implemented as part of the implementation of the query params in the DSN.
psql
/libpq
can read auth information from a.pgpass
tabfile with the following format:<host>:<port>:<database>:<user>:<password>
We don't do per-database auth, so our tab would omit the
database
field.