edgedb / edgedb-rust

The official Rust binding for EdgeDB
https://edgedb.com
Apache License 2.0
215 stars 26 forks source link

Add support for reading passwords from a file #24

Closed elprans closed 1 year ago

elprans commented 4 years ago

psql / libpq can read auth information from a .pgpass tabfile with the following format:

<host>:<port>:<database>:<user>:<password>

We don't do per-database auth, so our tab would omit the database field.

tailhook commented 4 years ago

Well, I don't believe this format is a good one. And we're probably going to save that automatically sometimes. So JSON might be good if we're going to save them by the tool. And Hjson and TOML are better if it's likely that user edits the file. (mixing it, i.e. storing automatically into Hjson or TOML made by user a bit of a challenge to not discard user's comments and formatting).

Also I'm not sure that storing password in a text file is something we should do in the first place in 2020. There are precedents of integrating with platform password manager (I remember that back in SVN times).

elprans commented 4 years ago

(moved from edgedb/edgedb-cli, as this is a driver issue)

Well, I don't believe this format is a good one.

Using either JSON or simple table is OK. The key goal here is the ease of parsing, since we will be adding support for this to every driver and we don't want to be adding dependencies for esoteric formats or implementing parsing thereof just for this purpose.

Also I'm not sure that storing password in a text file is something we should do in the first place in 2020. There are precedents of integrating with platform password manager (I remember that back in SVN times).

The main intended use case here is not the developer machine, but a cloud server or a container, where secrets are normally provisioned as simple files. Also, the point of having to implement this in every driver applies here also.

tailhook commented 4 years ago

The key goal here is the ease of parsing, since we will be adding support for this to every driver

This is what I'm clearly opposed of.

The main intended use case here is not the developer machine, but a cloud server or a container

  1. Most provisioning tools are fine with adding an environment variable (including CIs)
  2. It's easy enough to read os.environ['EDGEDB_SECRET'] in any language
  3. Database hosts change all the time in cloud environment, but not the credentials. I.e. rewriting host:port part every time host changes is ugly.
  4. The (3) is even uglier in the presence of replicas or failover which is not over DNS.
  5. Most applications don't have more than one database so don't need "rule-based" file as CLI's example above. So reading it from file is like connect(**json.load(open("some_file.json"))). I don't think app developers need any help here.
  6. Password storage should be uniform across the application: i.e. all passwords of the app should be stored in the same place. It would be very ugly if I would store my edgedb password in one place/format and AWS secret in another. So we shouldn't encourage any method of our own.

I don't know any tool that reads passwords from external file as a side effect of the API call, so I don't understand the intent here.

elprans commented 4 years ago

Most provisioning tools are fine with adding an environment variable (including CIs) It's easy enough to read os.environ['EDGEDB_SECRET'] in any language

Yes, it's easy to add but it's also arguably less secure, because environment gets inherited and may leak unintentionally.

Database hosts change all the time in cloud environment, but not the credentials. I.e. rewriting host:port part every time host changes is ugly.

You can always put * in host:port if you don't want to hardcode them.

I don't know any tool that reads passwords from external file as a side effect of the API call, so I don't understand the intent here.

File-based secrets storage is a widespread practice, kubectl , gcloud, aws CLI tool, and their corresponding underlying SDKs all support and promote it. libpq is the most direct example of a database driver library.

elprans commented 4 years ago

I don't understand the intent here.

The intent is to enable secure and convenient way to specify credentials, including in scenarios where you use a third-party app/library that does not explicitly support file-based auth (i.e. the developer of that app did not bother to add the aforementioned connect(**json.load(open("some_file.json"))))

elprans commented 4 years ago

Another point towards supporting file-based auth (and going with JSON) is that we might at some point add OAuth-based authentication flow for hosted instances, i.e. the edgedb tool pops a browser window, gets tokens, saves them into a file that the drivers can then use to authenticate.

tailhook commented 4 years ago

You can always put * in host:port if you don't want to hardcode them.

This works remarkably bad if you put the passwords file in $HOME. Having * in prod and specific host in development config is counter-intuitive: you should be more specific in production config because you care about misconfiguration so much more.

The intent is to enable secure and convenient way to specify credentials, including in scenarios where you use a third-party app/library that does not explicitly support file-based auth

"We put your credential into a file ourselves, but you must put the file into secure storage on your own". So it's likely users which can't read JSON themselves would likely to commit that file into a git or a Docker image.

Another point towards supporting file-based auth (and going with JSON) is that we might at some point add OAuth-based authentication flow for hosted instances, i.e. the edgedb tool pops a browser window, gets tokens, saves them into a file that the drivers can then use to authenticate.

I don't understand scenario for that. Is it for local development, but remote database? (local-local can have no password, and deployment requires putting the file not in $HOME). Every other system just shows you a secret in browser and I don't think this needs improvement.

(Also I'm pretty sure that you can't "spawn a browser" and "get a result from it" in generic browser. Making an electron app which doesn't have browser's cookies and asking user authorize again in our app is both: user giving us credentials they don't want, and huge amount of effort on our side)


So generally:

  1. I'm actively against anything like this without an explicit opt-in
  2. I don't see rule-based file necessary: 99.9% users should use single DB in any given app (otherwise we're doing our job in a wrong way)

So how about just:

connect(..., password_from_file="filename.txt")
connect(..., password_from_env="PASSWORD")

Where password_from_file reads a file that contains just a single password (perhaps we should also strip newline at the end).

(yes for CLI we need an actual password list, but that's a different topic)

1st1 commented 4 years ago

I'm actively against anything like this without an explicit opt-in

Actually I agree with Paul. I understand that conditionally adding a password_from_file= argument might not be convenient for some users but it feels that automagically reading password files is just a bad archaic practice.

@elprans do you really think it's going to be a UX problem if we don't support pgpass-like functionality without an explicit command line parameter / API arg?

elprans commented 4 years ago

you should be more specific in production config because you care about misconfiguration so much more.

I see three scenarios:

  1. You have a relatively static infrastructure that you manage by hand, i.e. the old-school way of running things. In this case the hostnames or IPs do not change frequently because otherwise you'd be reconfiguring everything every day and EdgeDB is not special here.

  2. You have a highly automated infrastructure system based on Kubernetes or something similar. In this case the configuration files are consistently managed by the system. The same operator that is responsible in providing the address and the configuration of the database server can generate a password table. There's no room for misconfiguration here.

  3. You only have a single database and you genuinely don't care, so you can put '*' in the password file to avoid touching it.

We put your credential into a file ourselves,

Ourselves? Quite the opposite, it's the user's or deployment system's responsibility.

Is it for local development, but remote database?

This is for the hosted version, including try.edgedb.com, where you would use a remote instance for development/prototyping.

Also I'm pretty sure that you can't "spawn a browser"

Of course you can. The only thing you need is a very simple HTTP server for OAuth flow to call back into.

yes for CLI we need an actual password list, but that's a different topic

I don't think that inventing different auth methods for CLI and for API is a good idea. The CLI can be used in the same environment as the API, e.g. for scripting.

it feels that automagically reading password files is just a bad archaic practice.

We don't have to make it automagic, but I very much disagree that file-based secrets are archaic.

tailhook commented 4 years ago

Of course you can. The only thing you need is a very simple HTTP server for OAuth flow to call back into.

Well, I haven't thought of this hack. But there still plenty of issues: most oauth providers require fixed URL (i.e. we can't ran on random port and finding universally unused is a problem). HTTPs. This might work if we're the (only) OAuth provider for this case, though.

The CLI can be used in the same environment as the API, e.g. for scripting.

Let's just make a small module:

from edgedb.scripting import connection
conn = connection()

This should also read environment variables just like CLI.

But other than API, you're right that this needs simple to read file.

tailhook commented 4 years ago

But anyway, summary is:

  1. We need a file as fallback even if we use platform's password manager by default
  2. We need to read it from language bindings, so it has to be JSON

Let's settle on other things:

  1. Default location (these are platform config dirs, just showing explicitly):
    • Linux /home/alice/.config/edgedb/credentials.json
    • Windows C:\Users\Alice\AppData\Roaming\EdgeDB\credentials.json
    • Mac /Users/Alice/Library/Preferences/EdgeDB/credentials.json
  2. EDGEDB_PASSFILE env variable ?
  3. What are the rules? Postgres gets only password from the file. I like ssh's config which allows to substitute port, and user, so you can get edgedb -H my.db and get access to the database with whatever random port and user name generated by someone.

Rules

Elaborating on (3), this is a bit tricky to do, as I don't want complex rules. Something like this should work:

{"host": "example.org",
 "port": 1234,
 "database": "tutorial",
 "user": "user1",
 "password": "xxxyyyzzz",
 "default": true}

Where default: true provides default value for port, user and database if they aren't wildcard and not specified specifically on the command-line.

Other thoughts:

elprans commented 4 years ago

Default location

Yes, using the platform's standard paths makes sense.

EDGEDB_PASSFILE env variable

Yes.

Where default: true provides default value for port, user and database

If we are to support ssh-style substitution, I think we need to make it explicit on the API side, otherwise the behavior might be very surprising to the unsuspecting person who debugs the code.

most specific rule wins looks like easier to deal with than "first/last rule wins" for programmatic file editing

But it's also much harder to reason about for humans. If we decide to go this way, we'll probably need to add some way of showing the user which rule was matched for a given connection string.

tailhook commented 4 years ago

Where default: true provides default value for port, user and database

If we are to support ssh-style substitution, I think we need to make it explicit on the API side, otherwise the behavior might be very surprising to the unsuspecting person who debugs the code.

most specific rule wins looks like easier to deal with than "first/last rule wins" for programmatic file editing

But it's also much harder to reason about for humans. If we decide to go this way, we'll probably need to add some way of showing the user which rule was matched for a given connection string.

Well, both of these are:

  1. Also important if we have wildcard rules
  2. Are not a problem for CLI (occasionally crashing on a different password when running manually, we don't expose it to server anyway; and it's fine to print rules in CLI, but more complex stuff is ugly to implement in every binding)

Also I'm not sure that "most specific rule" is harder to reason for humans:

  1. There are precedents, e.g. nginx's locations (they are simpler, though)
  2. If something match a wrong rule, increasing its specificity is much clearer way to fix things, than moving entries around (sometimes breaking different entry because of moving).
  3. We can assert on conflicting rules in the case of "most specific match", but much less so on "first/last rule matches"
  4. Naming rules is more straightforward in "most specific rule" case (making them a mapping, see below)
  5. Human mind also matches things when looking at config file, not evaluates top to bottom, until you need to really carefully debug thing. So that "Aha!" moments are there in both options.

We can probably make it a mapping, to make things clearer:

{
  "user_specified_id_1": {"database": "mydb", "host": "localhost",...},
  "hash:0xBEEF01": {"database": "mydb", "host": "localhost",...},
}

Perhaps using hashes or numeric indices in programmatically generated entries to refer to specific entry in error message. This should be easy enough to do in any language bindings. In Rust, we'll probably be able to show line numbers anyway.

CodesInChaos commented 4 years ago

As a user I'd prefer:

  1. The connection string as a url, e.g. edgedb://username:password@host:port/database?option1=value&option2=value2 which would look something like Connection::new(Config::from_url("...")) in code.

    Every language will have a url parser included. The commonly used fields (username, password, host, post) have a concise and standardized way of being expressed. The query parameters enable for extensibility (in the extreme case storing json inside a query parameter).

  2. I think implicitly discovering the connection string is okay for a command line tool, but I'd prefer it to be explicit in a driver library. Something like Connection::new(Config::discover()).
  3. It might make sense to have a mode where the connection string determines the host and username, while the authentication information (e.g. password) is discovered through a different mechanism (e.g. the system password manager, or something similar to an SSH key agent). That way the public connection information is separated from the secrets while allowing the application to connect to different servers with different authentication data.
tailhook commented 1 year ago

I think this was implemented as part of the implementation of the query params in the DSN.