Support multi-value dynamic credentials

michael2m commented 3 years ago

Problem

Perceiving secrets as just single values, like passwords or API keys, is too limited. Often credentials com in pairs or multiple related values in general. This holds e.g. for AWS (access key and secret key), Azure (client id and client secret), Postgres roles (username/role and password). Those related values may be generated by a provider. This makes credentials fully dynamic. They may change upon every new request, e.g. every new connection to Postgres may use a fresh set of credentials (with certain time to live). Unfortunately it causes trouble when a provider is used to first get e.g. a username and then a password, because that may generate two sets of credentials of which the username comes from the first and the password from the latter, yet together form no valid pair.

Solution

Suggestions ?

Alternatives

This is not an issue for simple providers like literal, environment and file. Nor will it be an issue if only 1 among related values is dynamic (e.g. only password). No alternatives seem to be supported or easily implemented.

Additional context

Dynamic credentials e.g. in Vault appear in:

Azure: https://www.vaultproject.io/docs/secrets/azure
AWS: https://www.vaultproject.io/docs/secrets/aws
Database:
- PostgreSQL: https://www.vaultproject.io/docs/secrets/databases/postgresql
- MySQL / MariaDB: https://www.vaultproject.io/docs/secrets/databases/mysql-maria

diverdane commented 3 years ago

Hi @michael2m,

Thank you for filing this enhancement request for Secretless Broker. I'd like to understand your suggestion better.

Can you tell me which credentials provider that you're using with the Secretless Broker? If it's the Hashicorp Vault credentials provider, then that provider should support multiple fields within a secret. For example, you should be able to include a section in the secretless.yml configuration file that looks similar to the following:

      username:
        from: vault
        get: postgres/creds#username
      password:
        from: vault
        get: postgres/creds#password

Would this be sufficient for supporting multi-value dynamic credentials, or is there more required to support what you're looking for?

michael2m commented 3 years ago

@diverdane this is not sufficient. In Vault when a dynamic secret, e.g. for Postgres, is obtained you get both username/role and password. Every call generates a new username and password combination. The above would effectively create the dynamic credentials for Postgres twice, once in the username's get and once in the password's get. Yet it would take only the username from the first dynamic credentials and the password from the second. They would not form a matching pair.

In other words, every get would trigger a newly created set of credentials. My request / problem is that I would like to update multiple values (e.g. both username and password) through a single get.

diverdane commented 3 years ago

@michael2m thanks for the explanation, I understand your point! This may require a change in the secrets.yml syntax... using the example above, maybe something like this:

creds:
    from: vault
    get: postgres/creds
    fields:
        - secret: username
           field: username
        - secret: password
           field: password

or a little more explicit?:

creds:
    from: vault
    get: postgres/creds
    fields:
        - secret: username
           field: creds#username
        - secret: password
           field: creds#password

An alternative would be leaving the syntax the same, and locally (and implicitly) caching the value of the complex secret (e.g. postgres/creds) when the secrets.yml is parsed and secrets are retrieved. We may want to do this anyway (even if we're chaning syntax as described above) in order to maintain backwards compatibility.

Any suggestions/improvements on the syntax?

izgeri commented 3 years ago

this is an interesting use case, and I think we may want to reconsider how we resolve secret values in Secretless. there can be value in sending each provider in a service definition the set of variables it will be expected to resolve for the connection, rather than sending each credential key one by one.

that is, I am proposing we consider revising the resolver definition: https://github.com/cyberark/secretless-broker/blob/b3c42e3c534c15f9110e08c47ee7785da9597c4d/internal/plugin/resolver.go#L93-L102

so that instead of getting each value one by one, we implement some logic to group the credentials for each service by provider and send the provider the set of keys it will need to resolve for the connection.

for the vault provider, this would mean getting the reference to the username/password in the same request, so that they can be dynamically retrieved and returned as a pair. for the conjur provider, this would mean being able to use batch retrieval to retrieve a set of secret values at once, and limit the number of required requests (and potentially eventually retrieving a safe or other comparable policy object that groups together connected variables)

that probably makes sense in a separate issue from this one, and this issue would depend on that issue's resolution. I'd love some feedback from @cyberark/community-and-integrations-team and @jonahx before moving forward with this idea, including on its feasibility - there may be some aspects of the code that I'm missing that make this more difficult than I think it is.

doodlesbykumbi commented 3 years ago

Completely agree with @izgeri on this. We just need to change the provider's method definition from:

// GetValue takes in an id of a variable and returns its resolved value
GetValue(id string) ([]byte, error)

To:

// GetValues takes in variable ids and returns their resolved values
GetValues(ids ...string) ([][]byte, error)

Everything from before stays more or less the same while giving providers flexibility to implement batch retrieval mechanisms.

doodlesbykumbi commented 3 years ago

I thought this was straightforward enough that I created a draft PR that implements/captures the idea and is green ✅ , see https://github.com/cyberark/secretless-broker/pull/1344/.

michael2m commented 3 years ago

@doodlesbykumbi have you considered:

// GetValues takes in variable ids and returns their resolved values
GetValues(ids ...string) (map[string][]byte, error)

using a map rather than a slice of values ?

sgnn7 commented 3 years ago

@michael2m That's a great point and interestingly enough, we have discussed this yesterday internally and I think we were leaning towards a richer return value. I think ideally the response instead of being maybe a simple map would instead be a rich type for it to be able to show individual retrieval problems since that will get lost in both original and your proposal.

So maybe something like::

type ProviderResponse struct {
    ID string
    Value []byte
    Error error
}

...

GetValues(ids ...string) ([]ProviderResponse, error)

I don't think we've reached a clear direction though so all input is welcome!

michael2m commented 3 years ago

Totally agree, richer type is definitely the way to go forward. Just wondering about the following. Suppose I take the Vault database secret engine e.g. for Postgres as an example (see Vault - PostgreSQL database). To get the dynamic username and password, I have to get the secret stored in Vault at e.g. database/creds/my-role and in response from Vault get an object:

{
  "username": "some-random-username"
  "password": "some-random-password"
  ...
}

In terms of Secretless, I have 1 ID (namely the path to the secret in Vault) and 2 values (namely username and password). The above mentioned suggested solutions focus on batching IDs. However I am concerned about the case where a single ID gets multiple related values. I can't quite see how to resolve ... I think the suggested syntax/config of @diverdane more closely matches my point.

doodlesbykumbi commented 3 years ago

I'll update https://github.com/cyberark/secretless-broker/pull/1344 to use the rich type.

doodlesbykumbi commented 3 years ago

However I am concerned about the case where a single ID gets multiple related values.

Good point. I like the fields syntax that @diverdane suggested.

For added flexibility I wonder if we might entertain separating secret and credentials definition.

secrets:
  dbEndpoint:
    from: vault
    get: /path/to/endpoint
  dbUserCreds:
    from: vault
    get: /path/to/specific/user/creds

services:
  ecomm-db:
    connector: mongodb
    listenOn: tcp://0.0.0.0:6175
    credentials:
      connString: mongodb+srv://{{ .dbUserCreds.username}}:{{ .dbUserCreds.password }}@{{ .dbEndpoint }}
      authURL: http://{{ .dbEndpoint }}/xyz

Bit of a contrived example but I think this approach might solve a few issues.

We currently force users to store secrets in the form that they need to be used by Secretless. This would give users a lot of flexibility in how they use the secrets in the creds.
If a secret id is used multiple times (in templates) we can avoid refetching for each invocation of a service endpoint.
Secret fetching can be a bit smarter, e.g. set a TTL etc.
We can be more explicit about for example needing to use 2 dynamic creds of the same type. If a dynamic cred appears in two separate secret definitions that means each time a set of credentials are resolved there will be 2 separate fetches. While templates ensure that a secret definition used N time in a template need only be resolved once
...

michael2m commented 3 years ago

@doodlesbykumbi although complexity increases (implementation), this is clearly an improvement and better design. It is a bit more challenging to get TTL right (item 2). Currently a secret is fetched and the provider can immediately read fresh ones from the source (e.g. in Vault this could mean getting the new dynamic credentials; OTOH for env/files it doesn't matter). I think TTL handling must be delegated to the provider in that case (in Vault example the TTL is obtained along with the credentials); it would act kind of like a cache.

doodlesbykumbi commented 3 years ago

type ProviderResponse struct {
    ID string
    Value []byte
    Error error
}

...

GetValues(ids ...string) ([]ProviderResponse, error)

It looks like the rich type might have some issues.

The signature above allows an id to show up more than once i.e. when multiple credentials have the same secret id and provider. Which raises some questions:
1. What should the semantics be ? Fetch each instance separately or should we group responses on id...
2. When mapping back from id to credential name, without additional information via the input args or relying on order we are unable to disambiguate a single id that maps to multiple secret names.
Once secrets have been fetched we're always going to have to map back from id to credential name. Perhaps we can use map[string]ProviderResponse, where the key is ID and we get rid of ID on ProviderResponse.
GetValues can return errors on each ProviderResponse and also this other error. I'm not sure what the other error is for. @sgnn7 What did you have in mind ?

doodlesbykumbi commented 3 years ago

It is a bit more challenging to get TTL right (item 2). Currently a secret is fetched and the provider can immediately read fresh ones from the source (e.g. in Vault this could mean getting the new dynamic credentials; OTOH for env/files it doesn't matter). I think TTL handling must be delegated to the provider in that case (in Vault example the TTL is obtained along with the credentials); it would act kind of like a cache.

@michael2m Agreed. There can be global and provider-specific smarts for secret fetching. With TTL, I thought there could be a global smart that dictates how often we fetch new secrets. Perhaps, this might not have been the best example of smarts. Caching seems tricky because it might break something fundamental in Secretless... not keeping secrets/creds in memory longer than the auth handshake.

sgnn7 commented 3 years ago

What should the semantic of this situation be ? Fetch each instance separately or should we group responses on id...

This is behavior that should be mostly provider-specific.

When mapping back from id to credential name, without additional information via the input args or relying on order we are unable to disambiguate a single id that maps to multiple secret names.

From the interface point of view, we don't (and shouldn't) care how the passed in variables are turned to secrets as long as the output is returned with expected entries. I think if we're digging into how it should internally work, deduplication is fine since we can expect the callee to just break on first matching entry. This however does make @michael2m's suggestion of returning main type to be a map that you also mention in #2 maybe a bit more appropriate (I'd still prefer a rich type as the value though) since it would provide an un-ambiguous treatment of results.

GetValues can return errors on each ProviderResponse and also this other error. I'm not sure what the other error is for. @sgnn7 What did you have in mind ?

ProviderResponse error should record errors in retrieval of an individual value
Main error on the interface should be about general issues with the call that affects all variable retrievals (e.g. provider endpoint cannot be reached, misconfigured provider, etc) which would result in the main result value being null/empty. I don't have a clear idea if an individual value retrieval failure should cascade onto the main error though but I'm guessing it should not.

Caching seems tricky because it might break something fundamental in Secretless... not keeping secrets/creds in memory longer than the auth handshake.

Agreed - this is fundamentally against of how we want the product to work at this time. Secretless should not keep anything in memory more than strictly required for a connection to complete. At some future point we might have capabilities to secure/encrypt broker's memory space well enough for it but we don't have adequate security guards around that right now to be safe when caching.

michael2m commented 3 years ago

Putting complexities of TTL aside. Can somehow the suggested configuration of @doodlesbykumbi be supported ? Or minimally the proposed configuration of @diverdane ? That would be a substantial improvement for Secretless and solve the issue of getting multiple related values together (rather than merely batching).

doodlesbykumbi commented 3 years ago

@sgnn7

This is behavior that should be mostly provider-specific.

Giving that flexibility depends on the return collection type, especially if we're keeping the input as ids []string. If we return a map of ids to responses then it means we are effectively grouping the responses for any given id, 2 cred names with the same secret id always get the same value. If we use a slice and maintain the order then it means each provider can decide on the behavior (grouping or unique), which gives us the flexibility we want and ensures that responses for ids can be mapped to credential names without additional coordination.

With the above in mind then we want to go with the slice. Alternatives do become available when we include both ids and credential names in the input. We can then return a map of credential names with rich responses as values. Maybe that's the best way forward!

sgnn7 commented 3 years ago

If we return a map of ids to responses then it means we are effectively grouping the responses for any given id

Correct and I think this is exactly what we want to both remove ambiguity about the results and because the use case where a connector needs two different dynamic secrets from the same ID seems implausible. If a provider gets two of the same variable IDs in the same batch request, it's coming from a single connection attempt and I think the intention should be that we meld them into a single provider request and a single response value from the function, making the map much more favorable of a return type to me. We also get the code benefit that the caller can be much more lean on logic as it no longer needs to explicitly track the order of its inputs.

doodlesbykumbi commented 3 years ago

@michael2m

Putting complexities of TTL aside. Can somehow the suggested configuration of @doodlesbykumbi be supported ? Or minimally the proposed configuration of @diverdane ? That would be a substantial improvement for Secretless and solve the issue of getting multiple related values together (rather than merely batching)

I think we want to hold off on any changes to the Secretless-wide config until we have a better idea of our priorities for the project.

I think for the moment we can support multi-value (dynamic) credentials with a suffix e.g. path/to/secret/prefix#path/to/value/suffix. I'm not certain if # is always the best delimiter, maybe it's something that we'd want to be configurable, maybe through envvars.

credentials:
  username:
    from: vault
    get: postgres/creds#username
  password:
    from: vault
    get: postgres/creds#password

This would work with the batch retrieval implementation in #1344. The vault provider would just need to group ids with the same path/to/secret/prefix prefix and use the singular value of the secret to resolve the values of each path/to/value/suffix.

doodlesbykumbi commented 3 years ago

@sgnn7 I agree so I've gone ahead and implemented the map.

michael2m commented 3 years ago

Interesting, @doodlesbykumbi. Looking forward to #1344. In #1331 I opted for the /path/to/secret#navigate.to.property syntax (required for another reason). I am a little biased by Vault, which returns JSON object for secrets, hence object property traversal is needed. But the part from # onwards is provider specific, I presume. So is grouping by prefix for batching.

doodlesbykumbi commented 3 years ago

@michael2m Just a heads up that https://github.com/cyberark/secretless-broker/pull/1344 has landed as https://github.com/cyberark/secretless-broker/pull/1356.

boazmichaely commented 3 years ago

Published in CyberArk Aha! idea portal

cyberark / secretless-broker