concourse / concourse

Concourse is a container-based continuous thing-doer written in Go.
https://concourse-ci.org
Apache License 2.0
7.42k stars 848 forks source link

Proposal for new params behavior #545

Closed vito closed 7 years ago

vito commented 8 years ago

tl;dr: deprecate {{these}} in favor of ((these)), and make them understand YAML.

Fixes #118, #299, #360, https://github.com/concourse/s3-resource/issues/41, https://github.com/concourse/fly/issues/53, https://github.com/concourse/fly/issues/83, and sets the stage for #291.

Today's {{params}} are very barebones. They were added with safety and conservatism in mind, as they were originally designed for propagating credentials (e.g. private keys) into a pipeline, and we didn't want to invent another crazy templating language.

The proposal for tomorrow is to invent another crazy templating language. But, like, this time we'll get it right. The only thing this language will support is pulling in values from some other structure. (Though I expect people will continuously ask for new features, e.g. defaults, which is a little worrying...)

The first step is to switch to ((parens)), so that we can parse the YAML first ({{these}} are invalid). This fixes #118 as the YAML parsing would skip any comments.

This interpolation would follow YAML semantics, not just text substitution. This means false would stay false, not become "false" (https://github.com/concourse/s3-resource/issues/41), and you could interpolate structures like arrays and hashes (#299).

We would handle foo((bar))baz by pulling out the bar value as normal, ensuring it's a string, and adding "foo" and "baz" around it, fixing #360. If the value is not a string, we'll fail fast, since we don't want to marshal it (YAML is tricky: yes and on both turn into true, so we can't just marshal).

We would start implementing this as a generic library so that it can be reused with BOSH, which has pretty much the exact same set of goals coming up.

The interpreting of the template handling would be done entirely in the ATC. Fly would just submit the YAML and the params, and ATC would handle it. This way adding things like Vault (#291) is just like adding another source of params for ATC to interpolate in the same pass. So some params would come from fly and some would come from a config server, but they look the same. The goal here is to keep pipeline templates decoupled from the config store that you have their params come from: you could use Vault or just stick with local params.

This would all be backwards-compatible; we'll continue to process {{these}} ahead of time, and likely start issuing warnings, but it's trivial to support both.

Example

pipeline.yml:

resources:
- name: my-repo
  source:
    uri: git@github.com:concourse/concourse.git
    private_key: (( secret.concourse_repo.private_key ))

- name: env-state
  source:
    bucket: (( env ))-ci
    key: state

jobs:
- name: do-some-stuff
  plan:
  - get: my-repo
  - task: build-thing-on-env
    tags: (( env-tags ))

params.yml:

secret:
  concourse_repo:
    private_key: |
      ---BEGIN SUPER SECRET STUFF---
      AJHGSHKFSDHJCBSHCJBSGJKDBJSD
      ...
      ---END SUPER SECRET STUFF---

env: staging

env-tags: ["speedy"]

With fly set-pipeline -c pipeline.yml -l params.yml, this would resolve to:

resources:
- name: my-repo
  source:
    uri: git@github.com:concourse/concourse.git
    private_key: |
      ---BEGIN SUPER SECRET STUFF---
      AJHGSHKFSDHJCBSHCJBSGJKDBJSD
      ...
      ---END SUPER SECRET STUFF---

- name: env-state
  source:
    bucket: staging-ci
    key: state

jobs:
- name: do-some-stuff
  plan:
  - get: my-repo
  - task: build-thing-on-env
    tags: ["speedy"]

And, with Vault support in #291, the secret.concourse_repo.private_key could instead come from a key in Vault. That is, foo.bar.baz would resolve to the baz attribute of the foo/bar key.

vito commented 8 years ago

One concern: with Vault each backend is explicitly mounted at a certain level in the path. This means that keys reaching into Vault would tend to look like <some backend mount>.foo.bar.baz, which kind of leaks the abstraction and means people reusing the template would have to mimic Vault's hierarchy. In the above example this is shown as secret.concourse_repo.private_key. Maybe that's OK? Maybe the mounts will be named generically enough that it doesn't feel weird? (e.g. aws.)

/cc @cppforlife

concourse-bot commented 8 years ago

Hi there!

We use Pivotal Tracker to provide visibility into what our team is working on. A story for this issue has been automatically created.

The current status is as follows:

This comment, as well as the labels on the issue, will be automatically updated as the status in Tracker changes.

dcarley commented 8 years ago

Suits our use cases 👍

robdimsdale commented 8 years ago

What happens if foo:(( some-var )) is not supplied? Will this result in foo: nil or a failure in template rendering?

Other than that, this looks fine for me

vito commented 8 years ago

@robdimsdale Up for discussion. I'm tempted to have it leave the key unspecified. Then there'd be no future ask for optional params, and things would still error if it was required.

Same for foo-((bar))?

Alternatively, as with today's behavior it could error if any are unspecified.

On Wed, Jul 20, 2016, 7:14 AM Rob Dimsdale-Zucker notifications@github.com wrote:

What happens if foo:(( some-var )) is not supplied? Will this result in foo: nil or a failure in template rendering?

Other than that, this looks fine for me

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/concourse/concourse/issues/545#issuecomment-233961313, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAHWLQuJLxUpkOjvdPYOTfOct3Pm-Vbks5qXizdgaJpZM4JQMok .

zachgersh commented 8 years ago

@vito on the vault abstraction - if the ATC is going to be making the calls to vault to get the values why not have the user specify no backend.

The ATC could just discover what secret backend the user has specified by listing mounts. You end up with a less leaky abstraction and get to specify something like this

Any thoughts on whether fly could automagically convert your {{}} to (()) would probably make the transition even better for people. Still issue a warning so that people change them at some point.

vito commented 8 years ago

@zachgersh After the backend is mounted it's just like any other key - that's the Vault abstraction I'd like to preserve.

Fly could probably auto-convert {{}} to (()).

keymon commented 8 years ago

@vito when will (( my_mount_thing.a.value )) be evaluated?

When rendering and storing the pipeline in ATC, or every time the resource/task is executed?

I understand it will be every time the resource/task is executed, as this is important in the case of dynamic backends.

vito commented 8 years ago

@keymon Until we do #291, that would only mean local config, which could just be rendered and stored in the DB, same as today. (We could keep the params and dynamically evaluate them, but there's not much point since they're static).

Later, we'll want to defer it until resource/task execution.

cjcjameson commented 8 years ago

Our use case, prompted from #360: a long-established product for which Concourse is the 3rd generation CI system. So we're adapting existing makefiles etc. and building up concourse around and into its build system, but we've got constraints. The makefiles generate specific, brittle filenames, and our customers expect those same consistent, detailed filenames. So our CI system in the middle is most natural if we maintain those filenames. This defies the Concourse idiom which prefers de-referencing to a generic, abstract filename in the middle and reconstructing the detailed, versioned filename at the end.

We want to interpolate these filenames into our pipeline, as a step on the way to refactoring towards a more ideal structure for the filename. We want to highlight duplication between the Concourse pipeline file and makefiles' hardcoded configuration, so that we can recruit other teams to work with us on building it better. We also want to DRY up the repetition of the brittle filenames within the pipeline.yml.

We will attempt the suggestion from #360 to have the whole filepath as the interpolated variable.

cjcjameson commented 8 years ago

Update on our use case: our pain around brittle filenames at the entrypoints to the pipeline is slightly reduced by using the S3 resource's features around arbitrary filenames and glob expansion/matching: https://github.com/concourse/s3-resource#parameters-1

acca301 commented 8 years ago

Just 2 cents on syntax for #126743511 - Ansible guys do that with {{ syntax without introducing ((.

Could be nice to have the same syntax everywhere.

vito commented 8 years ago

@acca301 we currently use {{}} syntax, the trouble is it's invalid YAML which makes resolution difficult as it'll take multiple passes/traversals. i suppose we could normalize it as a first pass, but given that we don't expect to fully match ansible's behavior with these templates it doesn't really seem worth the trouble (could just make it more confusing not having the different syntax help with the context-switch).

acca301 commented 8 years ago

@vito - I would just cite: http://docs.ansible.com/ansible/YAMLSyntax.html#gotchas Note the comment on other special characters.

Quoted strings can be expanded/normalized on the first pass as you [supposed to] know everything you need when you hit a quote.

There is a strong theory behind such problem avoidance, it is called "Gödel's incompleteness theorems", we can see it now in action.

aquavitae commented 8 years ago

@acca301, @vito Also worth noting that Ansible does it by using jinja2 templating, i.e. using an existing, and very powerful, templating language instead of creating a new one.

ggeorgiev commented 8 years ago

one issue I have with using jinja2 is if a value has more than one line - all error reports under are shifted. If using jinja, yml error report normalization might be needed.

Note sure if this is true with the internal replacement mechanism right now, but I certainly hope so.

aquavitae commented 8 years ago

Yes, that is a problem, but ansible seems to get around it by doing a kind of two pass templating. So, if I understand it correctly, it resolves things that change the structure (e.g. conditionals and loops) first, parses the yaml, then fills in templated values. Don't sure if this is quite right, but in any case my point is that there are already quite powerful templating languages out there which could be used to some extent, even if they need to be heavily customised.

I'm using jinja2 at the moment for building pipelines because I need more that the existing templating gives. I'm using a pipeline template to build multiple pipelines based on project-specific config. For me, a complete templating language syntax for concourse would at minimum also need to include loops (i.e. for x in y). Useful extras would be conditionals and the ability to set variables (see http://jinja.pocoo.org/docs/dev/templates/#assignments)

ggeorgiev commented 8 years ago

@aquavitae I agree using complex templatization might be useful. The error reporting at concourse level will get even more complex for some of these constructions. On the other side if something like this is needed this does not seem as a configuration anymore, but more of a scripting against API. I personally like the simplicity of configuration files.

wernerb commented 8 years ago

I strongly recommend the following:

  1. Use an existing template engine
  2. Do not use (( )) syntax, it is really alien looking to me. Also people use parentheses in their builds and might be greatly confused.
  3. The point that {{ }} doesn't work for yaml is weird because it has been working for Ansible for years. What you can do is enforce that:
blabla: {{ test }}

Must be:

blabla: "{{ test }}"

This can be done very easily by yaml validating the input. Note that if you add other characters besides only the interpolated value that you do not have to use double quotes (though I do recommend it)

vito commented 8 years ago

I think we're getting a bit off-track. I totally understand the desire to use an existing templating language, but we're not trying to implement templating. I don't even think we should try to pick one - you should be able to use whatever tool you want to generate your config.

We're trying to land on a scheme that allows dynamic fetching of actual values (strings, bools, ints, arrays, hashes, ...) from sources like Vault or a static "stub" file (today's credentials.yml pattern). This isn't just interpolation, and there's no need for loops or conditionals at this level. That's not the problem we're trying to solve.

It also doesn't seem feasible to tackle at this level with existing template systems. None of this is interpreted upon the config being submitted - the whole point is to defer credential fetching until we need it, for rotation policies and such - so this wouldn't support e.g. having a loop that generates jobs or steps in a build plan. That's something you'd have to figure out yourself - use jinja if you like.

Re: syntax, I think it'd be really confusing to adopt syntax that looks like existing template engines but behaves entirely differently. And having to type foo: "{{ bar }}" just gets even more confusing when the value bar refers to may be an integer, boolean, array, etc. - while that notation implies that it's a string.

stepanstipl commented 7 years ago

I'm curious, any estimate when this might land? Anything you want help with? We're really interested in the store secrets in Vault use case.

Kevin-Fries-Altoros commented 7 years ago

This entire conversation is a bit confusing to me. Concourse should fetch data from Consol or Vault as one resource, and these values should be passed to any one of several template engines in a job. I see it this way:

Resource 1 => Your Data Source (credentials.yml, vault, etc) Resource 2 => Your Template Engine (Mustache, Jinja2, etc) Resource 3 => Your Template Files (Your Jinja2, Mustache, etc files) (may be more than one resource) Resource 4 => Your output directory Task (Render)(out) => A task that Reads templates from res3 with data from res1 passing them through res2 and sending the output to res4

Whether you need simple credentials late bound, or you need more complex looping, this one method solves both. Why reinvent the wheel to cut down on features. I get to add additional features, but why would you not just use the open source widely understood tools that already exists. Again, this conversation confuses me.

vito commented 7 years ago

@Kevin-Fries-Altoros I'm not sure what you mean by resource or template engine and your example doesn't really clarify things to me.

The fundamental problem to solve is storing plaintext credentials in the database. This will require adding logic to the ATC to resolve things to their values via the credential source. We've also already explained why none of the existing tooling is sufficient, templating-wise, as they focus on the wrong objective (context-free text generation). We're not trying to solve a templating problem.

@stepanstipl We've knee deep in #629 so we haven't been able to get to this. :/ It'll be the next objective but if you want to take a look at our thoughts on this you can check out the stories in the following epic: https://www.pivotaltracker.com/n/projects/1059262/epics/2859437/expand

ggeorgiev commented 7 years ago

I am 100% for not using {{ }}. This will force if you use templates to have to escape setting the security keys.

On the other side having this in the config might be avoided completely, if concourse simply follows some key protocol how to obtain them. Say if I would like to provide a private key for a pipeline foo, resource bar I simply have to set the appropriate key in the database: foo/bar/private_key with the reference to the private_key.

This arguably will reduce the security concerns removing the need to point out the reference in the config. It needs to be flexible enough, though - offering wildcards etc - else might become annoying for very dynamic setup.

stepanstipl commented 7 years ago

@vito thanks, I think I have seen most of these and pretty much agree with everything. Some thoughts:

One thing where I'm not completely clear is how generic is this supposed to be - is the idea to eventually support any compatible backend (not just Vault)? Like etcd, Consul etc., which would imply some configuration for each individual backend? Such as:

backend_a:
  type: vault
  url: https://myvault:8200/v1/secret_backend_mount
  token: xyz
backend_b:
  type: etcd
  url: http://127.0.0.1:4001

and then reference these as (( backend_a.path.to.secret.key ))? Vault for example has lot of different backends and not all of them are simple GET to /path/to/secret and return array of json keys.

One thing I would relly like to see supported is retrieving PKI certificates - which is one of Vault's backend types. Unfortunately it's slightly more complex as retrieving a certificate is POST to /pki/issue/<role name> path and you need to send some payload containing at least requested common name.

One way how I can see this happen is scenario where you would have special vault_pki type of backend and then the requesting secret in form (( backend_name.issue.role_name.common_name )) would translate to POST to /backend_mount/issue/role_name with payload of { common_name: "common_name" }, but it feels a bit limited and hacky and I'm not sure that it would cover all potential use cases with other backend types.

Other possibility - perhaps there is a need for more flexible form of secret variable name for cases where dot separated name notation (( aa.bb.cc )) is not enough. Worth checking out imho are https://github.com/hashicorp/consul-template and https://github.com/kelseyhightower/confd. For example consul-template deals with this by supporting form secret "pki/issue/my-domain-dot-com" "common_name=foo.example.com" where after secret name you can pass additional parameters.

Also when would these params get interpolated - when you set the pipeline or at execution time?

I was thinking about use cases like using variable in resource name on one side and dynamic variables which you would want to retrieve on each individual run (ie. when password stored in Vault gets updated) on the other.

eedwardsdisco commented 7 years ago

@vito now that https://github.com/concourse/concourse/issues/629 appears to have landed, is this still on the horizon?

+1 from me... My use case is being able to dynamically provide values like ECR repo account IDs for the docker-image resource's repository value

vito commented 7 years ago

yep, this is coming up next once the work in #629 has stabilized. we're going to do some soak testing for a while.

On Tue, Mar 7, 2017, 11:20 PM eedwardsdisco notifications@github.com wrote:

@vito https://github.com/vito now that #629 https://github.com/concourse/concourse/issues/629 appears to have landed, is this still on the horizon?

+1 from me... My use case is being able to dynamically provide values like ECR repo account IDs for the docker-image resource's repository value

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/concourse/concourse/issues/545#issuecomment-284942751, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAHWP7gJFrPu-yNLJRkUldoFEIezD7aks5rjix-gaJpZM4JQMok .

chendrix commented 7 years ago

needed-by #291

mariash commented 7 years ago

Some atc work was pushed to yaml20 branch https://github.com/concourse/atc/tree/yaml20

clarafu commented 7 years ago

The evaluating of the template variables from "load-from-file" was going to happen on the server side but then we realized that the diff that happens in fly between the new and existing config would need to have an interpreted new config. So now, the evaluating will happen in fly and then send the yaml marshalled config to the atc.

We also decided to keep the evaluating of the "{{}}" in order to have it backwards-compatible and just evaluate the "(())" after and leave a warning message.

avirkar commented 7 years ago

Will it be possible to merge two keys for eg. PCF_APP_NAME: ((cf-api-app1))-((env))

vito commented 7 years ago

Yes

On Mon, Aug 14, 2017, 5:42 PM avirkar notifications@github.com wrote:

Will it be possible to merge two keys for eg. PCF_APP_NAME: ((cf-api-app1))-((env))

— You are receiving this because you modified the open/close state.

Reply to this email directly, view it on GitHub https://github.com/concourse/concourse/issues/545#issuecomment-322317942, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAHWNIo1IrP53QO9VlreY205Zua6kg_ks5sYL9GgaJpZM4JQMok .

avirkar commented 7 years ago

Do you have an sample of how that can be achieved, my var file has key value pairs like cf-api-app: api- cf-dispatcher-app: dispatcher- cf-space: test

In my pipeline I want to pass param to my task which should look something like

PCF_API_APP_NAME: ((cf-api-app))-((cf-space)) PCF_DISPATCHER_APP_NAME: ((cf-dispatcher-app))-((cf-space))

So my task can get a param based on environment, and it would look something like PCF_API_APP_NAME: api-test PCF_DISPATCHER_APP_NAME: dispatcher-test

But I guess somehow, it does not take the combination as expected and is unable to evaluate it. I even tried using anchor in my config.yml but still it does not work.

aliases: &environ test

cf-api-app: api-environ (fails) cf-dispatcher-app: dispatcher-environ (fails) cf-space: *environ (works)

endemics commented 7 years ago

@avirkar you should consider opening another issue as this issue has been closed when the feature was implemented.

Also when creating a bug report, try to provide the simplest configuration options that allow to reproduce it. In this case, your example is not enough.

For instance, I have configurations using multiple interpolation successfully, so I can tell that the feature works as expected, at least "on my machine".

avirkar commented 7 years ago

@endemics Submitted a new issue. https://github.com/concourse/concourse/issues/1471