geofffranks / spruce

A BOSH template merge tool
MIT License
431 stars 78 forks source link

Make cf-release work with spruce #51

Closed Amit-PivotalLabs closed 8 years ago

Amit-PivotalLabs commented 8 years ago

I'm exploring whether it's possible to make cf-release work with spruce instead of spiff. For the most part, it's meant I can delete a bunch of some_prop: null all over the place, which is awesome. I'm trying to validate that this works by running the manifest generation tests in cf-release. So far, I've been able to get it working for 1 out of the 4 infrastructures (namely vsphere):

https://github.com/Amit-PivotalLabs/cf-release/tree/wip-spiff-to-spruce

Here's output from the test run:

$ bundle exec rspec spec
6 error(s) detected:
 - $.jobs.api_worker_z1.properties.nfs_server.address: Unable to resolve `jobs.nfs_z1.networks.cf1.static_ips.[0]`: `jobs.nfs_z1.networks.cf1.static_ips` has no sub-objects
 - $.jobs.api_worker_z2.properties.nfs_server.address: Unable to resolve `jobs.nfs_z1.networks.cf1.static_ips.[0]`: `jobs.nfs_z1.networks.cf1.static_ips` has no sub-objects
 - $.jobs.api_z1.properties.nfs_server.address: Unable to resolve `jobs.nfs_z1.networks.cf1.static_ips.[0]`: `jobs.nfs_z1.networks.cf1.static_ips` has no sub-objects
 - $.jobs.api_z2.properties.nfs_server.address: Unable to resolve `jobs.nfs_z1.networks.cf1.static_ips.[0]`: `jobs.nfs_z1.networks.cf1.static_ips` has no sub-objects
 - $.meta.nfs_server.address: Unable to resolve `jobs.nfs_z1.networks.cf1.static_ips.[0]`: `jobs.nfs_z1.networks.cf1.static_ips` has no sub-objects
 - $.properties.nfs_server.address: Unable to resolve `jobs.nfs_z1.networks.cf1.static_ips.[0]`: `jobs.nfs_z1.networks.cf1.static_ips` has no sub-objects

F6 error(s) detected:
 - $.jobs.api_worker_z1.properties.nfs_server.address: Unable to resolve `jobs.nfs_z1.networks.cf1.static_ips.[0]`: `jobs.nfs_z1.networks.cf1.static_ips` has no sub-objects
 - $.jobs.api_worker_z2.properties.nfs_server.address: Unable to resolve `jobs.nfs_z1.networks.cf1.static_ips.[0]`: `jobs.nfs_z1.networks.cf1.static_ips` has no sub-objects
 - $.jobs.api_z1.properties.nfs_server.address: Unable to resolve `jobs.nfs_z1.networks.cf1.static_ips.[0]`: `jobs.nfs_z1.networks.cf1.static_ips` has no sub-objects
 - $.jobs.api_z2.properties.nfs_server.address: Unable to resolve `jobs.nfs_z1.networks.cf1.static_ips.[0]`: `jobs.nfs_z1.networks.cf1.static_ips` has no sub-objects
 - $.meta.nfs_server.address: Unable to resolve `jobs.nfs_z1.networks.cf1.static_ips.[0]`: `jobs.nfs_z1.networks.cf1.static_ips` has no sub-objects
 - $.properties.nfs_server.address: Unable to resolve `jobs.nfs_z1.networks.cf1.static_ips.[0]`: `jobs.nfs_z1.networks.cf1.static_ips` has no sub-objects

F7 error(s) detected:
 - $.jobs.api_worker_z1.properties.nfs_server.allow_from_entries.[1]: Unable to resolve `networks.cf2.subnets.[0].range`: `networks.cf2` could not be found in the YAML datastructure
 - $.jobs.api_worker_z2.properties.nfs_server.allow_from_entries.[1]: Unable to resolve `networks.cf2.subnets.[0].range`: `networks.cf2` could not be found in the YAML datastructure
 - $.jobs.api_z1.properties.nfs_server.allow_from_entries.[1]: Unable to resolve `networks.cf2.subnets.[0].range`: `networks.cf2` could not be found in the YAML datastructure
 - $.jobs.api_z2.properties.nfs_server.allow_from_entries.[1]: Unable to resolve `networks.cf2.subnets.[0].range`: `networks.cf2` could not be found in the YAML datastructure
 - $.meta.nfs_client_ranges.[1]: Unable to resolve `networks.cf2.subnets.[0].range`: `networks.cf2` could not be found in the YAML datastructure
 - $.meta.nfs_server.allow_from_entries.[1]: Unable to resolve `networks.cf2.subnets.[0].range`: `networks.cf2` could not be found in the YAML datastructure
 - $.properties.nfs_server.allow_from_entries.[1]: Unable to resolve `networks.cf2.subnets.[0].range`: `networks.cf2` could not be found in the YAML datastructure

F.

Failures:

  1) Manifest Generation aws behaves like generating manifests builds the correct manifest for aws
     Failure/Error: expect($?.exitstatus).to eq(0)

       expected: 0
            got: 2

       (compared using ==)
     Shared Example Group: "generating manifests" called from ./spec/manifest_generation_spec.rb:24
     # ./spec/manifest_generation_spec.rb:12:in `block (3 levels) in <top (required)>'

  2) Manifest Generation warden behaves like generating manifests builds the correct manifest for warden
     Failure/Error: expect($?.exitstatus).to eq(0)

       expected: 0
            got: 2

       (compared using ==)
     Shared Example Group: "generating manifests" called from ./spec/manifest_generation_spec.rb:28
     # ./spec/manifest_generation_spec.rb:12:in `block (3 levels) in <top (required)>'

  3) Manifest Generation openstack behaves like generating manifests builds the correct manifest for openstack
     Failure/Error: expect($?.exitstatus).to eq(0)

       expected: 0
            got: 2

       (compared using ==)
     Shared Example Group: "generating manifests" called from ./spec/manifest_generation_spec.rb:32
     # ./spec/manifest_generation_spec.rb:12:in `block (3 levels) in <top (required)>'

Finished in 1.35 seconds (files took 0.14466 seconds to load)
4 examples, 3 failures

Failed examples:

rspec ./spec/manifest_generation_spec.rb[1:1:1:1] # Manifest Generation aws behaves like generating manifests builds the correct manifest for aws
rspec ./spec/manifest_generation_spec.rb[1:2:1:1] # Manifest Generation warden behaves like generating manifests builds the correct manifest for warden
rspec ./spec/manifest_generation_spec.rb[1:3:1:1] # Manifest Generation openstack behaves like generating manifests builds the correct manifest for openstack

The failures stem from the following facts:

Any thoughts for good ideas on how to proceed?

geofffranks commented 8 years ago

I can see the case for a grab_if_exists operator, or something with perhaps a shorter name. However until that gets implemented you can have cf-jobs.yml define jobs.nfs_z1.networks.cf1.static_ips as ~ or null, to provide the null default. Then if you actually define static IPs in cf-infrastructure-.yml, they would be pulled rather than the null value. .

For openstack + cf2 network, you could follow the same approach, or simply redefine all of the things referencing cf2 in the cf-infra-openstack.yml to not pull cf2 data. It's not ideal, but will get things working until we get a new operator added in for this.

geofffranks commented 8 years ago

As an aside, it's awesome to hear that you're investigating this! Thank you!

amitkgupta commented 8 years ago

grab_if_exists: geofffranks/spruce#52

allomov commented 8 years ago

I am very exited to see movements in this direction. @amitkgupta thank you.

jhunt commented 8 years ago

With the newly-merged-to-master changes to operator implementation, we are looking at building a small expression engine for operands, which would support the following scenarios:

simple:   (( grab meta.this ))
or_nil:   (( grab meta.this || nil ))
tilde:    (( grab meta.this || ~ ))
alts:     (( grab meta.this || meta.other.that ))
literals: (( grab meta.this || meta.that || "the other" ))
mtekel commented 8 years ago

+1 looks like this is the last thing preventing spruce to replace spiff...

Jonty commented 8 years ago

Has there been any progress on implementing the || operator, or any design documentation? We may be able to help with the development.

jhunt commented 8 years ago

There has, but it requires some changes to the way (( inject ... )) works. Design documentation is (for now) the comment from me on Nov 10: https://github.com/geofffranks/spruce/issues/51#issuecomment-155571701.

Feel free to put in a PR, though, if you've got working code! :smile:

Jonty commented 8 years ago

We've not dared touch the code yet for fear that you've got incompatible plans! When you're ready we're happy to help.

jhunt commented 8 years ago

Roadmap right now is to take (( inject ... )) out of the operator-eval phase and make it an interim phase between document-merge and operator-eval. That way data flow analysis can proceed on a fully-injected YAML tree, and some of the non-deterministic behavior we've seen should go away.

Otherwise, you get really convoluted data flow graphs for stuff like this:

meta:
  templates:
    job:
      properties:
        broker: http://test.example.com:8080
        username: admin
        password: (( grab meta.credentials.password || "default-password" ))

jobs:
  - name: ajob
    instances: 1
    <<<: (( inject meta.templates.job ))
    properties:
      other-broker:
        password: (( grab jobs.ajob.properties.broker.password || nil ))
geofffranks commented 8 years ago

With @filefrog'S || operator support, can you guys play around and see how this works out for you? Hopefully I will polish up some more testing on our end and cut a release in the next week or so.

timmow commented 8 years ago

We had a play with using the || operator support in this branch. We created a simple cf-stub.yml for testing and ran the command scripts/generate_deployment_manifest aws cf-stub.yml. At first we ran into #72, but this was fixed while we were looking into it! There still seems to be an issue with the output and the nfs section - the static ips part doesnt seem to be being dereferenced.

- instances: 0
  name: nfs_z1
  networks:
  - name: cf1
    static_ips: (( jobs.nfs_z1.networks.cf1.static_ips ))
  persistent_disk: 102400
  properties:
    metron_agent:
      zone: z1
  resource_pool: medium_z1
  templates:
  - name: debian_nfs_server
    release: jimtest
  - name: metron_agent
    release: jimtest
  update: {}
jhunt commented 8 years ago

I think you need a `grab' keyword in there...

...
  networks:
  - name: cf1
    static_ips: (( jobs.nfs_z1.networks.cf1.static_ips ))
...

should really be

...
  networks:
  - name: cf1
    static_ips: (( grab jobs.nfs_z1.networks.cf1.static_ips ))
...
jhunt commented 8 years ago

That being said, if this is specifically what you are talking about: https://github.com/alphagov/cf-release/blob/spruce-or-operator-test/templates/cf-jobs.yml#L275

I fear you're going to run into a data flow cycle, since it looks like your trying to grab jobs.nfs_z1.networks.cf1.static_ips from jobs.nfs_z1.networks.cf1.static_ips

geofffranks commented 8 years ago

I think instead you want (( static_ips(x, y, z) )) there?

On Dec 3, 2015, at 8:40 AM, James Hunt notifications@github.com wrote:

That being said, if this is specifically what you are talking about: https://github.com/alphagov/cf-release/blob/spruce-or-operator-test/templates/cf-jobs.yml#L275 https://github.com/alphagov/cf-release/blob/spruce-or-operator-test/templates/cf-jobs.yml#L275 I fear you're going to run into a data flow cycle, since it looks like your trying to grab jobs.nfs_z1.networks.cf1.static_ips from jobs.nfs_z1.networks.cf1.static_ips

— Reply to this email directly or view it on GitHub https://github.com/geofffranks/spruce/issues/51#issuecomment-161642458.

timmow commented 8 years ago

We've been experimenting a bit more in alphagov/cf-terraform#93 - we've got a version of the cf-release manifests from 225 compiling with spruce, resulting in the same output as running a spiff merge on them. We've renamed the files for various reasons, but it will hopefully be useful. The individual commits give more context on the changes we needed to make to go from spiff to spruce.

jhunt commented 8 years ago

Does this mean that we can close this issue?

amitkgupta commented 8 years ago

@filefrog fine by me.

In case anyone's curious, I haven't continued looking into spruce, for a couple reasons.

  1. Longer term, I'd like to move away from spruce/spiff/any complicated logic for constructing manifests, and would like BOSH to be able to directly consume simpler data.
  2. In the medium term, I'm trying to find a way to force the spec stubs in cf-release to be more realistic. Specifically, I want some way of knowing that if a user takes one of those stubs and changes the values, they have a good chance of working; currently, some of them are guaranteed to fail. To this end, my plan would be to set up some tests that involved the stubs in cf-release vs the stubs used for the core cf-release integration pipelines. I'm not sure exactly how this will work yet, but my preliminary ideas would be harder to implement with spruce vs spiff. E.g. if I diffed spiff merge spec-stub.yml real-stub.yml vs real-stub.yml I could assert that difference between the two is only a whitelisted set of extra values in the real-stub.yml (e.g. uaa clients), and I would know that everything else in real-stub.yml was also structurally in spec-stub.yml. With spruce, because it's more permissive in how it adds things to mappings, I can't use this technique.

There's probably some better way to do what I'm trying to do in 2. When I get back around to thinking about 2, if I think of something better, I will check back in with where spruce is.

geofffranks commented 8 years ago

I'm not sure I understand the reasoning behind reason 2. Why provide the parameters at all if you want to force the users to not change them from what the CF release says they should be? It would be easier to enforce that in the bosh templates by hardcoding the values you want to never change, rather than rely on a template-merging tool. If there are values that have legitimate reasons to be changed for snowflake installations, why not make it easy for users to change them and get CF up and working quickly?

amitkgupta commented 8 years ago

The parameters in the spec stubs are typically things that are required, but we do want the user to change, e.g. BULK_API_PASSWORD.

geofffranks commented 8 years ago

Oh, so you want to force a user to override a value to ensure things work? (( param "short explanation of what this param does and why its needed" )) provides capability for that

On Dec 4, 2015, at 10:57 AM, Amit Gupta notifications@github.com wrote:

The parameters in the spec stubs are typically things that are required, but we do want the user to change, e.g. BULK_API_PASSWORD.

— Reply to this email directly or view it on GitHub https://github.com/geofffranks/spruce/issues/51#issuecomment-162004219.

geofffranks commented 8 years ago

Oh, so you want to force a user to override a value to ensure things work? (( param "short explanation of what this param does and why its needed" )) provides capability for that

On Dec 4, 2015, at 10:57 AM, Amit Gupta <notifications@github.com mailto:notifications@github.com> wrote:

The parameters in the spec stubs are typically things that are required, but we do want the user to change, e.g. BULK_API_PASSWORD.

— Reply to this email directly or view it on GitHub https://github.com/geofffranks/spruce/issues/51#issuecomment-162004219.

Amit-PivotalLabs commented 8 years ago

We do want to show users values to replace themselves, but that's not what I'm getting at in point 2.

The cf-stub.yml files in https://github.com/cloudfoundry/cf-release/tree/master/spec/fixtures serve two purposes:

(a) They are automatically slurped into the public docs (e.g. http://docs.cloudfoundry.org/deploying/ec2/cf-stub-aws.html) for users to know how to make a stub to generate a manifest. (b) They are used in the cf-release unit tests to know that our manifest generation scripts are not broken.

It's somewhat important that the stub is used for both purposes, because it essentially ensures that the example stubs used in the documentation have tests that prove they "work".

The sticky point is "work". The existing stubs work in that they go through the manifest generation script fine, without spiff blowing up. But some of them have no chance of being a working deployment, no matter what values a user puts in.

One example of something that could cause this is the consul encryption configuration. You can either specify require_ssl: false, or you can provide a bunch of keys and certs. Since you can do one or hte other, the templates can't enforce that you have either one, so they will still generate manifests fine if you provide nothing. The real stubs that we use had to add these properties, but there was no failing test driving anyone to add example values to the stubs too. So my goal in 2 is to somehow ensure that the example stubs are "similar" to the real stubs we use, and if that fails, that should drive someone to add the necessary things to the example stubs.