geofffranks / spruce

A BOSH template merge tool
MIT License

Multiple Document Support? #294

Closed mattdodge closed 3 years ago

mattdodge commented 5 years ago

Any current or planned support for multiple documents within YML? I'm not exactly sure how I'd expect this to work when merging (how would you know which document to merge with) but just curious if it's a known no-go. It's a common pattern I've seen to keep similar YML manifests together (k8s deployment and service, for example) so I think it's at least worth addressing.

Example (file.yml):

kind: Deployment
metadata:
  name: myapp
---
kind: Service
metadata:
  name: myapp-svc

$ spruce merge file.yml
kind: Deployment
metadata:
  name: myapp

loksonarius commented 5 years ago

I would be very interested in support for this. Use case: I'm messing around with spruce for K8s YAML docs, and like described in the OP, docs often have multiple YAML maps per single file.

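Not saying this would be the ideal implementation, but from my first testing of this, I was expecting something like:

1. Merge all single-doc files as is done currently (calling the result of this source-doc)
2. For every multi-doc file, perform a merge op between source-doc and each doc defined in the file
3. All multi-doc files have their docs aggregated like joining arrays, this result is the output

I could see the semantics of merge not really applying to this convoluted definition. It seems like it'd make more sense to have a whole other operator defined or such based on what use cases should be addressed.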
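
The three steps proposed above can be sketched roughly as follows. This is only an illustration of the proposal, not how spruce works internally: `deep_merge` and `fan_out` are hypothetical names, documents are assumed to be already parsed into Python dicts, and the merge is a plain recursive map merge with none of spruce's operators or array-merge rules.

```python
from copy import deepcopy


def deep_merge(base, overlay):
    """Recursively merge overlay onto base; overlay wins on conflicts.

    Simplified stand-in for a real merge (assumption: maps only,
    no spruce operators, lists are replaced wholesale).
    """
    if isinstance(base, dict) and isinstance(overlay, dict):
        merged = deepcopy(base)
        for key, value in overlay.items():
            if key in merged:
                merged[key] = deep_merge(merged[key], value)
            else:
                merged[key] = deepcopy(value)
        return merged
    return deepcopy(overlay)


def fan_out(single_docs, multi_doc_files):
    """Apply the three proposed steps.

    single_docs:     list of dicts, one per single-doc file
    multi_doc_files: list of lists of dicts, one inner list per multi-doc file
    """
    # Step 1: merge all single-doc files into one source-doc.
    source_doc = {}
    for doc in single_docs:
        source_doc = deep_merge(source_doc, doc)

    # Steps 2 and 3: merge source-doc with every doc of every multi-doc
    # file, and collect the results in order, like joining arrays.
    output = []
    for docs in multi_doc_files:
        for doc in docs:
            output.append(deep_merge(source_doc, doc))
    return output


if __name__ == "__main__":
    settings = [{"settings": {"label": "my-label"}}]
    manifests = [[{"kind": "Deployment"}, {"kind": "Service"}]]
    for merged_doc in fan_out(settings, manifests):
        print(merged_doc)
```

Each output document carries the full source-doc, so a real tool would still need something like pruning to drop the shared keys afterwards.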

geofffranks commented 5 years ago

So one source gets merged to a bunch of documents in a second file, but not more than one target file?

Eg spruce merge source.yml multi-doc.yml

But not spruce merge source.yml multi-doc-1.yml multi-doc-2.yml?

And source.yml is always a single doc?

On Aug 28, 2019, at 10:33 AM, Dan notifications@github.com wrote:

I would be very interested in support for this. Use case: I'm messing around with spruce for K8s YAML docs, and like described in the OP, docs often have multiple YAML maps per single file.

Not saying this would be the ideal implementation, but from my first testing of this, I was expecting something like:

1. Merge all single-doc files as is done currently (calling the result of this source-doc)
2. For every multi-doc file, perform a merge op between source-doc and each doc defined in the file
3. All multi-doc files have their docs aggregated like joining arrays, this result is the output

I could see the semantics of merge not really applying to this convoluted definition. It seems like it'd make more sense to have a whole other operator defined or such based on what use cases should be addressed.

loksonarius commented 5 years ago

Yeap, that's pretty much what first came to mind. It may not be the most logical or sane, but it just seemed intuitive and straightforward to implement. Just providing that to start a discussion on what a possible multi-doc merge may look like -- just a proposal 😅

Another approach that came to mind involved something like:

spruce merge source.yaml multi-doc.yaml@2

doing spruce merge source.yaml against the second doc in multi-doc.yaml. Not sure how sensible this is, but it may be more straightforward. Still lets the user determine the exact merge logic they want. This would pair nicely with a spruce collect/join/concat command that combines all docs defined in input files into a single multi-doc.yaml file.
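
To make the `@2` idea concrete, here is a minimal sketch of how such an argument could be parsed. Nothing here exists in spruce: `parse_doc_selector` and `select_docs` are hypothetical helpers, and the 1-based index is an assumption taken from "@2" meaning "the second doc".

```python
def parse_doc_selector(arg):
    """Split 'multi-doc.yaml@2' into ('multi-doc.yaml', 2).

    An argument without '@' selects every document and yields (arg, None).
    The index is assumed to be 1-based.
    """
    path, sep, index = arg.rpartition("@")
    if not sep:
        return arg, None
    if not path or not index.isdecimal() or int(index) < 1:
        raise ValueError(f"invalid document selector: {arg!r}")
    return path, int(index)


def select_docs(docs, index):
    """Return the documents a selector refers to, always as a list."""
    if index is None:
        return list(docs)
    if index > len(docs):
        raise IndexError(f"file has {len(docs)} documents, wanted #{index}")
    return [docs[index - 1]]


if __name__ == "__main__":
    path, index = parse_doc_selector("multi-doc.yaml@2")
    docs = [{"kind": "Deployment"}, {"kind": "Service"}]
    print(path, select_docs(docs, index))
```

Using `rpartition` means only the last `@` is treated as the selector, so a path that itself contains `@` would still parse.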

geofffranks commented 5 years ago

The @ is interesting and definitely more flexible, but I'd be worried it would get really annoying if you just want to merge a specific set of settings into everything all the time (probably the most common use case). I'd be tempted to add a '@*' syntax to simplify. I think that would require some deeper changes to merging logic to support multiple simultaneous merges though.

Merging a source onto one specific part of a multi-doc file but not the others doesn't seem very intuitive. Seems like you would be implying that part of the document should have spruce operators still, and part shouldn't. Does merging two different source files into different documents of a single multi-doc yaml seem like a common use case?

loksonarius commented 5 years ago

Not terribly, no. I'm honestly more in favor of the earlier approach I mentioned, but was providing the @ one to discuss alternatives. In retrospect, I guess it's a pretty silly setup for the most part 😅.

mattdodge commented 5 years ago

@geofffranks brings up a good point about wanting to merge in settings to multiple docs, I can see that being a useful feature, and it's probably more aligned with my original use case when I brought this up.

I can also see wanting to merge into specific documents though. I'm a total spruce-noob but the more I think about this the more it seems similar to how array merging works. You need a way to identify documents just like you need a way to identify which object in a list to merge into. If we added an operator, similar to merge, called (( document )) or (( document on name )) you could target certain documents. I'm not exactly sure where you'd specify that operator though since we likely aren't working with a list at the top-level document. And up until now I haven't seen any "reserved" object keys that spruce relies on.

What if it was a command line arg to define how documents are identified? Example:

$ spruce merge --document-id=kind multi-doc.yml settings.yml more-doc.yml

multi-doc.yml

kind: Deployment
metadata:
  name: myapp
  label: (( grab settings.label ))
---
kind: Service
metadata:
  name: myapp-svc
  label: (( grab settings.label ))

settings.yml

settings:
  label: my-label

more-doc.yml

kind: Deployment
metadata:
  another: value
---
kind: Service
metadata:
  another: value-again

The merge would know you were targeting a document if you had the kind key at the top level document (based on the CLI arg), so merging together you'd get this:

kind: Deployment
metadata:
  name: myapp
  label: my-label
  another: value
---
kind: Service
metadata:
  name: myapp-svc
  label: my-label
  another: value-again

FWIW, for my original use case I just wanted to use the parse/eval feature of spruce on a multi-doc YML file, not actually do any merging. If there was a CLI command for spruce eval or something similar that only took one file and let me use some of the operators like (( vault )) and (( grab $VAR )) that would have been all I needed.
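
The --document-id idea above amounts to grouping documents by the value of one top-level key and merging each group. A rough sketch under stated assumptions: `merge_by_id` and `deep_merge` are hypothetical names, documents are already parsed into dicts, documents lacking the id key are treated as shared settings merged into every result, and operator evaluation such as (( grab )) and pruning are not modelled.

```python
from copy import deepcopy


def deep_merge(base, overlay):
    """Recursive map merge, overlay wins (simplified, no spruce operators)."""
    if isinstance(base, dict) and isinstance(overlay, dict):
        merged = deepcopy(base)
        for key, value in overlay.items():
            if key in merged:
                merged[key] = deep_merge(merged[key], value)
            else:
                merged[key] = deepcopy(value)
        return merged
    return deepcopy(overlay)


def merge_by_id(files, id_key):
    """Merge documents that share the same value for id_key.

    files:  list of lists of dicts, in command-line order
    id_key: top-level key identifying a document, e.g. "kind"
    """
    by_id = {}   # id value -> merged document, in first-seen order
    shared = {}  # documents without id_key, e.g. a settings file
    for docs in files:
        for doc in docs:
            if id_key in doc:
                ident = doc[id_key]
                by_id[ident] = deep_merge(by_id.get(ident, {}), doc)
            else:
                shared = deep_merge(shared, doc)
    # Every identified document also receives the shared settings.
    return [deep_merge(shared, doc) for doc in by_id.values()]


if __name__ == "__main__":
    multi_doc = [
        {"kind": "Deployment", "metadata": {"name": "myapp"}},
        {"kind": "Service", "metadata": {"name": "myapp-svc"}},
    ]
    more_doc = [
        {"kind": "Deployment", "metadata": {"another": "value"}},
        {"kind": "Service", "metadata": {"another": "value-again"}},
    ]
    for doc in merge_by_id([multi_doc, more_doc], "kind"):
        print(doc)
```

A single key like `kind` only works while each value appears once per file; two Deployments in one file would collapse into one, so a real implementation would likely need a compound id.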

geofffranks commented 4 years ago

I put something together on the [multi-doc] branch, spruce fan <source file> <target-multi-doc-1>...<multi-doc-N>, where source is a single-doc file. It outputs a multi-doc stream of all the final merged data. Should support all of the same flags as spruce merge, just handles docs differently. Feel free to take it for a spin and offer up any feedback @loksonarius @mattdodge ?

mattdodge commented 4 years ago

Awesome, thanks @geofffranks! I just gave it a spin and I think I see what you're going for here. It would let you "fan" a single settings-like file into a multi-doc file.

The "merging" or handling of the series of multi-doc files isn't that intuitive to me though honestly. Specifying more than one multi-doc file appends all of the documents together. When I think of spruce merge I expect them all to get merged, not concatenated.

Specifically, if I use the example files I put in my previous comment and run this:

spruce fan --prune settings settings.yml multi-doc.yml more-doc.yml

I currently get this (4 documents in result set):

---
kind: Deployment
metadata:
  label: my-label
  name: myapp

---
kind: Service
metadata:
  label: my-label
  name: myapp-svc

---
kind: Deployment
metadata:
  another: value

---
kind: Service
metadata:
  another: value-again

I'd expect the two multi doc files to each get "merged" together though, so I would expect this:

kind: Deployment
metadata:
  name: myapp
  label: my-label
  another: value
---
kind: Service
metadata:
  name: myapp-svc
  label: my-label
  another: value-again

This is still super cool and exciting to see this coming together though! I think the idea to make it a separate command makes a lot of sense. Thanks again.

geofffranks commented 4 years ago

Ah so you're looking to have something like a source file, an upstream file, and a customization yaml to patch things on top of the upstream definitions?

mattdodge commented 4 years ago

Yeah pretty much. My use case is to keep a kubernetes deployment and service definition in the same file. Then have one file for the main configuration details, and additional files for different environments (prod, stage, etc).

Ideally something like this:

spruce fan settings.yml my-app.yml my-app-stage.yml

settings.yml would contain the meta information (single doc), my-app.yml would hold the bulk of the definitions of the service and deployment objects (multi-doc), and my-app-stage.yml would contain details specific to the stage environment (also multi-doc).

This is just the use case I had that triggered this issue though, by no means do I expect everyone to have the same use case when dealing with multi-doc files. I would love to find the best way to pull this off to support as many examples as possible.

loksonarius commented 4 years ago

This is looking very agreeable. My first stumblings into this idea came from the same use case as Matt, so his feedback's pretty much the same as mine 😅 Thanks for your work @geofffranks

geofffranks commented 4 years ago

It sounds like https://github.com/kubernetes-sigs/kustomize might be better suited to that use case, and if needed, pull spruce in to merge the kustomize patch files before they're overlaid on top of upstream?

I'm a little hesitant to include that type of functionality in spruce, since it would be tied heavily to the kubernetes yaml structure, especially since kustomize exists and looks like it's now built into kubectl apply. However if you want to keep it all within spruce, I'd be open to a PR for something like spruce kustomize that works similarly to kustomize/leverages their go packages, and then loops through the resultant list of docs and merges each.

mattdodge commented 4 years ago

Ha, funny, we're currently using spruce for our Concourse pipeline YMLs but kustomize for our Kubernetes YMLs. I was looking into replacing kustomize with spruce but I never thought to actually use them together! I might be able to cobble something together.

While they both have their strengths, I do prefer spruce over kustomize mainly due to the more complex operators that are provided. Spruce lets us take "infrastructure as code" to the next level with the things we can do in our YMLs - it's awesome!

Even though the example is kubernetes specific, I do still think there's a place for multi-doc support within spruce that is generic enough to apply to any kind of multi-doc YML file. It may just be a lower priority since there are other options available. I'm sure there are other use cases out there, but I can't think of any use cases for multi-doc YMLs that aren't for kubernetes.

Anyways, I appreciate the thought and discussion around this. It's not a pressing issue, more just an idea. I may try and dust off the ol' golang book and take a crack at a PR.

geofffranks commented 3 years ago

Closing as stale. Also maybe spruce fan solves this now?