cppforlife / go-patch

MIT License
22 stars 14 forks source link

Copy/Move operation #5

Open dsabeti opened 7 years ago

dsabeti commented 7 years ago

There are some pretty valuable use cases for have a move/copy functionality, primarily when you want to re-arrange nested objects.

For example, suppose you have a yaml file like a BOSH manifest with the following structure:

instance_groups:
- name: api
  jobs:
  - name: api-server
    properties:
      db_string: mysql://mysql.example.com:3306
  - name: api-worker
    properties:
      db_string: mysql://mysql.example.com:3306
    <lots and lots of configuration...>

In BOSH land, if you wanted to scale the workers without scaling the servers, you'd have place the api-worker job in a different instance_group and then scale that separately. That would mean you'd want to end up with the following yaml:

instance_groups:
- name: api
  jobs:
  - name: api-server
    properties:
      db_string: mysql://mysql.example.com:3306

- name: worker
  jobs:
  - name: api-worker
    properties:
      db_string: mysql://mysql.example.com:3306
    <lots and lots of configuration...>

It should be easy, except that your operation definition would need to know the internal data of the yaml you're operating on:

- type: replace
  path: /instance_groups/-
  value:
    name: worker
    jobs:
    - name: api-worker
      properties:
        db_string: mysql://mysql.example.com:3306
    <lots and lots of configuration...>

You have to copy all of the data that already exists in the yaml being operated on. With a move functionality, on the other hand, you'd be able to write something like the following operation instead:

- type: replace
  path: /instance_groups/-
  value:
    name: worker
    jobs: ~

- type: move
  from: /instance_groups/name=api/jobs/name=worker
  path: /instance_groups/name=worker/jobs/-

Or, if you're super cool:

- type: move
  from: /instance_groups/name=api/jobs/name=worker
  path: /instance_groups?/name=worker/jobs/-

No knowledge of the internals of api-worker required. A copy could function the same way.

Would this be a reasonable interface for move/copy? Is something that could get added? Asking for a friend.

cppforlife commented 7 years ago

@dsabeti move/copy are not implemented at this point on purpose. if viewed as a generic library, this proposed functionality makes a lot of sense; however, one of my goals though is to expose missing abstractions in bosh's deployment configuration. there are several scenarios that possibly need to be discussed further to see if they require bosh core changes:

dsabeti commented 7 years ago

Make sense. I'd actually prefer a BOSH primitive for changing colocation strategies. Do you know how far out that work is? I think we were mostly thinking about this feature as a stop-gap until "topologies" or "roles" are implemented in BOSH.

anEXPer commented 7 years ago

We had another break today due to the duplication the current constraints require.

We have opsfiles that add instance groups that need configuration common to other jobs, but can't copy the necessary configuration in from anywhere, so have to duplicate it. Specifically, we had changes to how we accessed name variables that didn't make it into the ops files.

For something like, say, every metron agent needing mutual tls stuff to talk to loggregator, it would be really nice to be able to copy this stuff until it's made available via link.

I know that links are the correct solution. In the mean time, we're feeling pain from incorrectness we're not positioned to do much about.

zaksoup commented 7 years ago

@cppforlife why isn't this being viewed as a generic library? I'd love to see go patch interpolation added as a feature to fly CLI, for example, and this library being generic as possible should facilitate that.

jmcarp commented 6 years ago

@cppforlife: what's the status of this feature request? I see that it's marked with v2, so I have hope that it'll happen one day. For my team, it would be useful for configuring diego isolation segments, concourse tagged workers, etc., where we want to copy the configuration of a base diego cell, concourse worker, etc., and apply a few changes on top. Or is there a cleaner way to handle that use case?

cppforlife commented 6 years ago

@jmcarp one of the unanswered questions for copy op is once content is copied, how does one reference new content or original content (aside from some kind of numeric index) since both content pieces will be same?

anEXPer commented 6 years ago

Is that really a problem though? They'll have unique paths, right? Since one must copy to an addressable location?

dsabeti commented 6 years ago

@cppforlife Better support for changing colocation strategies would be super useful. We're trying to start working on a small-footprint deployment manifest for CF, but we're not sure how we'll keep properties in sync between our main manifest and our small-footprint manifest, so the feature request is becoming more timely.

jmcarp commented 6 years ago

@cppforlife: I see what you mean. If we have a manifest like this:

instance_groups:
- name: foo

and an operation like this:

- type: copy
  from: /instance_groups/name=foo
  path: /instance_groups/-

then we'll get a manifest like this:

instance_groups:
- name: foo
- name: foo

and won't be able to distinguish the two identical instance groups, except by index. We could always immediately modify the last instance group:

- type: copy
  from: /instance_groups/name=foo
  path: /instance_groups/-
- type:  replace
  path: /instance_groups/-1/name
  value: bar

but that feels like a hack. What if we allowed key-value pairs from the path to override values in the copied object, like this:

- type: copy
  from: /instance_groups/name=foo
  path: /instance_groups?/name=bar

so that we would copy instance group foo, but override its name to bar. That would give the original and copied objects different key-value paths. Would that make sense?

anEXPer commented 6 years ago

You could also work around this issue by inserting the new instance group name with a replace, and then copying its properties, etc, into place with copy operations.

The "make another instance group, but with a different name and one or two changed properties" use case suffers from this, though it has workarounds. The "copy a job from its usual location to a colocation instance group and then delete the original to reduce footprint" use case doesn't run into this at all.

dpb587-pivotal commented 6 years ago

More brainstorming syntax...

A new type which changes context and whose value are ops to be applied to it.

- path: /instance_groups/name=foo
  type: apply # change context
  value:
  - path: /
    type: copy
    value: //instance_groups/- # xquery-style double slash for root
  - path: /name
    type: replace
    value: bar

A new ops field [only on copy?] which supports relative paths.

- path: /instance_groups/name=foo
  type: copy
  value: /instance_groups/-
  ops:
  - path: name
    type: replace
    value: bar
jmcarp commented 6 years ago

@dpb587-pivotal: I like the second option that you proposed--it's like the syntax I was thinking of, but more explicit. @cppforlife: wdyt? I'd be happy to write a patch if you think the interface makes sense.

bgandon commented 6 years ago

The post-copy local operations are elegant. I would only suggest that the sub-key for defining them is named in such way that clearly states when the sub-operations are going to happen.

Indeed, writing this would be perfectly legal, but hard to catch:

- ops: # it's not obvious here that this is happening after the copy below has taken place
  - path: name
    type: replace
    value: bar
  path: /instance_groups/name=foo
  type: copy
  value: /instance_groups/-

Something like post-ops, after or then (which reads nicely) would fit there:

- path: /instance_groups/name=foo
  type: copy
  value: /instance_groups/-
  then:
  - path: /name # root context is set to the root of the copied node
    type: replace
    value: bar

Whereas post-ops and after would allow to later introduce pre-ops or before, if at some point such concept turns out to be relevant for some operations.

As far as path is concerned, I would recommend for consistency that we don't introduce any new kind of path expressions, and just state that sub-operations are local to the targeted YAML node.