Patches `replace` operation when `match` is not specified

schuylermartin45 commented 1 year ago

This is not the world's best patch, but I want to reduce the blast-radius of these changes for now.

Previously, providing the following valid JSON patch to a recipe:

{
    "op": "replace",
    "path": "@output/build/number",
    "value": "0",
}

would result in Recipe.patch() wiping out the entire block of build sub-keys.

I'm not entirely sure why that was happening, but I believe it had to do with defaulting the match variable to .* which caused the current algorithm to find every line under build and wipe it completely:

build:
  0
  0

Tested changes against perseverance-skills.

After having run into many issues using percy, IMO we need to:

Make patch operations "atomic" (i.e. .patch() should not render and save changes. To put another way, patching should maintain it's in-memory changes and the caller should be responsible for writing to disk. This is to save disk I/O operations when we start making lots of changes AND allows a program to ask a user if they would like the changes that have been made.
In general, make percy RFC 6902 compliant as much as we possibly can.
- This will reduce confusion from anyone making calls to percy
- Custom concepts like @object are not well documented and do not seem entirely necessary.
Add significant testing automation (primarily in the form of unit testing) to ensure reliability of the library AND to prevent regressions.
Reduce/eliminate the use of deepcopy() for performance and code clarity. Especially as it seems to be used heavily for logging purposes.
Reduce/eliminate the injection of custom information into the user-provided patch object in favor of using more variables. This correlates heavily with the previous point.

I can gladly raise these as GitHub issues, but I wanted them to be written down in some form. I do not likely have the time to implement all of these ideas before I get a first version of recipe-bootstrapper out, but I am concerned about the long-term stability of recipe-bootstrapper and percy if these points are not addressed.

cbouss commented 1 year ago

This is not the world's best patch, but I want to reduce the blast-radius of these changes for now.

Previously, providing the following valid JSON patch to a recipe:
{
    "op": "replace",
    "path": "@output/build/number",
    "value": "0",
}
would result in Recipe.patch() wiping out the entire block of build sub-keys.

I'm not entirely sure why that was happening, but I believe it had to do with defaulting the match variable to .* which caused the current algorithm to find every line under build and wipe it completely:
build:
  0
  0
Tested changes against perseverance-skills.

After having run into many issues using percy, IMO we need to:

Make patch operations "atomic" (i.e. .patch() should not render and save changes. To put another way, patching should maintain it's in-memory changes and the caller should be responsible for writing to disk. This is to save disk I/O operations when we start making lots of changes AND allows a program to ask a user if they would like the changes that have been made.

In general, make percy RFC 6902 compliant as much as we possibly can.

This will reduce confusion from anyone making calls to percy

Custom concepts like @object are not well documented and do not seem entirely necessary.

Add significant testing automation (primarily in the form of unit testing) to ensure reliability of the library AND to prevent regressions.

Reduce/eliminate the use of deepcopy() for performance and code clarity. Especially as it seems to be used heavily for logging purposes.

Reduce/eliminate the injection of custom information into the user-provided patch object in favor of using more variables. This correlates heavily with the previous point.

I can gladly raise these as GitHub issues, but I wanted them to be written down in some form. I do not likely have the time to implement all of these ideas before I get a first version of recipe-bootstrapper out, but I am concerned about the long-term stability of recipe-bootstrapper and percy if these points are not addressed.

JC stumbled upon the same issue when working on another topic. This PR should address it: https://github.com/anaconda-distribution/percy/pull/40

To address some of your comments:

patch atomic: good idea
patch as close as possible as rfc: I am not sure the rfc can support all our use cases
@output: This is useful to define generic patches while still being able to be specific when needed.

What I realize is that to use this patch implementation efficiently, one still need to be a regex master. At the match regex influence a lot the results.

Some additional information:

The patch method relies on the recipe to be rendered (this means at least one variant required).
Use cases:
- remove an element matching a condition
- add an element (duplicates allowed)
- replace an element
- replace or add an element Things to take into consideration: whether to preserve selectors or not...

Example to append patches:

[
    {
        "op": "add",
        "path": "source/patches",
        "match": "(aa.patch)|(bb.patch)",
        "value": ["aa.patch", "bb.patch"]
    }
]

Examples to replace cython or cython 0.29.23 by cython 0.29 in any package of the recipe:

[
    {
        "op": "replace",
        "path": "@output/requirements/host",
        "match": "cython\\s+0.29.\\d+",
        "value": ["cython 0.29"]
    },
    {
        "op": "replace",
        "path": "@output/requirements/host",
        "match": "cython\\s*($|[^#])",
        "value": ["cython 0.29"]
    }
]

Example to update a build number: [{"op": "replace", "path": "build/number", "value": "0"}, {"op": "replace", "path": "@output/build/number", "value": "0"}]

schuylermartin45 commented 1 year ago

@cbouss I got pretty far with the parse-tree implementation late last week. For a day and a half of dev time, I think it's very promising. I can already go from YAML -> tree -> YAML losslessly on a simple example.

The work is here and I plan to continue to work through test cases this week: https://github.com/anaconda-distribution/percy/tree/schuy_recipe_parser

I'm going to start with the replace patch operation first as it addresses what I need in recipe-bootstrapper. I think I can match the RFC pretty closely and potentially expand upon it for our needs.

If you would like to discuss further about this, we can hop on a call, maybe with JC and Mark as well, but I think the parse tree method is going to be a very powerful tool. I think you saw my stats from last week, but I think we can cover about 90% of recipes with it and maybe fall back to a regex mechanism for the other 10%. And that last 10% is not likely to be automated anytime soon.

cbouss commented 1 year ago

https://github.com/anaconda-distribution/percy/tree/schuy_recipe_parser

That's cool.

So what would be needed eventually is:

Ability to update yaml values
Ability to update jinja variables
Ability to update selectors variables

If you haven't looked at it, ruamel.yaml.jinja2 does handle some of the preserving part of jinja. But it doesn't allow updating jinja itself. ruamel parser also has editing capabilities on comments (useful since selectors are comments)

A meeting with a few people would be useful. Perhaps we should define some use cases.

schuylermartin45 commented 1 year ago

- Ability to update yaml values
- Ability to update jinja variables
- Ability to update selectors variables

I can confidently handle those first two points with that branch/already have plans to handle those cases. The third one I have not addressed and the parse tree is storing those as "comments" per node in the tree. I don't need selectors for my current project needs, but I don't see an issue with parsing the "comment" into a more semantic form.

When I get a chance, I'll find time on the calendar to get a sync meeting between you, me, JC, and Mark.

cbouss commented 1 year ago

Ah, I mostly needed 1 and 3 so far :)

To keep all links in one place, this is the conda recipe proposal change to address issues regarding manipulating recipes: https://github.com/conda-incubator/ceps/pull/54

JeanChristopheMorinPerso commented 1 year ago

And https://github.com/conda-incubator/ceps/pull/53

JeanChristopheMorinPerso commented 1 year ago

Ability to update yaml values

Ability to update jinja variables

Ability to update selectors variables

I would add to "Ability to update selectors variables" that we should be able to add, remove and update selectors without involving any kind of environment evaluation. For example, when I ask to add a selector, I want it to be applied no matter what variants are currently available.

schuylermartin45 commented 1 year ago

I'm going to close this out, given the parse tree work.

anaconda / percy

Patches `replace` operation when `match` is not specified #39