Improve filtering functionality and syntax

notheotherben commented 1 month ago

Currently the filtering model we've implemented is built around serde's ability to deserialize YAML tags (i.e. !Something) into enums and we've paired this with a rudimentary "name + tags" model that works relatively well for GitHub repositories (since we're scoped to a user/org to start with, and the identifying traits of a repo can easily be represented with tags). Doing so looks something like this:

backups:
  - kind: github/repo
    from: orgs/SierraSoftworks
    filters:
      - !Include ["git-tool", "grey"]
      - !Is public
      - !IsNot fork

The trouble comes in when we start wanting to support GitHub release artifacts. In this scenario there are several different tiers at which we may want to filter:

The repository that the artifact belongs to.
The release that the artifact is a part of.
The artifact itself.

It's also not entirely obvious what the name should be (for the !Include and !Exclude operators) - right now we've settled for the name of the repo to match semantics with the github/repo model, but it really isn't ideal (as shown below).

backups:
  - kind: github/release
    from: orgs/SierraSoftworks
    filters:
      - !Include ["grey"]
      - !IsNot prerelease
      - !IsNot source-code

What we'd ideally like to be able to do is define a filter that can reference individual properties and perform comparisons. A naive approach would be to model this as follows:

backups:
  - kind: github/release
    from: orgs/SierraSoftworks
    filters:
      repo.name: !In ["grey"]
      release.prerelease: !False
      artifact.source-code: !False

But the issue with this model is that it's difficult to construct combined conditions. What we really, really want is a DSL that lets us construct logical expressions which can then be evaluated by the backup system to determine whether to include something or not. Enter our filter DSL:

  - kind: github/release
    from: orgs/SierraSoftworks
    filter: repo.public && !release.prerelease && !artifact.source-code

The goal of this DSL is to cover the following:

Truthiness and falseiness (i.e. release.prerelease) as well as unary negation of this (i.e. !repo.fork).
Binary equality and inequality (i.e. repo.name == "grey" or repo.name != "git-tool").
Some string comparison operations (i.e. repo.name contains "-rs").
Binary logical operations (i.e. A && B or C || D).
Logical expression grouping (i.e. (A && B) || C).

We'd also like it to be reasonably easy to extend with support for things like glob/regex matches, startswith and endswith etc. Non-goals here are the creation of anything Turing complete, impure (touching the environment/performing I/O) or scoped beyond the context of a single entity (i.e. we're not going to let you do joins or match on sibling items).

rakitaj commented 1 month ago

I'm added tests to cover the logic, and and implemented startswith and endswith, those are the only two language features I see in this issue which aren't implemented.

What do you think about using the rstest crate? Specifically for its parametrized test support? https://docs.rs/rstest/latest/rstest/#creating-parametrized-tests

I haven't used it for the tests I wrong because I don't want to add it as a dependency if you don't want to use it but if you do I'm happy to go back and change the tests.

For now I created some commits on top of this branch but I can't push to the branch or make a new one because I'm not a contributor.

For a spoiler, all this works now

#[test]
fn test_comparison_operators() {
    assert_sequence(
        "== != contains startswith endswith",
        &[Token::Equals, Token::NotEquals, Token::Contains, Token::StartsWith, Token::EndsWith],
    );
}

assert!(Filter::new("name startswith \"John\"")
    .expect("parse filter")
    .matches(&obj)
    .expect("run filter"));

assert!(Filter::new("name startswith \"JOHN \"")
    .expect("parse filter")
    .matches(&obj)
    .expect("run filter"));

assert!(!Filter::new("name startswith \"jane\"")
    .expect("parse filter")
    .matches(&obj)
    .expect("run filter"));

assert!(Filter::new("name endswith \"Doe\"")
    .expect("parse filter")
    .matches(&obj)
    .expect("run filter"));

assert!(Filter::new("name endswith \" DOE\"")
    .expect("parse filter")
    .matches(&obj)
    .expect("run filter"));

assert!(!Filter::new("name endswith \"jane\"")
    .expect("parse filter")
    .matches(&obj)
    .expect("run filter"));

notheotherben commented 1 month ago

That looks awesome - I've added you as a maintainer (sorry for forgetting to do that earlier), and I'm more than happy for us to add rstest to the project 👍

rakitaj commented 1 month ago

Awesome, thanks Ben. Uh oh now I'm committed (see what I did there) to this project.

I rebased and pushed my commits directly on the branch.

rakitaj commented 1 month ago

Okay okay, this time I actually pushed my changes for startswith and endswith (after rebasing). Don't feel bad if you beat me to it, I love rstest so I'll add it tomorrow and update the tests where I feel like it's a good fit.

SierraSoftworks / github-backup

Improve filtering functionality and syntax #31