SierraSoftworks / github-backup

Automatically backup your GitHub repositories
MIT License

feat: Add a new filter DSL which enables better control over what is included in a backup #33

Closed notheotherben closed 3 months ago

notheotherben commented 3 months ago

This PR introduces the functionality described in #31. Before we merge it, there are a few things to do:

Background

Currently, the filtering model is built around serde's ability to deserialize YAML tags (e.g. !Something) into enums, paired with a rudimentary "name + tags" model. This works relatively well for GitHub repositories, since we're scoped to a user/org to start with and the identifying traits of a repo can easily be represented with tags. Doing so looks something like this:

backups:
  - kind: github/repo
    from: orgs/SierraSoftworks
    filters:
      - !Include ["git-tool", "grey"]
      - !Is public
      - !IsNot fork
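
To make the "name + tags" model concrete, here is a minimal Rust sketch of how a filter list like the one above might be evaluated. The `Repo` and `Filter` types and the `included` function are hypothetical stand-ins, not the project's actual types, and the sketch assumes that every filter in the list must match (a logical AND), which the description above does not state explicitly.

```rust
/// Hypothetical view of a repository: a name plus a set of descriptive tags.
struct Repo {
    name: String,
    tags: Vec<String>,
}

/// Hypothetical mirror of the YAML-tagged filters (!Include, !Exclude, !Is, !IsNot).
enum Filter {
    Include(Vec<String>),
    Exclude(Vec<String>),
    Is(String),
    IsNot(String),
}

impl Filter {
    /// Returns true if this single filter accepts the repository.
    fn matches(&self, repo: &Repo) -> bool {
        match self {
            Filter::Include(names) => names.contains(&repo.name),
            Filter::Exclude(names) => !names.contains(&repo.name),
            Filter::Is(tag) => repo.tags.contains(tag),
            Filter::IsNot(tag) => !repo.tags.contains(tag),
        }
    }
}

/// Assumption: a repository is backed up only if every filter accepts it.
fn included(filters: &[Filter], repo: &Repo) -> bool {
    filters.iter().all(|f| f.matches(repo))
}
```

Note that each filter only ever sees one name and one flat set of tags, which is the limitation that becomes a problem once more than one kind of entity is involved.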

The trouble starts when we want to support GitHub release artifacts. In this scenario there are several different tiers at which we may want to filter (the examples below touch on the repo, the release, and the individual artifact).

It's also not entirely obvious what the name should refer to for the !Include and !Exclude operators. Right now we've settled on the name of the repo, to match the semantics of the github/repo model, but it really isn't ideal (as shown below).

backups:
  - kind: github/release
    from: orgs/SierraSoftworks
    filters:
      - !Include ["grey"]
      - !IsNot prerelease
      - !IsNot source-code

What we'd ideally like to be able to do is define a filter that can reference individual properties and perform comparisons. A naive approach would be to model this as follows:

backups:
  - kind: github/release
    from: orgs/SierraSoftworks
    filters:
      repo.name: !In ["grey"]
      release.prerelease: !False
      artifact.source-code: !False

But the issue with this model is that it's difficult to construct combined conditions. What we really, really want is a DSL that lets us construct logical expressions which can then be evaluated by the backup system to determine whether to include something or not. Enter our filter DSL:

  - kind: github/release
    from: orgs/SierraSoftworks
    filter: repo.public && !release.prerelease && !artifact.source-code
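
As an illustration of how such an expression could be evaluated, here is a minimal Rust sketch of a recursive-descent evaluator. It is not the parser from this PR: it only handles boolean properties combined with `&&`, `||`, `!` and parentheses, and the `Token`, `Parser`, `tokenize` and `evaluate` names are hypothetical. Property values are assumed to be supplied as a flat map from dotted names to booleans.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum Token { Ident(String), And, Or, Not, LParen, RParen }

/// Splits a filter string into tokens; identifiers may contain '.', '-' and '_'.
fn tokenize(src: &str) -> Result<Vec<Token>, String> {
    let chars: Vec<char> = src.chars().collect();
    let (mut tokens, mut i) = (Vec::new(), 0);
    while i < chars.len() {
        let c = chars[i];
        match c {
            _ if c.is_whitespace() => i += 1,
            '!' => { tokens.push(Token::Not); i += 1; }
            '(' => { tokens.push(Token::LParen); i += 1; }
            ')' => { tokens.push(Token::RParen); i += 1; }
            '&' | '|' => {
                // Only the doubled forms "&&" and "||" are valid operators.
                if chars.get(i + 1) != Some(&c) {
                    return Err(format!("expected '{0}{0}' at offset {1}", c, i));
                }
                tokens.push(if c == '&' { Token::And } else { Token::Or });
                i += 2;
            }
            _ if c.is_alphanumeric() => {
                let start = i;
                while i < chars.len()
                    && (chars[i].is_alphanumeric() || matches!(chars[i], '.' | '-' | '_'))
                {
                    i += 1;
                }
                tokens.push(Token::Ident(chars[start..i].iter().collect()));
            }
            other => return Err(format!("unexpected character '{}' at offset {}", other, i)),
        }
    }
    Ok(tokens)
}

struct Parser<'a> {
    tokens: &'a [Token],
    pos: usize,
    props: &'a HashMap<String, bool>,
}

impl Parser<'_> {
    fn peek(&self) -> Option<&Token> { self.tokens.get(self.pos) }

    // or := and ("||" and)*
    fn or(&mut self) -> Result<bool, String> {
        let mut value = self.and()?;
        while self.peek() == Some(&Token::Or) {
            self.pos += 1;
            let rhs = self.and()?;
            value = value || rhs;
        }
        Ok(value)
    }

    // and := unary ("&&" unary)*
    fn and(&mut self) -> Result<bool, String> {
        let mut value = self.unary()?;
        while self.peek() == Some(&Token::And) {
            self.pos += 1;
            let rhs = self.unary()?;
            value = value && rhs;
        }
        Ok(value)
    }

    // unary := "!" unary | "(" or ")" | property
    fn unary(&mut self) -> Result<bool, String> {
        match self.peek().cloned() {
            Some(Token::Not) => { self.pos += 1; Ok(!self.unary()?) }
            Some(Token::LParen) => {
                self.pos += 1;
                let value = self.or()?;
                if self.peek() != Some(&Token::RParen) {
                    return Err("expected ')'".to_string());
                }
                self.pos += 1;
                Ok(value)
            }
            Some(Token::Ident(name)) => {
                self.pos += 1;
                let value = self.props.get(&name).copied();
                value.ok_or_else(|| format!("unknown property '{}'", name))
            }
            other => Err(format!("unexpected token {:?}", other)),
        }
    }
}

/// Evaluates `filter` against the given properties; pure, with no I/O.
fn evaluate(filter: &str, props: &HashMap<String, bool>) -> Result<bool, String> {
    let tokens = tokenize(filter)?;
    let mut parser = Parser { tokens: &tokens, pos: 0, props };
    let value = parser.or()?;
    if parser.pos != tokens.len() {
        return Err("unexpected trailing input".to_string());
    }
    Ok(value)
}
```

Calling `evaluate` with the filter above and a map in which `repo.public` is true and the other two properties are false would yield `Ok(true)`. Unknown properties and malformed expressions are reported as errors rather than silently treated as false, which is one possible design choice among several.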

The goal of this DSL is to cover the following:

We'd also like it to be reasonably easy to extend with support for things like glob/regex matches, startswith, endswith, etc. Non-goals here are the creation of anything that is Turing complete, impure (touching the environment/performing I/O), or scoped beyond the context of a single entity (i.e. we're not going to let you do joins or match on sibling items).