garden-rs / garden

Garden grows and cultivates collections of Git trees ~ Official mirror of https://gitlab.com/garden-rs/garden
https://garden-rs.gitlab.io
MIT License

Doc/Feat: Git-Worktree Support #1

Closed nickgarber closed 2 years ago

nickgarber commented 2 years ago

Hello and thank you for garden!

I wonder if garden has support for https://git-scm.com/docs/git-worktree or could be extended to support them? It's useful for me to check out many worktrees, which can get tricky when switching among different workspaces.

I'm still learning about Garden but I appreciate the sanity it brings this challenge and look forward to understanding and using it more over time.

davvid commented 2 years ago

Integrating better with git worktree is a great idea.

Right now garden does support git-worktree-created worktrees in that you can configure and use them as if they were just another full-blown git tree and garden will recognize and work with them.

There are a few places where it could be extended to better support git-worktree worktrees.

The garden grow command that does cloning and setting up of repos could be taught to know about worktrees. An off-the-cuff idea might be to perhaps annotate trees to refer to a parent tree as its worktree-parent: <tree-name> and then garden could know to set them up as worktrees.

The garden plant command, which adds an existing tree to the config, could be extended to set up these relationships. When a child worktree is planted and its worktree parent isn't already part of the config, garden could either register the parent automatically or require the user to register it manually.

I'm glad you found garden useful! I've been using it to manage my git annex music/photo/video archives and for defining ad-hoc scripts and workflows for stuff.

I also tend to use it for random spelunking in repos that require some custom setup or actions to be performed. I can never remember that stuff ;-) so I try to write it down...

# garden.yaml
garden:
  root: "${GARDEN_CONFIG_DIR}"
variables:
  env_py3: $ python3 -c 'import sys; print("env%s%s" % sys.version_info[:2])'
trees:
  photo-restoration:
    description: |
      AI restoration of old photos
      $ garden grow photo-restoration
      $ garden cmd photo-restoration init
      $ garden run photo-restoration -- --help
    links:
      - "https://colab.research.google.com/drive/1NEm6AsybIiC5TwTU_4DqDkQO0nFRB-uA?usp=sharing&authuser=2#scrollTo=32jCofdSr8AW"
      - "https://news.ycombinator.com/item?id=25148253"
      - "http://raywzy.com/Old_Photo/"
    environment:
      PATH: "${TREE_PATH}/${env_py3}/bin"
    commands:
      init: |
        set -x
        (
          cd ./Face_Detection && (
            test -f shape_predictor_68_face_landmarks.dat || (
              wget --no-adjust-extension http://dlib.net/files/shape_predictor_68_face_landmarks.dat.bz2 &&
              bzip2 -v -d shape_predictor_68_face_landmarks.dat.bz2
            )
          )
        )
        (
          cd ./Face_Enhancement && (
            test -f checkpoints.zip ||
            wget --no-adjust-extension https://facevc.blob.core.windows.net/zhanbo/old_photo/pretrain/Face_Enhancement/checkpoints.zip
            test -d checkpoints || unzip checkpoints.zip
          )
        )
        (
          cd ./Global && (
            test -f checkpoints.zip ||
            wget --no-adjust-extension https://facevc.blob.core.windows.net/zhanbo/old_photo/pretrain/Global/checkpoints.zip
            test -d checkpoints || unzip checkpoints.zip
          )
        )
        test -d ${env_py3} || (
          python3 -m venv ${env_py3} &&
          ${env_py3}/bin/pip install -r requirements.txt
        )
      run: ${env_py3}/bin/python3 run.py "$@"
    url: "git://github.com/microsoft/Bringing-Old-Photos-Back-to-Life.git"
    remotes:
      davvid: "git@github.com:davvid/Bringing-Old-Photos-Back-to-Life.git"
...

Usually I'd end up with a smattering of random shell scripts and notes and stuff, but now I can stick my random experimental dabbles into a garden file and then come back to it later.

Once you run the setup steps listed in the description, garden run photo-restoration -- --GPU -1 --input_folder $PWD/input --output_folder $PWD/output performs photo restoration using some fancy machine learning tools.

Sorry.. that's a distraction, but since this is the first garden issue I figured I'd share an example of what I use it with in case it helps others see what it can do.

BTW, what does your typical git worktree workflow look like? Do you clone a bare repo first and then create worktrees inside the bare.git/ directory, or do you place them sibling to each other? Do you use a non-bare repo as the "main" parent worktree, or is it bare so that all of the actual checked out files are git-worktree trees?

Git supports both and we'd probably want to support any arbitrary valid usage, but just curious how folks are using that tool these days.
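
For readers unfamiliar with the two layouts being asked about, here is a minimal sketch using plain git commands in a scratch directory. The directory and branch names (upstream, bare.git, main, feature-a, feature-b) are made up for illustration and are not part of garden.

```shell
#!/bin/sh
# Sketch of the two git-worktree layouts, built inside a throwaway directory.
set -eu
demo=$(mktemp -d)
cd "$demo"

# A throwaway "upstream" repository with a single commit on "main".
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/main
git -C upstream -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"

# Layout 1: a bare parent; every checkout lives in a sibling worktree.
git clone -q --bare upstream bare.git
git -C bare.git worktree add -q -b feature-a ../feature-a main

# Layout 2: a regular (non-bare) "main" clone plus sibling child worktrees.
git clone -q upstream main
git -C main worktree add -q -b feature-b ../feature-b

# Each parent lists itself first, followed by its child worktrees.
git -C bare.git worktree list
git -C main worktree list
```

In both layouts the child directories contain a .git file (not a directory) that points back into the parent repository.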

nickgarber commented 2 years ago

Hi @davvid,

Glad to be in contact with you and thanks for sharing! Far from a distraction, reading through your use-case/workflow is very illuminating! It strengthens my sense there are parts of garden that I'd like to grow into. Initially I'm just glad to have a way to document, reproduce and share my repo/workspace configs.

Workflow and Tools

My team and I use a bushy-monorepo pattern, resulting in a catalog of semi-independent solution branches, each forked from the same starting branch. Occasionally multiple branches will together compose into a larger body of work, either in concept or in delivery.

In order to make switching among these solutions easier I usually encourage folks to use git-worktrees. This also helps when frequently switching between tasks and (perhaps ironically) helps to flatten out the complexities of branch-mgmt for any that aren't well versed with git. Within each branch we usually use direnv and Nix to dynamically setup available software and environment variables.

Now that I'm working on semi-ephemeral VMs more often I've come to appreciate how home-manager improves the productivity-bootstrapping experience, but it doesn't manage git repos. (BTW I wonder if you would be open to a future integration between home-manager and garden?)

Envisioning a Possible Future

As it goes, the more ergonomic things become, the more noticeable those non-ergonomic things become! 😆

For me the missing tool takes a config in a common format and blasts out dozens of git repos/worktrees/branches into their right places. On a more ephemeral machine this saves a bunch of time and makes the other benefits of this branching workflow immediately available.

To be able to do that with garden, (perhaps even based on a config file that we could keep in version control) - would be just amazing!

Summary Request

As I understand it, this may involve these changes:

I just hope this is something you may also consider worthwhile.

Cheers and warm regards, Nick


miscellany

davvid commented 2 years ago

I've started writing up an example garden.yaml demonstrating how we can integrate the git-worktree feature.

https://github.com/davvid/garden/blob/next/doc/src/garden.yaml

The main additions to the config file are:

trees:
  example/example.git:
    bare: true
    url: ...

bare: true makes garden clone a bare repository.

trees:
    example/v1:
        worktree: example/example.git
        branch: maint/v1.x

worktree: example/example.git tells it that this is a worktree whose parent is the example/example.git tree. That tree is a bare repository, but it doesn't need to be ~ that's optional.

The example/v1 tree will be created by first ensuring that example/example.git exists. git worktree is then used to create example/v1.

branch: maint/v1.x makes garden check out the maint/v1.x branch and associate it with that worktree.

Alternate syntax I considered:

trees:
    example/v2:
        # "parent" instead of worktree.. but worktree seemed like a better choice
        parent: example/example.git
        branch: maint/v2.x
    example/v3:
        git:  # have everything namespaced under a "git" top-level key. Kinda tidy but also a bit nested.
            worktree: example/example.git
            branch: maint/v3.x
            config: # if we did that then git top-level `gitconfig` would make sense here too
                user.email: ${user}@example.com

I also thought.. "what if we overload branch?" so that branch: null can convey a bare repo, but that seemed like too much overloading. It seems simpler to just add a new bare: true key.

worktree: <tree> is a concept very similar to extend: <tree> in that it's going to copy over all of the tree's settings (such as the URL, gitconfig, commands, environment, etc). The one special case is that worktree will always ignore bare: true, because a child worktree always has checked-out files associated with it.

We do support garden -c <path.yaml> to load an alternate garden file, as well as garden -C <path> to make garden chdir somewhere before it does its garden.yaml discovery, so those may be helpful for teaching garden to read its config from a non-local git repo.

Ah.. by non-local do you mean network URLs, so that we can pass an http or other URL and garden will read the config from that location? That sounds pretty awesome to have built-in. Supporting network URLs is kinda like curl <url> | sh, which can be dangerous in general and often makes security-minded folks uneasy. It would be a pretty useful / convenient feature, though. That's pretty much the classic tradeoff (security vs. convenience), so I'm not necessarily opposed to it.

Regarding Nix and home-manager ~ Nix is conceptually awesome so I'd like to learn more about it in the future and how we can better integrate with home-manager. For now I'll focus this issue primarily on the worktree stuff. It does seem like there's room for expansion there in the future as well.

I've been trying to force myself to work on the documentation first before diving into the implementation, and hopefully that's a good way for us to hash out how we'd like it to work first before implementing it.

Once we agree on what the example garden.yaml should look like then I'll update the documentation around it to reflect the new capabilities. There should probably be a dedicated page in the docs for just the worktree feature IMO.

Let me know what you think about the config file format.

nickgarber commented 2 years ago

Beyond excited to see these updates! I'll review your notes and respond in kind over the next few hours or days.

Thank you so much!!!

davvid commented 2 years ago

I think the worktree spec needs a little bit more metadata in order for it to work. Here's what is currently proposed:

trees:
    project/repo.git:
        url: ...
        bare: true
    project/v1:
        worktree: project/repo.git
        branch: new-branch

The branch also needs a way to specify an upstream branch, otherwise it's going to be a brand-new branch that's not associated with any upstream branch. That might be useful in some situations but I imagine that a common situation is that we want the branch to track an upstream branch.

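git worktree add --track -b todo todo origin/todo is what we have to tell git to create a worktree called "todo" with the same branch name which tracks the same branch from the origin remote. (git worktree add takes the new worktree's path before the start point, hence "todo" appearing twice.)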
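
As a hedged illustration of creating a worktree whose branch tracks an upstream branch, the sketch below builds a throwaway upstream with a todo branch and then adds a tracking worktree. Note that git worktree add expects the new worktree's path as a positional argument before the start point. The repository names (upstream, main) are assumptions made for the demo.

```shell
#!/bin/sh
set -eu
demo=$(mktemp -d)
cd "$demo"

# Throwaway upstream with a "main" branch and a "todo" branch.
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/main
git -C upstream -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"
git -C upstream branch todo

# The parent clone; origin/todo is now a remote-tracking branch.
git clone -q upstream main

# Usage: git worktree add [--track] -b <new-branch> <path> <commit-ish>
git -C main worktree add -q --track -b todo ../todo origin/todo

# Show the upstream that the new local branch tracks.
git -C todo rev-parse --abbrev-ref 'todo@{upstream}'   # prints: origin/todo
```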

I think we would need a few more (optional) fields to represent the ability to track upstream branches.

I'm thinking something like this:

trees:
    project/repo.git:
      url: ...
      bare: true
      # The "branches" block is where we can configure relationships between local and remote branches
      branches:
          maint/v1.x: origin/maint/v1.x

    project/v1-worktree:
      worktree: project/repo.git
      # This branch has configuration in the parent "branches" metadata.
      # This worktree is created using "git worktree add --track -b maint/v1.x <path> origin/maint/v1.x"
      branch: maint/v1.x

    # This worktree creates a brand-new "dev" branch that is local to the git repo and not associated with
    # any upstream branches. There is no entry for it in the parent "branches" block so it doesn't get any configuration.
    project/dev:
      worktree: project/repo.git
      branch: dev

That seems like it'd be useful for both git-worktree usage and for regular git repos. The branches + branch concept could even be used in a plain tree entry and it would make sense.

trees:
  project/repo:
    url: ...
    branch: maint/v1.x
    branches:
      maint/v1.x: origin/maint/v1.x

This tells garden to check out the maint/v1.x branch when creating the repo and to associate it with the upstream origin/maint/v1.x branch.


When trees are cloned with a shallow history, e.g. depth: 1 as mentioned in #2, then git will only fetch a single branch.

The branches block is useful for that scenario by configuring additional branches that git won't know about upon an initial clone (because git clone --depth=1 implies --single-branch).
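
The single-branch behavior can be seen with plain git. This is only a sketch of the underlying git mechanics, not of what garden runs: it makes a shallow clone of a throwaway repository (names such as upstream and shallow are invented) and then registers one additional branch with git remote set-branches.

```shell
#!/bin/sh
set -eu
demo=$(mktemp -d)
cd "$demo"

# Throwaway upstream with two commits and an extra maintenance branch.
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/main
git -C upstream -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "first commit"
git -C upstream -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "second commit"
git -C upstream branch maint/v1.x

# A file:// URL is used because --depth is ignored for plain local-path clones.
git clone -q --depth=1 "file://$demo/upstream" shallow

# Only the default branch is known right after the shallow clone.
git -C shallow branch -r

# Register one more branch on the remote, then fetch it shallowly.
git -C shallow remote set-branches --add origin maint/v1.x
git -C shallow fetch -q --depth=1 origin
git -C shallow branch -r
```

The second listing includes origin/maint/v1.x, which the initial clone did not fetch.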

That's useful for that scenario, but we should keep the common case simple -- if the branch: v2 is the only thing that's configured (ie. there is no branches: block with per-branch settings), and after cloning the repo we notice that origin/v2 exists, then we should checkout v2 and automatically configure it to be a remote tracking branch.

Basically, branches should be optional and we should make it so that these two configs are equivalent:

    repo:
        url: ...
        branch: main

should be a shorthand way of writing:

    repo:
        url: ...
        branch: main
        branches:
            main: origin/main

Most of the time users will want local and remote branches that have the same name on origin to be associated with each other. For non-git-worktree clones, cloning with git clone --branch=<name> gets this behavior automatically.

If we ever need to change that behavior then we can add some boolean track-upstream-branches: false to opt-out, but I kinda doubt we'll ever need to do that.
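
The "track origin/<branch> automatically when it exists" rule described above could be sketched with plain git as follows. The checkout_branch helper is hypothetical and only handles branches that do not exist locally yet; it is not garden's implementation.

```shell
#!/bin/sh
set -eu
demo=$(mktemp -d)
cd "$demo"

# Throwaway upstream with "main" and "v2" branches.
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/main
git -C upstream -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"
git -C upstream branch v2

git clone -q upstream repo

# Hypothetical helper: create and check out branch "$2" in repo "$1".
# The branch tracks origin/"$2" when that remote-tracking branch exists;
# otherwise it is a plain local branch with no upstream.
checkout_branch() {
    if git -C "$1" show-ref --verify --quiet "refs/remotes/origin/$2"; then
        git -C "$1" checkout -q --track -b "$2" "origin/$2"
    else
        git -C "$1" checkout -q -b "$2"
    fi
}

checkout_branch repo v2    # origin/v2 exists, so v2 tracks it
checkout_branch repo dev   # no origin/dev, so dev has no upstream
```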

davvid commented 2 years ago

Heads-up @nickgarber garden grow is able to grow worktrees now.

Please ignore the examples above where we use a bare repository for the common storage -- git isn't really designed to work that way.

The documentation now recommends using a "main" repository and additional child worktrees linked off of it. For example:

trees:
  main: https://example.com/repo/example.git
  dev:
    worktree: main
    branch: dev

https://davvid.github.io/garden/commands.html#worktrees

v0.2.0 has been tagged and garden-tools 0.2.0 was published to crates.io.

The only remaining TODO is to teach garden plant to detect and configure worktrees, so this issue remains open.

The rest of the functionality is available now.

nickgarber commented 2 years ago

Thanks so much for making this worktree-centric workflow so graceful!

davvid commented 2 years ago

Thanks! I'm glad you're finding garden helpful.

garden v0.3.0 was just published to crates.io/crates/garden-tools.

This release finalized support for git worktree repositories. garden plant was taught to detect parent and child repositories created using git worktree.
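
For the curious, one way to tell a parent repository from a child worktree with plain git is to compare the per-worktree git directory with the common git directory. This is a sketch of that general technique, not necessarily how garden plant does its detection; worktree_kind is a hypothetical helper name.

```shell
#!/bin/sh
set -eu
demo=$(mktemp -d)
cd "$demo"

# A parent repository with one linked child worktree.
git init -q main
git -C main symbolic-ref HEAD refs/heads/main
git -C main -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"
git -C main worktree add -q -b dev ../dev

# Hypothetical helper: print "child" for a linked worktree, "parent" otherwise.
# In a linked worktree the per-worktree git dir (<parent>/.git/worktrees/<name>)
# differs from the common git dir (<parent>/.git) shared by all worktrees.
worktree_kind() {
    git_dir=$(cd "$1" && cd "$(git rev-parse --git-dir)" && pwd -P)
    common_dir=$(cd "$1" && cd "$(git rev-parse --git-common-dir)" && pwd -P)
    if [ "$git_dir" = "$common_dir" ]; then
        echo parent
    else
        echo child
    fi
}

worktree_kind main   # prints: parent
worktree_kind dev    # prints: child
```

The common git directory also identifies which parent a child belongs to, which is the relationship a worktree: <tree> entry records.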

cheers!