go-git / go-billy

The missing interface filesystem abstraction for Go
https://pkg.go.dev/github.com/go-git/go-billy/v5
Apache License 2.0
300 stars 43 forks source link

Add RemoveAll to Basic Interface? #77

Open cipherboy opened 1 month ago

cipherboy commented 1 month ago

I was wanting to call the equivalent of os.RemoveAll(...) on a worktree's filesystem and ended up implementing support for it here and in go-git (the latter was a trivial change only in RepositoryFilesystem; would this change be welcomed here?

I suspect ideally this would be implemented as an optional additional interface to not break backwards compatibility (existing user would have to implement the interface on upgrade), but maybe it would be best for a v6 to simplify the interface? Thoughts?


I also ended up implementing a non-chroot'd version (similar to osfs/os_bound.go, but taking into account an arbitrary $PWD different from the actual $PWD at the time of execution), but I'm not sure that changeset has relevance outside of my use case. :-) Let me know if it is also of interest though!

pjbgf commented 4 weeks ago

Hey @cipherboy!

The util package contains a RemoveAll(fs billy.Basic, path string) error function that can be used to that effect.

However, a native RemoveAll within the Basic interface would come handy, but I don't see that working before v6 as you mentioned. Feel free to submit a PR to add that targeting the v6-exp branch, which is where we are keeping the changes for v6.

As for the os_bound implementation, go-billy is primarily focused on supporting go-git. So I am not sure your specific use case would fit that bill. The os_bound already has an arbitrary $PWD, which is immutable. I am not sure how different that is from your version.

cipherboy commented 4 weeks ago

\o hey @pjbgf -- regarding the last point, I ran across this when working across multiple repos: https://github.com/go-git/go-git/issues/1155#issuecomment-2242778023

It seems this only works if dir == . and $PWD is the root of the desired Git repository. Otherwise, when I had say, dir=repos/some-repo, every operation in the repository involving simlinks resulted in them being prefixed with dir, rewriting links from say lib->lib64 on checkout to become lib->repos/some-repo/lib64. I was working across multiple repositories, so continually updating the app's $PWD for the desired repo to edit would've maybe been possible but likely would've been a bit of work.

I think the issue with this is it's unsafe for sandbox execution/bypass; you don't want to necessarily allow or follow symlinks to arbitrary places on disk if you're a git repo provider. But if you're using known repositories it probably isn't as much an issue.

Let me see if I can create a better reproducer.

cipherboy commented 4 weeks ago

@pjbgf Hmm, looks like it was simpler than I thought. Check out this code:

https://github.com/cipherboy/testbed/tree/master/go-git-repo-symlink

Script and code Script: ```bash #!/bin/bash create_repo() {( rm -rf repos/ repos.bak/ mkdir -p repos/example cd repos/example git init . git status echo "initial" > README.md git add README.md && git commit -m "initial" git status echo "lib symlink" > README.md mkdir lib64 echo "whoami" > lib64/code.sh ln -s lib64 lib git add lib64 lib README.md && git commit -m "lib symlink" git status echo "absolute symlink" > README.md ln -s /bin/bash lib64/shell git add lib64/shell README.md && git commit -m "absolute symlink" git status echo "latest change" > README.md git add README.md && git commit -m "latest change" git log | cat - )} backup_repo() {( cp -pr repos/ repos.bak )} create_repo backup_repo go mod tidy # go run ./main.go ( cd repos/example && go run ../../main.go . ) ``` Code: ```go package main import ( "fmt" "os" "github.com/go-git/go-billy/v5/osfs" "github.com/go-git/go-git/v5" "github.com/go-git/go-git/v5/plumbing/cache" "github.com/go-git/go-git/v5/plumbing/object" "github.com/go-git/go-git/v5/storage/filesystem" ) func getRepo(dir string) (*git.Repository, error) { // See https://github.com/go-git/go-git/issues/1155#issuecomment-2242778023 wt := osfs.New(dir, osfs.WithBoundOS()) dotfs, err := wt.Chroot(git.GitDirName) if err != nil { return nil, err } store := filesystem.NewStorage(dotfs, cache.NewObjectLRUDefault()) // Clone -> Open return git.Open(store, wt) } func walkCommits(repo *git.Repository) error { iter, err := repo.CommitObjects() if err != nil { return fmt.Errorf("failed getting commit objects: %v", err) } repoW, err := repo.Worktree() if err != nil { return fmt.Errorf("failed to get repo worktree: %v", err) } if err := iter.ForEach(func(commit *object.Commit) error { if err := repoW.Checkout(&git.CheckoutOptions{Hash: commit.Hash, Force: true}); err != nil { return fmt.Errorf("failed to checkout commit (%#v): %v", commit.Hash.String(), err) } return nil }); err != nil { return fmt.Errorf("failed iterating over commit object: %v", err) } return nil } func main() { dir := "repos/example" if len(os.Args) > 1 { dir = os.Args[1] } fmt.Printf("using dir=%v\n", dir) repo, err := getRepo(dir) if err != nil { panic(fmt.Sprintf("failed getting repository: %v", err)) } if err := walkCommits(repo); err != nil { panic(fmt.Sprintf("failed walking commits: %v", err)) } } ```

In particular, the current mode (dir=. with $PWD=repo/example) fails with:

panic: failed walking commits: failed iterating over commit object: failed to checkout commit ("97f28a73331c05fe8bfda6561f7e3786dc31c622"): remove bin/bash: no such file or directory

or similar depending on what file gets executed first.

But the other one has the error I remember:

panic: failed walking commits: failed iterating over commit object: failed to checkout commit ("36f532a4ff85e64af9683daa8dfbe157df3c8dae"): open repos/example/lib64: no such file or directory

Both of these hashes are (the freshly generated) repo's initial commit. lib is a symlink to lib64; both were added in the second commit.

With my hacky filesystem, both methods pass.

The implementation essentially looks like this:

Implementation ```go //go:build !js // +build !js package osfs import ( "os" "path/filepath" "strings" securejoin "github.com/cyphar/filepath-securejoin" "github.com/go-git/go-billy/v5" ) // WorkingDirOS is a legacyfilesystem without a chroot of the OS filesystem, // but instead treats relative directories as if they were specific to a // given underlying path, even if CWD changes. type WorkingDirOS struct { baseDir string } func newWorkingDirOS(baseDir string) billy.Filesystem { return &WorkingDirOS{baseDir} } func (fs *WorkingDirOS) Chroot(path string) (billy.Filesystem, error) { joined, err := securejoin.SecureJoin(fs.baseDir, path) if err != nil { return nil, err } return New(joined), nil } func (fs *WorkingDirOS) Root() string { return fs.baseDir } func (fs *WorkingDirOS) Create(filename string) (billy.File, error) { if !filepath.IsAbs(filename) && !strings.HasPrefix(filename, fs.baseDir) { filename = filepath.Join(fs.baseDir, filename) } return fs.OpenFile(filename, os.O_RDWR|os.O_CREATE|os.O_TRUNC, defaultCreateMode) } func (fs *WorkingDirOS) OpenFile(filename string, flag int, perm os.FileMode) (billy.File, error) { if !filepath.IsAbs(filename) && !strings.HasPrefix(filename, fs.baseDir) { filename = filepath.Join(fs.baseDir, filename) } return openFile(filename, flag, perm, fs.createDir) } func (fs *WorkingDirOS) createDir(fullpath string) error { if !filepath.IsAbs(fullpath) && !strings.HasPrefix(fullpath, fs.baseDir) { fullpath = filepath.Join(fs.baseDir, fullpath) } dir := filepath.Dir(fullpath) if dir != "." { if err := os.MkdirAll(dir, defaultDirectoryMode); err != nil { return err } } return nil } func (fs *WorkingDirOS) ReadDir(dir string) ([]os.FileInfo, error) { if !filepath.IsAbs(dir) && !strings.HasPrefix(dir, fs.baseDir) { dir = filepath.Join(fs.baseDir, dir) } return readDir(dir) } func (fs *WorkingDirOS) Rename(from, to string) error { if !filepath.IsAbs(from) && !strings.HasPrefix(from, fs.baseDir) { from = filepath.Join(fs.baseDir, from) } if err := fs.createDir(to); err != nil { return err } if !filepath.IsAbs(to) && !strings.HasPrefix(to, fs.baseDir) { to = filepath.Join(fs.baseDir, to) } return rename(from, to) } func (fs *WorkingDirOS) MkdirAll(path string, perm os.FileMode) error { if !filepath.IsAbs(path) && !strings.HasPrefix(path, fs.baseDir) { path = filepath.Join(fs.baseDir, path) } return os.MkdirAll(path, defaultDirectoryMode) } func (fs *WorkingDirOS) Open(filename string) (billy.File, error) { return fs.OpenFile(filename, os.O_RDONLY, 0) } func (fs *WorkingDirOS) Stat(filename string) (os.FileInfo, error) { if !filepath.IsAbs(filename) && !strings.HasPrefix(filename, fs.baseDir) { filename = filepath.Join(fs.baseDir, filename) } return os.Stat(filename) } func (fs *WorkingDirOS) Remove(filename string) error { if !filepath.IsAbs(filename) && !strings.HasPrefix(filename, fs.baseDir) { filename = filepath.Join(fs.baseDir, filename) } return os.Remove(filename) } func (fs *WorkingDirOS) TempFile(dir, prefix string) (billy.File, error) { if err := fs.createDir(dir + string(os.PathSeparator)); err != nil { return nil, err } return tempFile(dir, prefix) } func (fs *WorkingDirOS) Join(elem ...string) string { return filepath.Join(elem...) } func (fs *WorkingDirOS) RemoveAll(path string) error { if !filepath.IsAbs(path) && !strings.HasPrefix(path, fs.baseDir) { path = filepath.Join(fs.baseDir, path) } return os.RemoveAll(filepath.Clean(path)) } func (fs *WorkingDirOS) Lstat(filename string) (os.FileInfo, error) { if !filepath.IsAbs(filename) && !strings.HasPrefix(filename, fs.baseDir) { filename = filepath.Join(fs.baseDir, filename) } return os.Lstat(filepath.Clean(filename)) } func (fs *WorkingDirOS) Symlink(target, link string) error { if err := fs.createDir(link); err != nil { return err } if !filepath.IsAbs(link) && !strings.HasPrefix(link, fs.baseDir) { link = filepath.Join(fs.baseDir, link) } return os.Symlink(target, link) } func (fs *WorkingDirOS) Readlink(link string) (string, error) { if !filepath.IsAbs(link) && !strings.HasPrefix(link, fs.baseDir) { link = filepath.Join(fs.baseDir, link) } return os.Readlink(link) } // Capabilities implements the Capable interface. func (fs *WorkingDirOS) Capabilities() billy.Capability { return billy.DefaultCapabilities } ```

i.e., a very simple one with a bunch of:

if !filepath.IsAbs(link) && !strings.HasPrefix(link, fs.baseDir) {
    link = filepath.Join(fs.baseDir, link)
}

in relevant places.

pjbgf commented 4 weeks ago

@cipherboy thanks for clarifying and sharing the code. I think the current implementation is mistakenly trying to remove the target instead of the symlink. However, the safeguards in place to restrict deletion of things outside of the repository dir are blocking that from taking place. The bug here is that we are doing a Stat instead of a Lstat here.

Would you be keen to propose a PR to fix this?