Open rogpeppe opened 11 months ago
Any reason not to return the error as a second value? like this:
type WalkDirEntry struct {
Path string
Entry fs.DirEntry
skipDir *bool
}
func WalkDirIter(fsys fs.FS, root string) func(func(WalkDirEntry, error) bool) {
Seems like the conclusion of #61405 is to return errors this way. https://github.com/golang/go/issues/61405#issuecomment-1708800305
Any reason not to return the error as a second value? like this:
That was @Merovius's suggestion too. I wasn't sure, because the Path and Entry fields can be valid even when the error isn't, making the docs a bit awkward to write, which made me think it might not be the correct approach.
Nonetheless, here's a stab at it. Maybe it's not so awkward after all.
// WalkDirIter returns an iterator that can be used to iterate over the contents
// of a directory. The implementation uses [WalkDir].
//
// Each iteration produces an entry and an error that will be non-nil
// for unreadable directories. The [WalkDirEntry.Path] field is always valid;
// the [WalkDirEntry.Entry] field is always valid except for when the
// initial Stat on the root directory fails.
//
// If a directory's ReadDir method fails, the iteration will produce two
// entries, both with the directory's path: first with a nil error,
// giving the body of the loop the chance to stop or call [WalkDirEntry.SkipDir],
// and secondly with the error from ReadDir.
//
// To skip a directory, call the SkipDir method on an entry.
func WalkDirIter(fsys fs.FS, root string) func(func(WalkDirEntry, error) bool) {
var skipDir bool // outside the loop so we'll only allocate once.
return func(yield func(WalkDirEntry, error) bool) {
fs.WalkDir(fsys, root, func(path string, d fs.DirEntry, err error) error {
info := WalkDirEntry{
Path: path,
Entry: d,
skipDir: &skipDir,
}
skipDir = false
if !yield(info, err) {
return fs.SkipAll
}
if skipDir {
return fs.SkipDir
}
return nil
})
}
}
I am also not a big fan of the SkipDir
method, it looks weird to skip something (change the state of the iterator) using the result of the iterator, but on the other hand I don't know how this could be done differently with range-over-func. Was this discussed somewhere in #61405?
I've observed (https://sourcegraph.com/search?q=context%3Aglobal+lang%3Ago+fs.WalkDir&patternType=standard&sm=1&groupBy=repo) that most fs.WalkDirFunc
functions first check for the existence of an error. This makes me inclined to sympathize with @mateusz834's suggestion. However, what I observe most is that when the error is not checked first, access to the DirEntry
argument occurs without verifying that it is not nil
. (It is nil
if the stat on the root directory has failed.)
This lack of checking obviously leads to a panic. There are also two missing checks in two tests in the Go repository.
I suggest not passing the error as a parameter, that is, leaving only one parameter, but ensuring that WalkDirIter
returns an error in case the stat on the root directory fails:
files, err := fs.WalkDirIter(fsys, root)
if err != nil {
// stat on the root directory failed
...
}
for file := range files {
_ = file.Entry.Name() // this cannot panic because file.Entry is always guaranteed to be non-nil
}
@gazerro Yes, I tend to agree, although it's a bit unfortunate that that makes it considerably harder (or perhaps no possible?) to implement WalkDirIter
in terms of WalkDir
AFAICS. Perhaps it might be preferable to reverse that, because I think it should be easier to implement WalkDir
in terms of WalkDirIter
with those semantics.
Perhaps it might be preferable to reverse that, because I think it should be easier to implement
WalkDir
in terms ofWalkDirIter
with those semantics.
Yes, I agree. It would be more appropriate to implement WalkDir
in terms of WalkDirIter
and possibly deprecate WalkDir
.
I also suggest naming the new function DirWalker
(or perhaps simply Entries
? Though, this name might be overly simplistic). While WalkDir
appropriately denotes an action, the new function simply returns an iterator (or a sequence, as one prefers to say). So it would be used as:
entries, err := fs.DirWalker(fsys, root)
if err != nil {
// stat on the root directory failed
...
}
for entry := range entries {
// ...
}
Individual subdirectories and files can fail, so I don’t know what it saves you to return an error for the initial stat.
@carlmjohnson if the first stat fails, the DirEntry.Entry
field is nil
. Therefore, before accessing DirEntry.Entry
, it is necessary to check that it is not nil
. This should be done both within an if err != nil { ... }
statement and in cases where the error is not handled because we are not interested in handling it.
I have noticed many cases (https://sourcegraph.com/search?q=context%3Aglobal+lang%3Ago+fs.WalkDir&patternType=standard&sm=1&groupBy=repo) where this check is not done.
In addition, this is my opinion, I find it incorrect for the function to return an iterator on a root directory that does not exist.
It's often more convenient to use a for loop rather than a callback-based API, and fs.WalkDir and filepath.Walk are both callback-based APIs. I've been using github.com/kr/fs for years to work around this. The new iterate-over-func functionality (#61405) now makes it straightforward to adapt the stdlib functions into iterators, so perhaps we could provide iterator-based APIs as part of the standard library now.
Here's a straw-man implementation:
The main point of contention from my point of view is the semantics of the SkipDir method. Calling a method on an iterated item to determine the subsequent course of iteration seems somewhat unusual, but I don't see any other decent alternative.
[Edited to remove incorrect remarks about GC behavior].