golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

os: add ReadDir method for lightweight directory reading #41467

Closed rsc closed 3 years ago

rsc commented 3 years ago

os.File provides two ways to read a directory: Readdirnames returns a list of the names of the directory entries, and Readdir returns the names along with stat information.

On Plan 9 and Windows, Readdir can be implemented with only a directory read - the directory read operation provides the full stat information.

But many Go users use Unix systems.

On most Unix systems, the directory read does not provide full stat information. So the implementation of Readdir reads the names from the directory and then calls Lstat for each file. This is fairly expensive.

Much of the time, such as in the implementation of file system walking, the only information the caller of Readdir really needs is the name and whether the name denotes a directory. On most Unix systems, that single bit of information—is this name a directory?—is available from the plain directory read, without an additional stat. If the caller is only using that bit, the extra Lstat calls are unnecessary and slow. (Goimports, for example, has its own directory walker to avoid this cost.) In fact, a survey of existing Go code found that only about 10% of uses of ReadDir actually need more than names and is-directory bits.

It appears that a third way to read directories should be added, to let all this code be written more efficiently. Expanding on a suggestion by @mpx, I propose to add:

// ReadDir reads the contents of the directory associated with the file f
// and returns a slice of DirEntry values in directory order.
// Subsequent calls on the same file will yield later DirEntry records in the directory.
//
// If n > 0, ReadDir returns at most n DirEntry records.
// In this case, if ReadDir returns an empty slice, it will return an error explaining why.
// At the end of a directory, the error is io.EOF.
//
// If n <= 0, ReadDir returns all the DirEntry records remaining in the directory.
// When it succeeds, it returns a nil error (not io.EOF).
func (f *File) ReadDir(n int) ([]DirEntry, error) 

// A DirEntry is an entry read from a directory (using the ReadDir method).
type DirEntry interface {
    // Name returns the name of the file (or subdirectory) described by the entry.
    // This name is only the final element of the path, not the entire path.
    // For example, Name would return "hello.go" not "/home/gopher/hello.go".
    Name() string

    // IsDir reports whether the entry describes a subdirectory.
    IsDir() bool

    // Info returns the FileInfo for the file or subdirectory described by the entry.
    // The returned FileInfo may be from the time of the original directory read
    // or from the time of the call to Info. If the file has been removed or renamed
    // since the directory read, Info may return an error satisfying errors.Is(err, ErrNotExist).
    // If the entry denotes a symbolic link, Info reports the information about the link itself,
    // not the link's target.
    Info() (FileInfo, error)
}

The FS proposal would then adopt this ReadDir and ignore Readdir entirely.

In #41188 I wrote:

Various people have proposed adding a third directory reading option of one form or another, to get names and IsDir bits. This would certainly address the slow directory walk issue on Unix systems, but it seems like overfitting to Unix.

I still believe that, but the survey convinced me that nearly all existing Readdir uses fall into this category, so it's not quite so bad to provide an optimized path for Unix systems. The DirEntry.Info method specification above allows both the eager info loading of Plan 9/Windows and the lazy loading needed on Unix. In contrast to #41188, the laziness is explicitly allowed from the beginning, and failures of the lazy loading can be reported in the error result.

Thoughts?

Update: A few clarifications to common questions:

Update 2: A few changes were made along the way to acceptance. See https://github.com/golang/go/issues/41467#issuecomment-708536303 for the final version.

israel-lugo commented 3 years ago

Thank you, @rsc. I think your argument makes sense.

I have not checked myself whether there is any common case that does not support d_type.

Just did a cursory search. ntfs-3g for example supports it. XFS also supports it, although it seems you must enable it at filesystem creation time (presumably because it needs some additional data field).

Interestingly, there are reports https://www.pimwiddershoven.nl/entry/docker-on-centos-7-machine-with-xfs-filesystem-can-cause-trouble-when-d-type-is-not-supported that Docker (or the overlay storage driver) will break if running on an XFS filesystem without d_type support, which is surprising to say the least.

It seems that in most common cases, d_type will be supported, meaning the slow path should be very infrequent. My only question would be whether there is some (common) special condition where d_type is disabled, e.g. if using LVM or some other unexpected thing (LVM in particular would seem unlikely to affect this). But I wouldn't block on that.

This new API is a great improvement already.

On 12/10/2020 17:47, Russ Cox wrote:

@israel-lugo, the rationale for DT_UNKNOWN handling is in #41467 (comment): https://github.com/golang/go/issues/41467#issuecomment-697776129.

Part of the rationale for making ReadDir do the lstat calls is that (1) Readdir already does and (2) we have yet to identify a common case where the lstat is needed.

Older systems - notably AIX and Solaris - do not include a type byte in the dirent structure at all. Those will need lstat.

On other systems, it is technically possible to get type DT_UNKNOWN, but does it happen in any common cases? If so, what are they? What operating systems, what file systems, and what additional conditions? Thanks.


networkimprov commented 3 years ago

Isn't ISO9660 also a common disk image format?

rsc commented 3 years ago

@networkimprov, yes basically any .iso CD image you download to install an OS in a VM is an ISO9660 image. But since it is a read-only (well, append-only) format, it's pretty rare to use for much other than installers. Preserving a simpler API for Go outweighs that use case, especially since the kernels can easily fix any actual inefficiency with a tiny patch to their drivers.

hirochachacha commented 3 years ago

@rsc What about ioutil.ReadDir? Are there any subsequent proposals? I think this API is great and it can be a drop-in replacement for (*File).Readdir. However, according to your survey result:

$ grep Readdir readdirs.txt | wc -l
15502
$ grep ioutil.ReadDir readdirs.txt | wc -l
47424

Most people prefer to use ioutil.ReadDir. They don't like boilerplate.

hirochachacha commented 3 years ago

Since this is a performance optimization, postponing the ioutil changes to Go 2 may be fine, though.

diamondburned commented 3 years ago

What about ioutil.ReadDir?

I don't think ioutil.ReadDir could benefit from this proposal; the reason being the API needs os.FileInfo returns, which would require a full Lstat call. As far as I can tell, the proposal is heading towards DirEntry, which is a completely new interface in a completely new API.

This is a bit of a tangent, but is Go 2 allowed to break existing code?

ianlancetaylor commented 3 years ago

This would not be a Go 2 issue as such; it would be an os/v2 issue, so code that continues to use os would be unchanged (or swap os with io/ioutil if you like). But an os/v2 (or io/ioutil/v2) package is unplanned and unlikely.

hirochachacha commented 3 years ago

I wouldn't expect this to improve ioutil.ReadDir directly. This API may supersede (*File).Readdir. But what about ioutil.ReadDir? Are there any plans to introduce a new API that supersedes ioutil.ReadDir? The name ReadDir conflicts with the existing ioutil.ReadDir, so I thought there might be a plan to replace the existing signature with the new one.

This proposal is justified by the fact that 90% of code doesn't use more than the name and isDir, but 75% of code uses ioutil.ReadDir. Is providing a solution for 25% of code good enough? I just wondered. Sorry for the interruption.

networkimprov commented 3 years ago

See type ReadDirFS interface in https://go.googlesource.com/proposal/+/master/design/draft-iofs.md

hirochachacha commented 3 years ago

You mean, io/fs might want to use the new signature which is introduced here, thus we can use something like fs.ReadDir(os.DefaultFS, "dir") instead of ioutil.ReadDir("dir") in the future? Sounds good. Thank you for sharing.

rsc commented 3 years ago

If this proposal is accepted we can worry about ioutil next. The most likely answer is to put the helper ReadDir(dir string) ([]DirEntry, error) into os itself (part of completely deprecating ioutil).

rsc commented 3 years ago

No change in consensus, so accepted.

Update: this is what was accepted:

// ReadDir reads the contents of the directory associated with the file f
// and returns a slice of DirEntry values in directory order.
// Subsequent calls on the same file will yield later DirEntry records in the directory.
//
// If n > 0, ReadDir returns at most n DirEntry records.
// In this case, if ReadDir returns an empty slice, it will return an error explaining why.
// At the end of a directory, the error is io.EOF.
//
// If n <= 0, ReadDir returns all the DirEntry records remaining in the directory.
// When it succeeds, it returns a nil error (not io.EOF).
func (f *File) ReadDir(n int) ([]DirEntry, error) 

// A DirEntry is an entry read from a directory (using the ReadDir method).
type DirEntry interface {
    // Name returns the name of the file (or subdirectory) described by the entry.
    // This name is only the final element of the path, not the entire path.
    // For example, Name would return "hello.go" not "/home/gopher/hello.go".
    Name() string

    // IsDir reports whether the entry describes a subdirectory.
    IsDir() bool

    // Type returns the type bits for the entry.
    // The type bits are a subset of the usual FileMode bits, those returned by the FileMode.Type method.
    Type() FileMode

    // Info returns the FileInfo for the file or subdirectory described by the entry.
    // The returned FileInfo may be from the time of the original directory read
    // or from the time of the call to Info. If the file has been removed or renamed
    // since the directory read, Info may return an error satisfying errors.Is(err, ErrNotExist).
    // If the entry denotes a symbolic link, Info reports the information about the link itself,
    // not the link's target.
    Info() (FileInfo, error)
}

// Type returns the type bits (m & ModeType).
func (m FileMode) Type() FileMode { return m & ModeType }

benhoyt commented 3 years ago

@rsc I don't know what you normally do for proposals like this, but is it worth updating the description at the top with the final proposal (including Type)? It would make it easier for people to see what the final proposal is at a glance, without wading through 100+ comments.

rsc commented 3 years ago

@benhoyt, I added a note and link to the top comment. Thanks.

gopherbot commented 3 years ago

Change https://golang.org/cl/285592 mentions this issue: doc/go1.16: mention os.DirEntry and types moved from os to io/fs