golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.73k stars 17.63k forks source link

io/fs: add file system interfaces #41190

Closed rsc closed 3 years ago

rsc commented 4 years ago

In July, @robpike and I posted a draft design for file system interfaces. That doc links to a video, prototype code, and a Reddit discussion.

The feedback on that design has been almost entirely positive.

A few people raised concerns about the use of optional interfaces, but those are an established pattern in Go that we understand how to use well (informed in part by some earlier mistakes, such as optional interface like http.Hijacker with methods that cannot return an error to signal failure/unavailability).

A few people suggested radical redesigns of the os.File interface itself, but for better or worse installed base and the weight of history cautions against such drastic changes.

I propose to adopt the file system interfaces draft design for Go 1.16.

rsc commented 4 years ago

Re context specifically, I spent a long time trying different possible ways to mention context anywhere in the FS proposal and ended up at the best choice being not to. Not because context is bad or should be ignored, but because the cleanest way to integrate context seems to be at the time of FS creation.

We can't pass context to methods like fs.File.Read and fs.File.Write, because then those methods would not satisfy io.Reader, io.Writer, and so on.

We could move up one level and pass the context to fs.FS.Open, having the context saved and applied to all operations on the resulting file. It sounds like many people would be happy with that level of control. A downside, though, is that you then either have to add a context parameter to all the helper methods and functions (Glob, ReadFile, ...) as well as any function that accepts an FS (template.ParseFS, ...). Nearly all the time, that context will be context.Background(), making the calls annoyingly verbose for no real benefit. And many functions accepting FS will choose not to take a context and put in context.Background themselves. People who want to use context with those functions will be completely out of luck.

Or we could move up one more level, as @Merovius suggested, and set the context at the creation of the fs.FS and have it apply to all operations on the resulting file system. That avoids all the downsides of the previous paragraph. And all the people who would be happy with "context set during Open" should be equally happy with "context set during FS creation" provided that's a lightweight enough operation. They should also be happy that it is then impossible for people to write FS-accepting code that fails to plumb a context through: using the FS object implies use of context.

That last option - which doesn't require mentioning context in package fs at all - is the one that I recommend here, both because it seems to be the only one that actually works in practice from an API standpoint (as just noted) and because when you start getting into very fine-grained control over individual underlying network operations, you're better off using RPCs, as @ianlancetaylor said and I repeated above.

davecheney commented 4 years ago

Not because context is bad or should be ignored, but because the cleanest way to integrate context seems to be at the time of FS creation.

Thank you

rsc commented 4 years ago

@muirdm, thanks very much for bringing up wrapping and future extension methods. Your example is a really great one:

For example, I want to wrap an fs.FS to track the total number of bytes read. I need to intercept calls to fsys.Open and calls to fsys.ReadFile, if implemented. I also don't want to lose any other optional interfaces such as fs.GlobFS. Based on my experience with http.ResponseWriter, this is commonly needed, but hard and tedious to do correctly.

Absolutely true. The problem is, the wrapper-builder-helper approach more often than not replaces "hard to do completely" with "easy to do incorrectly".

As you noted, in the current proposal's API, to count bytes you need to intercept Open. That's all you need, really:

type countingFS struct {
    total int64
    real fs.FS
}

func (fsys *countingFS) Open(name string) (fs.File, error) { ... fsys.real.Open & return counting file ... }

With no changes, the current proposal makes this kind of wrapper "easy to do correctly". The problem is it may not be as efficient as you might want, because maybe the underlying FS has a ReadFile that's more efficient in some important way compared to Open+Read+Close, and the countingFS is hiding that. So you have to add a counting version:

func (fsys *countingFS) ReadFile(file string) ([]byte, error) {
    data, err := fs.ReadFile(fsys.real, file)
    fsys.total += int64(len(data))
    return data, err
}

Fair enough. But now suppose the underlying FS has a Glob that's more efficient too. Even though there's no counting involved, you need to write a wrapper just to preserve access to the fast implementation:

func (fsys *countingFS) Glob(pattern string) ([]string, error) {
    return fs.Glob(fsys.real, pattern)
}

So now we're at "easy to do correctly, a bit tedious to do completely". Having an incomplete wrapper may mean that the returned fs is not as efficient at certain operations, or it may mean that it doesn't support them at all.

At this point we might well think: that Glob method is going to get written a lot, let's write a wrapper-helper to help, along the lines of the code you posted. Suppose that wrapper-helper exists and consider what happens when we forget about a particular extension method, say ReadFile:

The same concerns happen regardless of example. Another popular example is a sub-directory file system, defined by the property:

fs.Subdir(fsys, prefix).Open(name) == fsys.Open(path.Join(prefix, name)).

This is an even clearer example of the wrapper-helper failing. Because every method on FS is likely to take a name as an argument, any method forwarded automatically by the wrapper-helper will bypass the prefix addition and therefore be buggy. Maybe that's a case where it would be clear that you would avoid the wrapper-helper from the start. But maybe not. And most examples are not that clear. They're more like the counting FS example, where the wrapper-helper seems like a good idea until you realize that there's almost always some future method that might be added that your wrapper would absolutely need to adjust the behavior of. When that happens, it's much better not to have the method at all than to have a buggy one provided by the wrapper.

That is, the wrapper-helper is a good idea only if you are very careful when you use it the first time and it never changes. But as soon as you add change into the mix—as soon as you start doing software engineering—the wrapper-helper ends up being a one-time convenience with a fine print guarantee of future buggy behavior.

rsc commented 4 years ago

@muirdm Thanks also for bringing up writable files. It's true that if we later want to add writing, OpenFile would still return File, which would mean a type assertion at an OpenFile call site when opening for writing. That seems OK to me, especially if we paired OpenFile with a helper func Create:

type WriteFile interface { io.Writer; File }
func Create(fsys FS, name string) (WriteFile, error) { ... }

This seems fine to me in part because I expect that most programs that just want a File with a Write method would use the Create helper function.

The alternative is a lot more dummy methods that have to be implemented. (Of course, you could use a wrapper-helper, but that has other problems, as noted in my comment above.)

Another reason that it seems OK to make writing a little more work is that I think this (half-joke) is not too accurate:

However, in some sense writing is more fundamental because you have to write before you can read (only half joking).

On average, files are read more than they are written: there's no point at all writing a file that will never be read, which makes the average reads per write > 1.

Writing may or may not be more fundamental but is certainly more specific. It involves giving a concrete answer to every possible kind of metadata supported by the underlying file storage - modification times, uids, compression algorithm in a zip file, and so on. If you use an abstracted writer, you give up the ability to set those and get whatever defaults are provided. Maybe that's fine, but maybe not. And you only get one chance to set all these: the time when the file is written.

In contrast, reading is easier to make abstract. A particular reader may well not care about any particular detail like modification times, uids, or compression algorithms. That's fine: it can ignore them. And another reader that does care can look at info.ModTime() or even use info.Sys() to get implementation-specific additional detail. Both can exist simultaneously for the same file: the abstract reader and the more specific reader. Using an abstract reader does not preclude also using a specific reader.

In contrast, using an abstract writer does preclude using a specific reader, because the specific bits the reader is interested in won't be there at all. So code writing files inevitably trends toward handling all the extra metadata even if there's only one reader that cares. That makes any attempt at an abstract writing API inevitably trend toward complexity, because you only get to write a file once, one way.

But since you get to read a file repeatedly, many different ways, and because most readers only care about a few details, there's much more opportunity for a simple abstract API.

All this is to say I think the design decision of limiting this proposal to read-only operations is a good one. In fact it was the key insight (by @robpike) that unlocked years of being stuck and let us make any progress defining this interface at all.

Writing files is where the dragons are.

networkimprov commented 4 years ago

The io/fs integration with stdlib, and embed.Files can all be prototyped in x/

It cannot, because stdlib cannot import x/.

Sorry I wasn't clear. I meant you can duplicate to x/ any parts of stdlib necessary to prototype embed.Files etc in x/.

Thanks for your other responses.

iand commented 4 years ago

@rsc

On average, files are read more than they are written: there's no point at all writing a file that will never be read, which makes the average reads per write > 1.

I find this statement hard to understand. Many programs write files that they never consume. A trivial example is logging. But there are also format converters, content downloaders and compression utilities as the first few that come to mind.

muirdm commented 4 years ago

@rsc Thank you for responding.

Regarding wrapping, you give a lot of good reasons to avoid passing through optional interfaces by default. I officially withdraw my example wrapper.

As you point out in https://github.com/golang/go/issues/41198#issuecomment-686618566, "opt-in" wrapping is not difficult because the wrapper can unconditionally implement the extension interface and fall back to the default implementation if the underlying object doesn't implement the extension interface. That insight makes me feel better about wrapping.

Looking forward, though, as more disjoint extension interfaces (i.e. extensions with no fallback implementation) are introduced, we may yet have to grapple with the combinatorial explosion of extension interfaces in order to maintain required functionality when wrapping. Do you see that problem as avoidable? Solvable? Inevitable?

earthboundkid commented 4 years ago

Looking forward, though, as more disjoint extension interfaces (i.e. extensions with no fallback implementation) are introduced, we may yet have to grapple with the combinatorial explosion of extension interfaces in order to maintain required functionality when wrapping. Do you see that problem as avoidable? Solvable? Inevitable?

To magnify this point a bit, I think there are two classes of optional interfaces:

  1. Those that exist for performance/efficiency reasons (e.g. io.WriterTo, fs.FileReader, etc.)
  2. Those that add new capabilities (e.g. http.Flusher, http.Pusher, etc.)

When replacing the optional efficiency methods, it doesn't make sense to do naive wrapping for all the reasons explained above by Russ. But I think that for the capability methods there need to be separate mechanisms for a) testing the existence of the capability and b) wrapping the capability. The problem for http.ResponseWriter is there is no way to wrap an http.Flusher (b) without providing a Flush method which implies the existence of a Flush capability (a). That's what leads to the combinametric explosion where you need to have WrapperWithFlusher, WrapperWithPusher, and WrapperWithFlusherAndPusher, etc. For the efficiency methods, there's no real need or reason to distinguish b from a because you can just do the inefficient thing if the method isn't there. For the capability interfaces, it's totally different.

I think in introducing new optional interfaces, this difference needs to be explicitly marked and thought through to make sure we don't have more Pusher/Flusher-style unwrappable interfaces. AFAICT, so far the fs interfaces are basically just efficiency methods, but we need to be careful that if any capabilities interfaces are added, they are added with a way to test for their existence besides just checking for the methods.

earthboundkid commented 4 years ago

Reading through the proposal again, these interfaces (which are just being mooted and not seriously planned AFAICT) are capabilities:

type RenameFS interface {
    FS
    Rename(oldpath, newpath string) error
}

func Rename(fsys FS, oldpath, newpath string) error {
    if fsys, ok := fsys.(RenameFS); ok {
        return fsys.Rename(oldpath, newpath)
    }

    return fmt.Errorf("rename %s %s: operation not supported", oldpath, newpath)
}
type OpenFileFS interface {
    fs.FS
    OpenFile(name string, flag int, perm os.FileMode) (fs.File, error)
}

func OpenFile(fsys FS, name string, flag int, perm os.FileMode) (fs.File, error) {
    if fsys, ok := fsys.(OpenFileFS); ok {
        return fsys.OpenFile(name, flag, perm)
    }

    if flag == os.O_RDONLY {
        return fs.Open(name)
    }
    return fmt.Errorf("open %s: operation not supported", name)
}

Fortunately, because both capabilities interfaces involve returning an error, there's a logical way to signal that you have the method but cannot actually use it: return some canonical error value if the fs.FS you're wrapping doesn't have the required capability.

tooolbox commented 4 years ago

Sorry I wasn't clear. I meant you can duplicate to x/ any parts of stdlib necessary to prototype embed.Files etc in x/.

Correct me if I'm wrong, but the issue with x/ is that the build command has to be aware of embed.Files. That's why x/ is out, because we can't have build importing from there. I agree with you that a prototype/trial phase would be nice, but it's just not on the table for this reason.

Unless you mean prototyping with a different Go tool binary a la go2go but I don't really see the draw in that.

networkimprov commented 4 years ago

Aren't some x/ packages vendored into the core repo for use by tools? 1.16 could provide a prototype of embedding that way.

Cyberax commented 4 years ago

This issue is about generalizing the FS API we have. It is not about designing a whole new API with bells and whistles for cancellation, deadlines, hung network file systems, and so on.

I'm not going to participate in the discussion further, but Go has this infuriating tendency: a lot of very good thought out interfaces with attention to detail and among them some brain-dead bogosities that are not extensible and have to be worked around.

Examples include: the SQL layer, deadlines in TCP, Readers/Writers, concrete classes in the log package, testing.T not being an interface, etc. Some of them are relics of the past before the modern practices have evolved, like passing a context everywhere, some are just designs that gave no thought to extensibility and evolution.

And here we have another such example. The FS API that is meant for a very narrow use-case and will be difficult to extend in future. Seriously, just pass the freaking context everywhere and be done with it. It's mentioned in the Go's own style guide for FSM's sake: https://github.com/golang/go/wiki/CodeReviewComments#contexts

If you personally don't need it NOW doesn't mean that other people won't need it later.

Cyberax commented 4 years ago

Not because context is bad or should be ignored, but because the cleanest way to integrate context seems to be at the time of FS creation.

No, it's not. I already gave an example: a filesystem with deadlines on Read() operations. Each read might have its own deadline, independent of anything else.

ianlancetaylor commented 4 years ago

No, it's not. I already gave an example: a filesystem with deadlines on Read() operations. Each read might have its own deadline, independent of anything else.

For what it's worth, the existing os.File type already provides a SetDeadline method (https://pkg.go.dev/os?tab=doc#File.SetDeadline). It doesn't seem like a big stretch to add that as an io/fs extension for file systems for which that makes sense.

mvdan commented 4 years ago

As a bystander - @Cyberax I don't agree with the way you've written multiple comments in this thread. If you intend to participate in any issue in the future, please remember the code of conduct - be respectful and thoughtful.

rsc commented 4 years ago

For what it's worth:

Go has this infuriating tendency: a lot of very good thought out interfaces with attention to detail and among them some brain-dead bogosities that are not extensible and have to be worked around.

This turns out to be the case for just about everything: some parts are great, some are not. The tough part is that people disagree about which parts are which.

rsc commented 4 years ago

@muirdm and @carlmjohnson both commented about the "combinatorial explosion of extension interfaces." I think we understand how to avoid that now:

  1. Only add extension interfaces corresponding to a single operation (Copy/ReadFrom/WriteTo is the antipattern),
  2. Make sure every extra method can return an error (http.Flusher is the antipattern).

Consider these optional interfaces: http.Hijacker, http.Flusher, http.Pusher.

The mistake in this trio is Flusher, which doesn't say anything about failure. It's nice that Pusher defines a specific error, but I don't think that's actually too important. Hijacker is fine too.

The fact that you can't have a Flush method that fails is what leads to Wrapper vs WrapperWithFlusher. On the other hand, you should never need to distinguish Wrapper vs WrapperWithPusher, nor WrapperWithFlusher vs WrapperWithFlusherAndPusher: having a Push method does not guarantee it works.

A wrapper can provide any extension methods the author knows about and forward them along to the wrapped implementation if available and otherwise implement the right fallback: for optimizations, call the generic code (fs.ReadFile, etc), and for new functionality, return an error - that functionality is not available.

The biggest anti-pattern is code assuming that the presence of a method means it is guaranteed never to fail. As long as every extension method has an error result, that problem should be avoided.

When we think about file systems, we expect every operation to be able to fail - if nothing else, there's always the possibility of "permission denied". So making the Flusher mistake here of not specifying failure behavior is pretty unlikely.

As long as we are careful to make sure all extension methods have an error result, it seems to me there is no possibility of a combinatorial explosion of wrappers. Each wrapper should implement what it knows about, and it is OK to update the wrappers as more are added. (A reasonable initial objection is "but the wrappers will need to be updated", but the answer to that is my earlier comment, perhaps supplemented with the observation that if you are depending on code that has no possibility of being updated, you may have other problems as well.)

P.S. Straying from the general case a bit into HTTP-specific details, but I'd argue that a no-op Flush method should be considered a valid implementation of http.Flusher. Given all the middleboxes that might sit between the Go HTTP code and the actual client, even flushing to the server's wire is no guarantee that the bits arrive on the client's wire, so it seems weird to be picky about the Go layers but not all the other parts along the connection. I would suggest that wrappers just always provide a Flush method, and make it a no-op if the ResponseWriter being wrapped doesn't have a Flush method.

rsc commented 4 years ago

In the discussion above, I see the concerns about wrapping and context, both of which I've replied to above and seem to have gotten positive responses. Are there other concerns about moving this proposal forward?

(Again, remember that we're not going to redo the os.File interface as part of this. That's off the table.)

earthboundkid commented 4 years ago

FWIW, I am satisfied that my concerns about wrapping are addressed for now.

jimmyfrasche commented 4 years ago

I'd like to see #41198 and #41188 settled first

Cyberax commented 4 years ago

In the discussion above, I see the concerns about wrapping and context, both of which I've replied to above and seem to have gotten positive responses. Are there other concerns about moving this proposal forward?

What are exactly the concerns that prevent Context from being just added explicitly as an argument for all methods? I haven't seen the explanation anywhere.

OK, sorry, won't comment in this thread again....

Merovius commented 4 years ago

@Cyberax Among other things the fact that the *os.File interface doesn't support cancellation and as was made clear, that API isn't going to change at this point. Given that this is the primary implementation of the proposed interface, it's definitely incorrect to add a context - it would be pretending a capability that doesn't exist.

networkimprov commented 4 years ago

Quoting https://github.com/golang/go/issues/41190#issuecomment-690814116: "There are plenty of possible file system implementations that are reliable enough to maintain the illusion of an ordinary on-disk file system, and the generalized API in this proposal is targeted at those."

IOW, in-memory data mostly.

I'd guess the Go team hasn't heard enough reports of stalled file ops in deployed apps to consider that a problem. It's most commonly seen by desktop apps on LANs with fileservers, where Go has little impact. I'm working on a desktop app, so am concerned about it; @Cyberax and @tv42 are the only others I've seen raise the issue. See also #41054.

tv42 commented 4 years ago

@networkimprov I want to see file I/O cancellation if and when the OS supports it, but I don't see that needing much more here than fsys := fsys.WithContext(ctx) in the right spot. The work to plumb that up resides elsewhere.

rsc commented 4 years ago

@Cyberax

What are exactly the concerns that prevent Context from being just added explicitly as an argument for all methods? I haven't seen the explanation anywhere.

I believe I addressed that in https://github.com/golang/go/issues/41190#issuecomment-690819876.

rsc commented 4 years ago

Waiting on #41467, which seems to be converging.

rsc commented 4 years ago

Based on the discussion above and the fact that #41467 is now a likely accept, this too seems like a likely accept.

rsc commented 4 years ago

No change in consensus, and #41467 is accepted, so accepting.

gopherbot commented 4 years ago

Change https://golang.org/cl/243909 mentions this issue: testing/iotest: add TestReader to test readers

gopherbot commented 4 years ago

Change https://golang.org/cl/243905 mentions this issue: go/build: allow io/fs to depend on time

gopherbot commented 4 years ago

Change https://golang.org/cl/243900 mentions this issue: os: use keyed literals for PathError

gopherbot commented 4 years ago

Change https://golang.org/cl/243907 mentions this issue: all: update references to symbols moved from os to io/fs

gopherbot commented 4 years ago

Change https://golang.org/cl/243906 mentions this issue: io/fs: move FileInfo, FileMode, PathError, ErrInvalid, ... from os to io/fs

gopherbot commented 4 years ago

Change https://golang.org/cl/263142 mentions this issue: all: update references to symbols moved from io/ioutil to io

gopherbot commented 4 years ago

Change https://golang.org/cl/243911 mentions this issue: os: add DirFS

gopherbot commented 4 years ago

Change https://golang.org/cl/243908 mentions this issue: io/fs: add FS, File, ReadDirFile; move DirEntry from os

gopherbot commented 4 years ago

Change https://golang.org/cl/243910 mentions this issue: testing/fstest: new package for testing file system code

gopherbot commented 4 years ago

Change https://golang.org/cl/243912 mentions this issue: io/fs: add ReadFile and ReadFileFS

gopherbot commented 4 years ago

Change https://golang.org/cl/243914 mentions this issue: io/fs: add ReadDir and ReadDirFS

gopherbot commented 4 years ago

Change https://golang.org/cl/243913 mentions this issue: io/fs: add Stat and StatFS

gopherbot commented 4 years ago

Change https://golang.org/cl/243915 mentions this issue: io/fs: add Glob and GlobFS

gopherbot commented 4 years ago

Change https://golang.org/cl/243916 mentions this issue: io/fs: add Walk

gopherbot commented 4 years ago

Change https://golang.org/cl/243939 mentions this issue: net/http: add FS to convert fs.FS to FileSystem

gopherbot commented 4 years ago

Change https://golang.org/cl/243938 mentions this issue: html/template, text/template: add ParseFS

gopherbot commented 4 years ago

Change https://golang.org/cl/243937 mentions this issue: archive/zip: make Reader implement fs.FS

gopherbot commented 4 years ago

Change https://golang.org/cl/264057 mentions this issue: cmd/go: fix TestScript/test_cache_inputs

earthboundkid commented 3 years ago

The docs for io/fs don’t current refer to package io. It’s obvious to me that fs.File.Read is an io.Reader, but it should be in the docs for new Gophers. Is there any reason to not just define File as this?—

type File interface {
    Stat() (FileInfo, error)
    io.ReadCloser
}
robpike commented 3 years ago

This is such a fundamental interface for the package that it seems fine as is, but perhaps that is an old-fashioned perspective.

earthboundkid commented 3 years ago

Whether it has an io.Reader or not is less important than making sure new Gophers know where to read the documentation for the method, which are kind of long and subtle (eg don’t retain p).

robpike commented 3 years ago

If that's your argument, shouldn't it be,

type File interface {
    Stat() (FileInfo, error)
    io.Reader
    io.Closer
}

? I would expect after that to see a request to shorten it by a line by reverting to your version. And on it will go. So I say leave it.

In any case I'm not sure how many times people need to be told about what Read means. And if it's important, why isn't that information decorating os.File.Read and every other place?

How about documentation instead of trickery? Dress up the comment on the struct a little.

A File provides access to a single file that implements the methods as documented in the io and os packages.

Even that seems needless to me. What other methods would they be?