golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.04k stars 17.54k forks source link

proposal: io/fs: add writable interfaces #45757

Closed matthewmueller closed 1 year ago

matthewmueller commented 3 years ago

Go 1.16 introduced the embed and io/fs packages. The current implementation is a minimal readable filesystem interface.

This is a wonderful first step towards standardizing filesystem operations and providing the community with a lot of flexibility in how we implement filesystems. I would like us to take the next step and define a writable file system interface.

Problem

fs.FS can't be written to after it's defined.

func Write(fs fs.FS) {
  // We can't write to FS.
  fs 
}

Optional interfaces could be defined in user-land:

func Write(fs fs.FS) {
  // We can't rely on vfs.Writable being implemented across community packages.
  writable, ok := fs.(vfs.Writable)
}

But it suffers from the same issues that the readable filesystem interface aimed to solve: standardizing the interface across the ecosystem.

Use Cases

I'll list of few use-cases that I've come across since Go 1.16, but I'm sure the community has many more:

We've already seen started to see this pop up in the community around io/fs to address the problem in user-land:

A quick search on Github will yield more community libraries: https://github.com/search?q=%22io%2Ffs%22+language%3Ago. For many of these implementations, you can imagine a useful writable implementation.

Of course, there are many other file system libraries that came before io/fs that define writable interfaces like afero and billy.

Proposal

I don't feel qualified to define an interface, I know people have thought about this much harder than I have. What I would love to see from a community member's perspective is the following:

package fs

func WriteFile(fs FS, name string, data []byte, perm FileMode) error
func MkdirAll(fs FS, path string, perm FileMode) error
func Remove(fs FS, name string) error

Nice to Have: Be able to define if a filesystem is readable, writeable or read-writable.

func Open(fs fs.FS) (*DB, error)   // Readable
func Open(fs fs.WFS) (*DB, error)  // Writable
func Open(fs fs.RWFS) (*DB, error) // Read-Writable

Thanks for your consideration!

coder543 commented 3 years ago

I'm currently experimenting with writing a file format encoding/decoding/mutating package that is intended to work with files that aren't guaranteed to easily fit in memory.

I would like to implement it in terms of fs.FS, which would make it so the library doesn't have to care whether these files actually exist on the local filesystem, in memory, or stored somewhere else. In point of fact, this is intended to be an archival format that distributes the contents of a single archive over a configurable number of actual files, and these files might be distributed geographically across different regions for redundancy.

This codec package doesn't want to care about supporting all the different places that the underlying files could be stored. It just wants to take in an fs.FS and a list of paths.

Additionally, to make this package more testable, using fs.FS would make it trivial to write tests without having to actually read and write files on disk.

Unfortunately, since fs.FS is read-only, I'm sitting here thinking up complicated ways that I could support fs.FS and somehow still support actually writing and mutating files on disk.

DeedleFake commented 3 years ago

I've got a similar issue with my p9 package. It attempts to implement a 9P client and server in a high-level way, similar to net/http, and while I'd like to rework the file abstraction to use fs.FS, it currently would result in odd things due in part to Open() returning an fs.File, which is then read-only by design. For now, what I'm leaning towards is just having a function that takes my package's filesystem type and returns an fs.FS that abstracts it away, but the read-only problem will still be there.

Maybe something like this could work?

type RWFS interface {
  FS
  WriteFS
}

type WriteFS interface {
  // Create creates a new file with the given name.
  Create(string) (WFile, error)

  // Modify modifies an existing file with the given name.
  Modify(string) (WFile, error)
}

type WFile interface {
  Write([]byte) (int, error)
  Close() error
  // Maybe also some kind of WStat() method?
}

Then the returned types would only have to expose either reading or writing methods, and the interface would just handle it transparently.

Persionally, I think that it would be a lot better if there was some way to abstract away the specific requirement of an fs.File as the returned type so that either Open(string) (*os.File, error) or Open(string) (SomeCustomFileType, error), but that would require language changes and that seems like overkill. It could be partially done with generics, such as with type FS[F File] interface { ... }, but it has some odd potential complications, and it wouldn't be fully backwards compatible at this point.

dhemery commented 3 years ago

To me, this looks like the minimum requirement:

package fs

type WFile interface {
    Stat() (FileInfo, error)
    Write(p []byte) (n int, err error)
    Close() error
}

type WriteFS interface {
    OpenFile(name string, flag int, perm FileMode) (WFile, error)
}

And another (ugh) for making dirs:

type MkDirFS interface {
    MkDir(name string, perm FileMode) error
}

And some helper functions for convenience:

func Create(fsys WriteFS, name string) (WFile, error) {
    // Use fsys.OpenFile ...
}

func WriteFile(fsys WriteFS, name string, data []byte, perm FileMode) error {
    // Use fsys.OpenFile, Write, and Close ...
}

func MkDirAll(fsys MkDirFS, path string, perm FileMode) error {
    // Use fsys.MkDir to do the work.
    // Also requires either Stat or Open to check for parents.
    // I'm not sure how to structure that either/or requirement.
}
kylelemons commented 3 years ago

I think that we should lean more heavily on io and os, rather than making top-level WFile types. In particular, I think we should basically just define some top-level functions that can fall back all the way to Open, and not have a WFile interface at all.

Summary

Detail

See sketch of an implementation here:

https://gist.github.com/kylelemons/21539a152e9af1dd79c3775ca94efb60#file-io_fs_write_sketch-go

This style of implementation appeals to me because:

I think the same patterns can be used to implement Mkdir and MkdirAll on a filesystem as well.

geoffroy-perso commented 2 years ago

I think that also having a Remove method that can work with a fs.FS interface would be a great addition (similar to the current os.Remove function).

Just a small contribution as it feels to me in the scope of this proposal, and since no one mentionned file removal 🙂

sbourlon commented 2 years ago

@kylelemons just to be consistent with fs.ReadDirFile, the only File extension at the moment, and iofs draft/The File interface:

type ReadDirFile interface {
  File
  ReadDir(n int) ([]DirEntry, error)
}

type WriteFile interface {
  File
  Write(p []byte) (n int, err error)
}

There is a comment from Russ Cox about this exact change as well.

hairyhenderson commented 2 years ago

There's an interesting package https://pkg.go.dev/github.com/hack-pad/hackpadfs that's got some interfaces defined for a writable version of io/fs... The interfaces "feel" pretty much like they should, and are effectively the same as much as what's been discussed here already. I'm in no way associated with that project, just thought it was some interesting work that might inform how writable file interfaces fit into a future release.

cristaloleg commented 2 years ago

Looks like this proposal has label Proposal but isn't a part Review Meeting https://github.com/golang/go/issues/33502

@rsc can you move it into https://golang.org/s/proposal-status#active column please?

rsc commented 2 years ago

This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group

frioux commented 2 years ago

I haven't thought as in depth as some of the other folks here, but I came up with a similar set of interfaces and shortcuts for an FS backed by both the local filesystem and backed by the dropbox API. Here's the API (obviously not all of this is so general purpose:)

type CreateFS interface {
        fs.FS

        Create(string) (FileWriter, error)
}

type FileWriter interface {
        fs.File

        Write([]byte) (int, error)
}

type RemoveFS interface {
        fs.FS
        Remove(string) error
}

type WatchFS interface {
        fs.FS

        Watch(context.Context, string) (chan []fsnotify.Event, error)
}

type WriteFileFS interface {
        fs.FS

        WriteFile(string, []byte, fs.FileMode) error
}

func OpenDirFS(d string) fs.FS
func Remove(fss fs.FS, path string) error
func Watch(ctx context.Context, fss fs.FS, dir string) (chan []fsnotify.Event, error)
func WriteFile(fss fs.FS, name string, contents []byte, mode fs.FileMode) (err error)
bcmills commented 2 years ago

If we add a WriteFile, I suggest that it should deviate from io.WriteFile to fix what can otherwise be a subtle concurrency bug.

os.WriteFile truncates the existing file before writing, which causes otherwise-idempotent writes to race. I suggest that fs.WriteFile should instead truncate after writing, so that an idempotent file can be rewritten arbitrarily many times without corrupting its contents.

adamluzsi commented 2 years ago

Hello everyone!

Recently I encountered a similar use case where I wanted to dependency inject a filesystem, and my consumer is required to create, modify or remove files on the fsys.

I came up with the following header interface that closely follows the os stdlib:

type FileSystem interface {
    Stat(name string) (fs.FileInfo, error)
    OpenFile(name string, flag int, perm fs.FileMode) (File, error)
    Mkdir(name string, perm fs.FileMode) error
    Remove(name string) error
}

type File interface {
    fs.File
    fs.ReadDirFile
    io.Writer
    io.Seeker
}

I made an interface testing suite to cover roughly the ~127 most common filesystem interactions from a behavioural point of view. Then I made two suppliers for this interface:

The same interface testing suite tests both.

You can use them as a drop-in replacement for places where you had to use os the package. The filesystems package also supplies similar functions as the os such as Open, Create, ReadDir, and WalkDir.

I plan to use and maintain this until an easy to use replacement comes out in the std library. I just wanted to share this with you all, hoping it helps someone out.

Cheers!

zeho11 commented 2 years ago

I'd write like this:

type WritableFS interface {
    fs.FS
    OpenFile(name string, flag int, perm fs.FileMode) (WritableFile, error)
}

type WritableFile interface {
    fs.File
    io.Writer
}

func Create(fsys WritableFS, name string) (WritableFile, error)

func WriteFile(fsys WritableFS, name string, data []byte, perm fs.FileMode) error

...
chrisguiney commented 2 years ago

On top of a +1 for having the need of a writable fs interface, I'd like to enter into the conversation that files are not inherently readable or seekable. for example, files may be opened with fs.ModeAppend, or os.Stdin.

That is , when designing the interface, I suggest keeping the writer interface disjoint from the reader interface, and instead having a third interface to union them.

Any writable file implementation could return an error for read/seek operations...but it'd be nice to express that they're not necessary

hairyhenderson commented 2 years ago

That is , when designing the interface, I suggest keeping the writer interface disjoint from the reader interface, and instead having a third interface to union them.

@chrisguiney there are already io.Writer, io.Seeker, and io.WriteSeeker interfaces that should meet this need. IMHO using those interfaces best expresses the ability to write and/or seek.

fgm commented 2 years ago

That is , when designing the interface, I suggest keeping the writer interface disjoint from the reader interface, and instead having a third interface to union them.

@chrisguiney there are already io.Writer, io.Seeker, and io.WriteSeeker interfaces that should meet this need. IMHO using those interfaces best expresses the ability to write and/or seek.

These interfaces are for writable "things" like files, but the requirement in this issue is for a file-system-level interface, not for a "file"-level interface, so this does not seem to answer the need, does it ?

Merovius commented 2 years ago

Personally, I don't think there is a practical way to support a writable fs.FS. Or it's at least extremely non-trivial.

When it comes to reading files, OS-dependent semantic differences can mostly be papered over by handwavingly assuming the fs.FS is immutable. Once you actively encourage writing to the underlying filesystem, though, you open an uncontainable can of worms of semantic deviations between different operating systems and even filesystems on the same OS.

Questions which immediately arise are

  1. What happens if a file is opened for reading and for writing simultaneously? Windows (AFAIK) refuses this, returning an error, while most unix-like OSes will allow it. If we say it is forbidden, how do we forbid it under Linux? If we say it is allowed, how do we allow it under Windows? If we shrug and pass the buck, how are programmers supposed to deal with it?
  2. What kind of atomicity guarantees are given? When a power-loss (or the process is killed by a signal) happens during a Write, what are the allowed states for the resulting file to be in? This is entirely unspecified, even within a single OS. It heavily depends on the FS and the settings its mounted with.
  3. Similarly, what about Flush? Again, in practice it is mostly unspecified what this actually does (in the presence of a crash) but it's most likely going to need to be supported.
  4. How do renames behave if the target file exists, or source and target are the same file, or one is a symlink to/below the other?
  5. Speaking of symlinks, what about them? What about unix permission bits and what on systems lacking those? What about extended attributes? What about SELinux etc.?
  6. What about flock?
  7. How do you handle differences in allowed file names on different OSes?
  8. An interface like WriteFile might be reasonable to support, but even that is difficult to do cross-platform and even then, requires different APIs than just a simple "dump this please" to be reliable.

As far as I can tell, there just is no sound way to build an abstraction over filesystems that allows writing. The best advice I've heard so far on this topic is "if you care about data being written, use a database".

Of course, it is possible to just provide some API and tell the programmers that they can't actually rely on the data actually being written, so it shouldn't be used in any production use case. But that just feels irresponsible.

hherman1 commented 2 years ago

@Merovius isn’t the os package such an abstraction? Would you consider that irresponsible/not safe for prod?

coder543 commented 2 years ago

Yeah, I’m not convinced by any of that @Merovius.

The write interface doesn’t need to support every imaginable operation to be useful, and of the operations it would support, Go is well known for taking a pragmatic approach, even when it leads to correctness issues in some circumstances, and this pattern is all over the file interfaces Go already provides. Windows doesn’t support Unix permission bits, so Go just… does whatever it feels like on Windows.

Maybe we shouldn’t use Go in production? I disagree.

A lot of your questions would be up to the implementation to decide. If the implementation is transparently delegating to a real OS filesystem, the answers would mirror what we currently see in the os package… it would be nothing shocking at all. If the implementation is more abstract, it can and will do whatever seems reasonable, and people will report bugs against that package’s repo if they disagree. That’s how all interfaces work.

ianlancetaylor commented 2 years ago

@Merovius Perhaps the way to think about this is to concentrate on the cases that do not involve the operating system. After all, we already have a way to write operating system files. What may possibly be helpful is a way to view something like a zip file as a file system that we can write to. That would let code write out a tree of files as it sees fit, with the zip file system arranging for everything to flow out correctly at the end. And then the same code would work for tar.

If we take this approach, then I think the next step to ask is: do we need to support a read/write file system? Or should the file system be write only? Because I think a lot of the concerns go away with a write only file system.

So is there a use case for a read/write file system interface?

chrisguiney commented 2 years ago

I think it's generally fine to gloss over the semantics of how a write will behave on any given os/fs/hardware. Programs already vary in behavior by using *os.File and related os functions.

The real power having a standardized interfaces provides is being able to easily swap out the usages with implementations that do make semantics well defined. The biggest use case I have: testing. I need to know how my program behaves if opening a file fails, or hangs indefinitely. Perhaps testing makes for a simplistic example, but those are more well defined semantics than what you'll get from the os package. If, I can test how my program behaves to any arbitrary condition or state the fs at runtime may possibly present, it's actually easier to make a system that behaves consistently across different platforms.

As far as I can tell, there just is no sound way to build an abstraction over filesystems that allows writing. The best advice I've heard so far on this topic is "if you care about data being written, use a database".

While I've also shared that advice, I'd like to point out two things:

  1. it's not such a helpful thing to tell someone writing the database itself.
  2. even databases get it wrong (because fsync is that undefined)
Merovius commented 2 years ago

@hherman1 Yes, I believe the os package is such an abstraction and yes, it has many of the same problems, stemming from "writing files is subtle". See for example @bcmills comment here. Its saving grace, if any, is that it's a relatively thin abstraction. So it's possible to use it and stay relatively close to the OS (e.g. by using OpenFile in conjunction with x/sys/unix for flags). If we're trying to be as broad in applications as io/fs, the abstraction needs to be thicker, hide more of the differences and restrict itself further to the least common denominator.

So maybe that's surprising to people (and your question was rhetorical), but I do more and more think that using os to write files is questionable in production use. I think it's okay for some use cases, e.g. if you don't do it a lot or if it happens in an interactive program (so it's at least obvious and can be verified if something goes wrong). Otherwise, yes, I think people should not use os directly, but should use wrappers like renameio or an embedded DB.

@ianlancetaylor I believe people will invariably expect os.DirFS to support a read/write interface.

@chrisguiney

  1. it's not such a helpful thing to tell someone writing the database itself.
  2. even databases get it wrong (because fsync is that undefined)

For 1: Those people shouldn't use io/fs or even os, but likely use x/sys/unix especially because they have to be careful. And 2: Yes. I was alluding to that above. I think that makes it a worse idea to try and do it yourself, not a better.

In any case. I think it's probably possible to create a good abstraction. Just that it's subtle and it didn't seem that anyone has considered the subtlety involved.

hherman1 commented 2 years ago

@Merovius no my question wasn’t rhetorical, I meant it. Wasn’t sure how to make that clear here.

fgm commented 2 years ago

AFAICS point 1 is not correct: Windows supports opening a file for read and write simultaneously by passing GENERIC_READ | GENERIC_WRITE to CreateFileA.

Merovius commented 2 years ago

@fgm The way I read those docs, that depends what is used for dwShareMode, also based on what other processes have opened those files. But in any case, that list was not meant as a specific list of questions to answer, but more as a list of examples of the kinds of questions that arise when you try to create a cross-platform API for writable files. I'm sure there are answers to more than one of them. I'm less optimistic that more won't crop up over time or that we won't discover new problems over time by using whatever API we box ourselves into by adding it to the stdlib.

But maybe I'm wrong. I just wanted to express my concerns. And make clear that the problem isn't as easy as adding a Write method to fs.File.

Fryuni commented 2 years ago

As mentioned by @hairyhenderson, this issue is less about how to have a writable fs.File and more on how to have an fs.FS equivalent that allows creating such files.

I think an approach similar to gocloud.dev/blob would be a good match here:

If necessary, there could also be an fs.ReadWriteFS for file systems that do allow reading and writing simultaneously, returning an interface that aggregates both the return types above.

Example (I didn't give much though about the names):


type WritableFile interface {
  io.WriteCloser
  // Maybe other methods and interfaces
}

type WriteFS interface {
  OpenWritable(name string) (WritableFile, error)
}

type ReadWriteFile interface {
  WritableFile
  File
}

type ReadWriteFS interface {
  OpenReadWrite(name string) (ReadWriteFile, error)
}

Note that in this case a file system can implement fs.FS and fs.WriteFS while not implementing fs.ReadWriteFS if a file can only be opened in one of the modes at a time.

This also fits with the existing io interfaces (Reader, Writer, ReadWriter, etc.)

fgm commented 2 years ago

OK, I'll probably get flamed on this, but what's so wrong about something like the PHP stream wrapper feature, adapted to Go syntax, obviously ? https://www.php.net/manual/en/class.streamwrapper.php

ianlancetaylor commented 2 years ago

@fgm I think that is basically what we are talking about here.

ianlancetaylor commented 2 years ago

@Merovius

I believe people will invariably expect os.DirFS to support a read/write interface.

Fair enough, but that is on the server end. I still wonder what clients would want a read/write interface. I think that thinking of some might help us better understand where the problems might be.

Fryuni commented 2 years ago

Fair enough, but that is on the server end. I still wonder what clients would want a read/write interface. I think that thinking of some might help us better understand where the problems might be.

I don't understand. By server/client are you referring to the implementor/consumer of the interface?

ianlancetaylor commented 2 years ago

I don't understand. By server/client are you referring to the implementor/consumer of the interface?

Yes.

The proposal in this issue basically overlays an optional write interface over the existing read interface. I'm questioning that. Are there examples of programs that want to use io/fs and want to use both the read interface and the write interface? Or does would programs be satisfied with a io.WriteFS that supports Create to get a WritableFile that implements io.Writer?

coder543 commented 2 years ago

One example: implementing a mutable archive format. With the current read only abstraction, it is possible to let users transparently swap between accessing an actual directory on disk and accessing a directory that is compressed into an archive. But what if they want to edit a file, and my library wants to support this? I have to make them write library-specific code to accomplish this, and then they have to switch between implementations on their own. They can’t just use a standard abstraction that works for directories, archives, S3 buckets, etc.

Just creating new files and flushing content to them isn’t really enough for the use cases I envision.

It seems like most use cases would be satisfied with a relatively simple interface that allows opening existing files, creating new files, deleting files, as well as reading, writing, and seeking on any file. More advanced functions / optimizations could be layered on with optional interfaces in the future, but simple seems like the right place to start.

os.DirFS would have no trouble supporting this additional interface that offers mutability. It would then be up to libraries to decide how to implement this new interface, if at all.

Right now, fs.File already provides optional interfaces, so making Writer an optional interface too doesn’t seem like too much:

A File provides access to a single file. The File interface is the minimum implementation required of the file. Directory files should also implement ReadDirFile. A file may implement io.ReaderAt or io.Seeker as optimizations.

Then it would be a matter of defining a new fs.WriteFS that primarily extends fs.FS with create and delete functions, but possibly another Open function for handling the distinction between a read-only Open and a mutable Open, kind of similar to what OpenFile in the os package provides.

I’m sure nothing I’m saying is very original, but that’s my summary of this issue.

nightlyone commented 2 years ago

Maybe we can restrict the interface to support only writing new files (io.WriteCloser) and atomic appearance into the directory namespace on close. That would align well with object storage scenarios as well as be supportable by os layers.

To cooperate with readers, I would suggest to handle it with a pipe like API:

ReadFS, WriteFS, err := os.OpenFS(dirname string).

That allows to keep the reading part compatible and have a more complex write layering and allows for separate open method for read and write scenarios.

MMulthaupt commented 1 year ago

As someone who primarily needs this for making writing testable code less work, my view is that the interface does not need strict definitions whatsoever, and can openly admit that the details are going to be implementation-specific. I don't want to test against the intersection of peculiarities of all OS/FS permutations. I just want a glorified map[string][]byte.

What really makes this valuable (or would make valuable) is the presence of both os.DirFS and an existing "make-believe filesystem" implementation for tests (which is missing for unknown reasons) in the standard library. What @adamluzsi suggested in May would be completely sufficient. And even if it wasn't - let's say for example I needed file locks (i.e.advisory file locks on Linux and exclusive file locks on Windows) - I could always define my own interface which extends the present one, then write an implementation for runtime which redirects to os.DirFS and an implementation for tests which embeds whatever the according type that implements the "glorified map[string][]byte" would be called.

Is my perspective on this whole problem missing anything big?

Merovius commented 1 year ago

@MMulthaupt

my view is that the interface does not need strict definitions whatsoever, and can openly admit that the details are going to be implementation-specific.

Then why have an interface in the first place? You can just use os directly.

an existing "make-believe filesystem" implementation for tests (which is missing for unknown reasons) in the standard library.

testing/fstest seems to fit the bill.

noel-yap commented 1 year ago

One more use case is for unit tests for functionality that writes to disk.

Merovius commented 1 year ago

@noel-yap Shouldn't the test use the same facility the functionality uses? i.e. without a writable io/fs interface, the implementation uses os, so the test should use os as well. That is, I don't see how this is an additional use case.

DeedleFake commented 1 year ago

I think it's reasonable not to, @Merovius. The tests aren't testing whether or not the os package works. They're testing that whatever they're implementing does what it's supposed to with regards to a filesystem. It's no different from using an io.Writer with a *bytes.Buffer in a test while the actual implementation uses, say, a *os.File or net.Conn.

Merovius commented 1 year ago

@DeedleFake This is starting to get off-topic, but: In that example, is the value in writing tests which don't use *os.File, or is the value in having an abstraction for the relevant behavior that you can pass to the function? I'm arguing the latter. If you don't have that abstraction - for example, say you use ioctls - then you absolutely should use an *os.File in your test. And enabling to write the test is not a good reason to add interfaces for ioctls, if we don't need that interface for its own sake. And the usefulness of io.Writer has little, if any, to do with writing tests - it can be used to write tests using a *bytes.Buffer, but that's not why you'd use it ([edit]See also[/edit]).

If you need to test behavior of functions writing files to disk, (*testing.T).TempDir() exists and it's not particularly less convenient than a hypothetical writable os.DirFS would be, in my opinion.

adamluzsi commented 1 year ago

@DeedleFake, if you need a FileSystem interaction with writing support, but you don't need to scale out via the process model this FS interaction-based solution, then coupling with the os package might be enough for you as @Merovius suggest.

My use case requires an FS component because it would feel unnatural to represent it using an entity repository interface. But the local FS is a bottleneck in scaling with a process model, so I use an FS interface to make it possible to supply a scalable virtual FS solution to my application. This also makes testing easy because the writing support enables me to have testing arrangements in my testing scenarios using business logic that has the FS as a dependency.

MMulthaupt commented 1 year ago

@Merovius

testing/fstest seems to fit the bill.

I took a look, but it's too simple. For example, it has no equivalent to filepath.Walk(). One popular library which can do that is afero, but it seems to be on life support for most of it.

In that example, is the value in writing tests which don't use *os.File, or is the value in having an abstraction for the relevant behavior that you can pass to the function?

The value is in reducing dependence on and interaction with the environment. While interactions cannot be fully eliminated (if a test fails because the machine running the test spontaneously explodes, that's usually something out of the developer's control) any step towards it helps in making sure that tests test code instead of things which are not code. It's an idealistic argument, but it stands on valid ground. Writing an abstraction over an environmental dependence (e.g. file system) which mimics it under idealistic conditions which remove any removable environmental effects which could otherwise cause a test testing correct code to fail (e.g. another program running at the same time writing the files we are attempting to read) brings us one step closer to the ideal. It makes a great combination with the concepts of mocks, spies and stubs, but there is no good way to have those in Go, which only serves to make our need for proper abstraction even greater.

Merovius commented 1 year ago

@MMulthaupt

For example, it has no equivalent to filepath.Walk().

It is exactly what you asked for: An in-memory, writable implementation of fs.FS based on a map, for use in testing. You can use fs.WalkDir to walk it.

MMulthaupt commented 1 year ago

@Merovius We've come full circle then: fstest.MapFS could work if fs.FS (which it implements) defined the methods needed for opening writable files. As it stands, the type may be mutable, but it is not a writable file system. But then fs.FS aspires to be general purpose, and I don't see how the idea of a one-for-all file system interface can be made a reality without imposing restrictions and feature-querying to the point of complete unfeasibility. Even if we get it, I am doubtful we will live to see its implementations, of which there would need to be many, even without counting any of the use-cases OP is mentioning.

@DeedleFake

The tests aren't testing whether or not the os package works. They're testing that whatever they're implementing does what it's supposed to with regards to a filesystem

I'd like to offer a different perspective on this; perhaps this is what you meant and I'm just taking this too literally. Testing whether package os works on top of your own code is not an error or otherwise a bad move. The opposite is the case: if there are behaviors in os which cause a test to fail which then leads you to finding a bug in your code, and this behavior may have been omitted from a potential os abstraction, then that's a huge win. After all tests are green, your code is "ready for production", where os will be used directly. The problem with using os directly is that it exchanges a lot of hot kisses with the environment, such that you can enter situations where a behavior of os causes a test to fail which however was not triggered by a bug in your code. Thus you enter a bug hunt in a spider's web and wonder why there seems to be nothing in need of being squashed. That's why @Merovius suggests using os directly. A more contrived situation where using os directly could cause a correct test testing incorrect code to pass successfully when it shouldn't also is conceivable.

Then why have an interface in the first place? You can just use os directly.

Perhaps the biggest issue with doing what you suggest is that you cannot proceed in good conscience without taking care of isolation. If the code you are testing involves causing data loss (intentionally or unintentionally) then you don't want it running directly on your machine or anywhere else which cannot be easily replicated by clicking a button in your CI/CD suite of choice.

I digress. What I am on about is different from the general purpose file system interface this issue is about. Unless there are other factors which I have missed, I rest my case.

Merovius commented 1 year ago

@MMulthaupt We indeed came back full circle, because we are apparently talking past each other.

So I don't understand why we're back here.

chrisguiney commented 1 year ago

I said this is not a new use case

I think I've lost the thread here -- can you please clarify: new in relation to what? go 1.20, or this proposal?

a) need to use a writeable interface in the code under test (i.e. an existing use case) or

I'm probably misunderstanding you, but this proposal exists because the existing fs.FS interface doesn't support a writable file system.

Are you trying to say that existing code being tested would already need to be accepting an io.Writer or similar interface? I'm having a hard time grokking what you're trying to communicate.

b) only need a way to populate a read-only fs.FS to pass to the code under test.

  • You claimed that the stdlib lacked a way to do the latter, which is wrong, as fstest.MapFS is that.

I don't believe this is correct.

There's also a lot of value in being able to test how code that uses a writable filesystem might behave in the face of different errors. e.g., testing what occurs when creating a new file fails, or if Write([]byte) (int, error) returns an error

Merovius commented 1 year ago

@chrisguiney

Really, this is kind of escalating from a pretty minor point. It's not even really worth discussing, but people keep asking me to clarify it, or misunderstanding it, so I feel compelled to keep arguing about it.

I think I've lost the thread here -- can you please clarify: new in relation to what? go 1.20, or this proposal?

This discussion. "Being able to test a function" is not a reason to create an interface. The value io.Writer adds isn't that it allows you to test a function writing to a file - it's that it allows you to write a function that can write to non-files just as well as files.

This seems like a fairly clear statement, even if it is a minor point. And "don't write interfaces just so you can use them in a test" is also pretty well-established advice and policy in the Go ecosystem.

I don't believe this is correct.

I don't know what to say to this. You quoted me saying "only need a way to populate a read-only fs.FS" and then made some statements about how fstest.MapFS is read-only. I don't understand how I'm supposed to defend the statement that fstest.MapFS allows you to pass a populated read-only fs.Fs to a function, because it seems like an extremely self-evidently correct statement.

There's also a lot of value in being able to test how code that uses a writable filesystem might behave in the face of different errors.

I do agree there is some value in being able to inject failures into filesystem interactions. This is just the general case for mocking. Whether or not mocking is a good testing strategy is a recurring, contentious debate and I don't think we will successfully litigate it here.

However, we don't need to. You can write a mock version of os.Open or os.Create just as well as you can write a mock fs.FS.

MMulthaupt commented 1 year ago

@Merovius

So I don't understand why we're back here.

We're back here because of three reasons:

  1. We had different understandings of what was meant by "glorified" when I said I wanted a "glorified map[string][]byte".

In my use case, during tests, both the code being tested as well as the code of the test need to perform reading and writing operations to the same file system. That means I need an Open() method which can return writable files, as @chrisguiney points out, which neither fstest.MapFS nor os.dirFS have. You are correct in saying I could build that; however I had hoped to be able to use existing standard library code.

  1. I prioritize isolating tests from the environment.

  2. You prioritize idiomatic code.


Again, digression. This issue is about a generic file system, which may have new concepts such as a mechanism to query the available permission model, the guarantees provided and not provided by file locks, even whether the Kernel code will silently drop IO errors, etc.

I don't currently need that. I am working with os today. So the abstraction I need is an abstraction oriented around os.

msiebuhr commented 1 year ago

While it does seem there is a lot of discussion around testing, I have thought about a different use-case; a minimal writable file interface for simple applications;

Inspirations could be FUSE or Gnome's GVfs that supports quite a few schemes.

My particular use-case is messing around with quite large Git repositories, where I need to do fairly small automated edits; I'd love to have something with filesystem-like interface for abstracting away "sparse" git-checkout business:

// Open Git repositoritory
git, _ := gitfs.Open("git@github.com/msiebuhr/foo.git")
defer git.Close()

// Write a file - the exact interface isn't important to me.
f, _ := fs.Create(git, "some/filename.json")
f.Write(`{"some": "json"}`)
f.Close()

// Commit and push!
git.Commit("Update some json")
git.Push("some-branch-name")

And by extending a writable fstest.MapFS, I'd be able to test the business logic without having to stub out a "real" git repository (which is also becoming a pain-point of it's own).

Hyrum's Law: With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody.

hairyhenderson commented 1 year ago

My particular use-case is messing around with quite large Git repositories

IMO this is where writable io/fs interfaces come in handy.

To add some personal flavour, while developing go-fsimpl I toyed a few times with adding write functionality to gitfs and other filesystems, but haven't (so far) due to the lack of appropriate interfaces in io/fs. It seems awkward to provide compatible read-only filesystems, but to bolt on write support in a way that I'd likely have to change later.

That being said, it doesn't feel that awkward, and I may yet just experiment with the interfaces defined by hackpadfs, which I believe are effectively what we need.

As to the writable fstest.MapFS discussion, maybe try out hackpadfs/mem for a taste of how it could work?

Merovius commented 1 year ago

As a suggestion: Instead of having a writable fs.FS, the corresponding "sinks" could also read from an fs.FS. That's a bit similar to implementing io.ReaderFrom instead of io.Writer.

So e.g. zip.Writer could get a method func (w *Writer) ReadFrom(fsys fs.FS) error. The advantage is, that such an fs.ReaderFrom could easily decide itself which attributes and file types it supports under which semantics and it wouldn't have to worry about nailing down what happens with concurrent or interrupted writes.