crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0

Flush file output on program exit #13995

jgaskins opened this issue 1 year ago

jgaskins commented 1 year ago

Feature Request

When your program exits, certain objects may not be properly cleaned up. For example, userland I/O buffers may not be flushed, so data that you wrote to a File or TCPSocket may never actually reach its destination.

I haven't personally run into this, but it came up on the forum today and I was surprised until I considered the implementation of IO::Buffered. I feel like you shouldn't have to know about that.
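
A minimal sketch of the failure mode (the file name is just an example; the buffer is IO::Buffered's userland write buffer, 8 KiB by default):

file = File.new("out.txt", "w")
file.puts "hello"   # lands in the userland write buffer, not on disk
exit                # the buffer is never flushed, so out.txt stays empty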

It used to be the case (up until about 10-20 years ago?) that operating systems wouldn't flush stream buffers on program exit either, and there was a common refrain of "… and make sure you call close() before exiting your program" because of things like this. Eventually, it just made sense for OSes to clean up file descriptors on exit.

I'm assuming that calling finalize on all objects on exit is the simplest way to do this, however that may be my own naïveté talking. I don't know if that's something provided by BDW or any other GC implementation (maybe this function?). And even if it is feasible, I don't know if there's a way to do it that takes object dependency into account. For example, I would want an OpenSSL::SSL::Socket::Client to have its finalize method called before the TCPSocket underneath it, because doing it in the reverse order could cause problems — both have their own internal buffers, so if the TCPSocket is closed first, the OpenSSL::SSL::Socket::Client can't flush its buffer to the socket.
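
To illustrate the ordering concern, the safe manual teardown looks something like this (a sketch; the host, port, and payload are placeholders):

require "socket"
require "openssl"

tcp = TCPSocket.new("example.com", 443)
ssl = OpenSSL::SSL::Socket::Client.new(tcp, hostname: "example.com")

ssl << "hello"
ssl.close   # flushes the TLS buffer into the TCP socket first
tcp.close   # only now is it safe to close the underlying socket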

If that is not feasible, are there other ways to ensure this for specific object types?

beta-ziliani commented 1 year ago

GC_invoke_finalizers won't cut it as it requires objects to be marked for collection:

GC_API int GC_CALL GC_invoke_finalizers(void);
       /* Run finalizers for all objects that are ready to     */
       /* be finalized.  Return the number of finalizers       */
       /* that were run.  Normally this is also called         */
       /* implicitly during some allocations.  If              */
       /* GC_finalize_on_demand is nonzero, it must be called  */
       /* explicitly.                                          */

Fryguy commented 1 year ago

Ruby flushes the IO buffers on exit, so I think we should mirror what they do. I'm not sure exactly how they do it, whether through finalizers or some other mechanism.

$ irb
> data = "stuff\n"
> file = File.open("test.txt", :mode => "a")
> File.size(file.path)
=> 0
> file.size
=> 0
> file.write(data)
=> 6
> File.size(file.path)
=> 0
> file.size
=> 6
> exit

$ cat test.txt
stuff

oprypin commented 1 year ago

I think most GC languages would consider this to be working as intended. Finalizers are only for freeing memory from indirect allocations. Nim in particular was very vocal about it (~7 years ago, anyway...)

So why would you free memory right before freeing all the memory?

Finalizers are also unreliable. Handles should just be closed explicitly (usually via a block version or in ensure)
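
In Crystal that looks something like this (sketch; the file name is a placeholder):

# Block form: the file is flushed and closed when the block returns
File.open("out.txt", "w") do |file|
  file.puts "hello"
end

# Or close explicitly in ensure, when a block doesn't fit
file = File.new("out.txt", "w")
begin
  file.puts "hello"
ensure
  file.close
end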

I think it's not good to delay the exit of all applications just to help the few users who forget to close their handles.

Meanwhile Python, despite also having finalizers, always warns about unclosed files at the end of each unittest.

oprypin commented 1 year ago

This could be something only for debug mode - somehow print warnings about all unclosed handles at the end of the program

HertzDevil commented 1 year ago

Ruby flushes the IO buffers on exit, so I think we should mirror what they do. I'm not sure exactly how they do it, whether through finalizers or some other mechanism.

In this particular case it is File#size that flushes unwritten contents; neither exit nor finalizers are involved here:

file = File.open("test.txt", :mode => "a")
file.write "abc"

File.size(file.path) # => 0
File.size(file.path) # => 0

file.size # => 3
File.size(file.path) # => 3

Ruby has done that to all writable Files for over a decade. The same goes for #truncate and #flock there, although I don't know why.

Ruby does explicitly call all finalizers in ::exit as well.

jgaskins commented 1 year ago

why would you free memory right before freeing all the memory?

Finalizers aren't just for freeing memory. They do free memory, but more generally they let objects clean up after themselves.

@Fryguy mentioned that Ruby flushes I/O buffers on exit, but I've also confirmed that multiple other languages flush output buffers on program exit. The following languages (with example code) all write data to disk despite using buffered I/O and never closing/flushing those buffers explicitly or even implicitly by exceeding buffer capacity: C, Rust, Go, and Python.

Finalizers are also unreliable. Handles should just be closed explicitly (usually via a block version or in ensure)

I agree that file descriptors should be closed explicitly or the auto-closing block version used. "Should" as in it's the ideal thing to do, especially for long-lived processes. But losing data after it was "written" without an exceptional case (like a SIGKILL/SIGSEGV) is surprising. If languages like C, Rust, and Go (all very verbose and explicit languages) flush to disk implicitly, it's surprising that Crystal requires the developer to do it themselves explicitly.

oprypin commented 1 year ago

"Flushing buffers to disk" is now a different description than "using finalizers to flush buffer to disk"; surely C doesn't do that

Fryguy commented 1 year ago

Ah, thanks for that detail @HertzDevil. I didn't realize that size itself was also doing the flushing.

Ruby does explicitly call all finalizers in ::exit as well.

I think this is the more pertinent statement. I feel like we should follow Ruby here.

Just prior to termination, Ruby executes any at_exit functions (see Kernel::at_exit) and runs any object finalizers (see ObjectSpace::define_finalizer).

jgaskins commented 1 year ago

"Flushing buffers to disk" is now a different description than "using finalizers to flush buffer to disk"; surely C doesn't do that

Not different, just a concrete example and the most important one I can come up with. If the general case I suggested of running all finalizers has unacceptable tradeoffs, that's fine, but I hope I've made the point that at least the special case of flushing buffers is the bare minimum feature request.

straight-shoota commented 5 days ago

Since we started this discussion, the effect of IO finalizers has changed significantly: as of #14882 they only close the file descriptor and no longer flush the buffer. See #14807 for details on why this is necessary.

IO finalizers don't flush anymore, and thus running them at exit would not help to ensure all buffers are flushed. This still requires an explicit call to IO::Buffered#close. Furthermore, for the reasons discussed in #14807, implementing custom finalizers with flush behaviour should be discouraged.

For the record, there would also be no other reason to run the finalizers at exit, because their only effect is closing the file descriptor, which happens implicitly after exit anyway.

In summary, finalizers are not a way forward for this.

straight-shoota commented 5 days ago

Re: https://github.com/crystal-lang/crystal/issues/13995#issuecomment-1826391631

@Fryguy mentioned that Ruby flushes I/O buffers on exit, but I've also confirmed that multiple other languages flush output buffers on program exit. The following languages (with example code) all write data to disk despite using buffered I/O and never closing/flushing those buffers explicitly or even implicitly by exceeding buffer capacity: C, Rust, Go, and Python.

The C and Go examples involve no user space output buffering. fprintf and file.WriteString write directly to the file descriptor. Only Python seems to flush on exit via the finalizers.

jgaskins commented 5 days ago

I've changed the title of the issue because folks seem to be focusing a lot more on the use of finalizers than I intended. I suggested finalizers originally only to leave room for other kinds of cleanup and because it was the only idea I had. The body of the original issue post is about the use case of flushing file buffers.

With finalizers eliminated as a path toward safely flushing the buffers on exit as other programming languages do, what other options are there?

ysbaddaden commented 4 days ago

None. We don't keep track of IO objects.

straight-shoota commented 4 days ago

Yeah, I don't see any feasible mechanism for this either.

A theoretical solution could be collecting IO instances in a global list to close them in an at_exit hook. But I don't think that would be a worthy effort, for multiple reasons.
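
For illustration, that might look roughly like this (hypothetical sketch; OPEN_IOS and the manual registration step are made up, nothing like this exists in the stdlib):

OPEN_IOS = [] of IO

at_exit do
  OPEN_IOS.each do |io|
    begin
      io.close unless io.closed?
    rescue IO::Error
      # best effort: ignore descriptors that are already gone
    end
  end
end

file = File.new("out.txt", "w")
OPEN_IOS << file    # every IO would have to register itself here
file.puts "hello"   # flushed by the at_exit hook even without an explicit close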

Another concept that might be useful for this would be a SystemExit exception (#1935), although it has other implications and probably wouldn't be a good general solution.

In the end I don't think this can be effectively handled in the Crystal runtime. IO integrity must be ensured in user code. Only there can you really know the dependencies between IO instances.

I think it's a good idea in general to explicitly close or flush IO instances. Consider flushing important data at strategic locations. That not only helps with orderly exit behaviour, but also in other situations: a process can always end abruptly, without a proper exit or any means to handle anything.

jgaskins commented 4 days ago

I feel like a language that is otherwise as programmer-friendly as Crystal is (and even claims it in the tagline on the website) shouldn't require the programmer to take extra steps to avoid shooting themselves in the foot while doing something as basic as writing to a file on local disk.

I close files explicitly (or use File.open blocks) because I've been programming for over 30 years and it's a habit I maintain from when it was required at the kernel level on most operating systems. But most folks don't have that history and those habits. When every other language I've tested handles this for you, including C (one of the most Chaotic Neutral programming languages in common use today), it feels like table stakes.

PR incoming.

RX14 commented 3 days ago

I think keeping IO::Buffered instances in a linked list (per thread or using atomics) could be doable with minimal overhead. SystemExit is not a path I want to pursue; it's a breaking change.

I remain unconvinced that this is actually an issue unless you call exit in a random place in the code.

ysbaddaden commented 3 days ago

@RX14 I thought about the linked list, but then we'd keep live references to all IO objects, the GC wouldn't collect lost IO objects anymore, and we'd start leaking file descriptors again. We could use weak references, but then we can't use a "cheap" intrusive linked list anymore. Maybe we could manually allocate the nodes somewhere the GC won't scan...
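
The weak-reference variant would look roughly like this (hypothetical sketch; the registry no longer keeps the IOs alive, but every entry costs a separate allocation instead of an intrusive node):

REGISTRY = [] of WeakRef(IO)

at_exit do
  REGISTRY.each do |ref|
    if io = ref.value   # nil once the IO has been collected
      begin
        io.close unless io.closed?
      rescue IO::Error
      end
    end
  end
end

file = File.new("out.txt", "w")
REGISTRY << WeakRef.new(file.as(IO))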

@jgaskins there's no point in flushing on exit when the GC collecting a lost IO wouldn't flush either. This issue is intrinsically connected to the "IO finalizers don't flush" issue.

Again: they don't flush because flushing involves runtime machinery that CAN'T run within the GC (details in #14807). Crystal is missing a mechanism to defer executing finalizers, so that they wouldn't run while the GC has stopped the world, but after the GC resumes the world, alongside the rest of the application (after it runs, the finalizer is removed, and the next GC collection will free the object). Only then could we start thinking of keeping track of IO::Buffered objects and having an at_exit handler to flush them.

RX14 commented 3 days ago

Maybe an IO flusher (or generic deferred-work) system fiber could work, with finalizers pushing to a queue that gets handled whenever that fiber is next scheduled. But the at_exit problem is still there, which is huge.
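
A rough sketch of the flusher-fiber half (hypothetical; Channel stands in for the GC-safe queue a real implementation would need, since a finalizer cannot legally perform fiber operations like Channel#send):

FLUSH_QUEUE = Channel(IO).new(64)

spawn(name: "io-flusher") do
  while io = FLUSH_QUEUE.receive?   # nil once the channel is closed
    begin
      io.flush
    rescue IO::Error
    end
  end
end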

Designing a brand new data structure to keep track of all open IO::Buffered instances, one which doesn't leak memory and is performant under parallelism, isn't easy. And that's what's required to implement this.

I think the one use case for this feature is when you open a buffered IO as a constant or class variable that you expect to live throughout a long-lived daemon program. Short scripts can easily use .open(&). However, most use cases I can come up with for that would be calling #flush anyway during program execution (database connections, flat file storage).

jgaskins commented 3 days ago

I remain unconvinced that this is actually an issue unless you call exit in a random place in the code.

The current behavior is not documented anywhere that I can find, even while looking for it. And even if it were, I haven't found any other programming language where writing to a file doesn't actually write the data when the program exits cleanly, so it is still surprising behavior.

There are a lot of scenarios where a program writes to one or more files and then exits. Many programs are not long-lived and exiting is normal.

straight-shoota commented 1 day ago

@jgaskins You make a good point that files should better err on the side of safety and ensure correct behaviour in the default case. If implicit performance optimization is incompatible with that, it must be opt-in.

So it sounds like a good idea to disable output buffering for files by default. Besides exit behaviour, this is also very relevant when the content of a file is read concurrently while writing to it. For example, a program that writes sporadically to a log file while another process follows the log output (e.g. tail -f).

For example, when running this Crystal program while following out.txt, the full body "hello\nworld\n" is written after 10 seconds, whereas you might expect the first part ("hello\n") to be written immediately and only the second part to follow later.

File.open("out.txt", "w") do |file|
  file.puts "hello"
  sleep 10.seconds
  file.puts "world"
end

$ crystal build write-out.cr
$ touch out.txt
$ tail -f out.txt &
$ ./write-out
$ # 10 second delay...
hello
world
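
For reference, Crystal already exposes this per instance via IO::Buffered#sync=, which is effectively the switch that disabling buffering by default would flip:

File.open("out.txt", "w") do |file|
  file.sync = true    # bypass the write buffer: every write goes straight to the fd
  file.puts "hello"   # visible to tail -f immediately
  sleep 10.seconds
  file.puts "world"
end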

Btw. Julia Evans talks about pipe buffering with this exact use case as a premise in a recent blog post.

Uncoordinated buffering can lead to long delays which are quite often undesired. I presume that's the reason why output buffering is already disabled by default on sockets: They're used for direct communication between two parties where unnecessary delays should be avoided. With files this is less prevalent and there are many use cases where any delay is totally acceptable as long as the entire content is written eventually. However there are also many use cases where files serve for direct communication and writes should be expected to happen in a timely manner.

jgaskins commented 1 day ago

Besides exit behaviour, this is also very relevant when the content of a file is read concurrently while writing to it. For example, a program that writes sporadically to a log file while another process follows the log output (e.g. tail -f).

Indeed. This was actually the original use case brought up in the forum post linked in the issue body. I proposed solving the problem of data being lost when the program exits because I assumed it would be an easier sell than solving the problem of data not being immediately consistent (and, admittedly, I'm sympathetic to eventual consistency) but, based on the discussion here, it seems like we can't solve either one without solving both.

Btw. Julia Evans talks about pipe buffering with this exact use case as a premise in a recent blog post.

Incredible post. She's got some fantastic (and timely!) content. Adding her blog to my RSS reader.

Uncoordinated buffering can lead to long delays which are quite often undesired. I presume that's the reason why output buffering is already disabled by default on sockets: They're used for direct communication between two parties where unnecessary delays should be avoided. With files this is less prevalent and there are many use cases where any delay is totally acceptable as long as the entire content is written eventually. However there are also many use cases where files serve for direct communication and writes should be expected to happen in a timely manner.

100% agreed with all of this. It's totally understandable why the current behavior was chosen with the use cases that were being considered at the time.

crysbot commented 23 hours ago

This issue has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/using-wait-groups-with-http-server/7452/4

plambert commented 22 hours ago

GNU libc flushes all open streams when exit() is called. https://www.gnu.org/software/libc/manual/html_node/Normal-Termination.html

Almost all web servers allow setting a buffer size for log files; for example, with compression enabled in nginx it defaults to 64k. Without compression, 4k or more is recommended for high-traffic situations, in order to significantly reduce the wait time for the writes, especially if the filesystem has atime updates enabled.

Crystal's approach of "data is lost by default unless the developer takes non-trivial steps to prevent it" is very surprising. I am bitten by it often.

I am curious what the IOBackend flush performance impact is in the real world. I might try to benchmark it; if I do, I'll report the results here.

straight-shoota commented 20 hours ago

@plambert As already mentioned in https://github.com/crystal-lang/crystal/issues/13995#issuecomment-2504326622, file handles in glibc (or any libc) don't have userspace buffers. The documentation of exit describes the effect of closing the open file descriptors after a process exits; doing that also flushes the kernelspace buffers. There is more on this in the manual sections Closing Streams and Stream Buffering.

HertzDevil commented 19 hours ago

The Crystal runtime does call LibC._exit instead in some places, which skips over this flush.

straight-shoota commented 19 hours ago

@HertzDevil Sure. It does so only in non-recoverable error situations though. This does not constitute a normal termination and it might not be safe to do so. C behaves the same. Excerpt from Closing Streams:

If your program terminates in any other manner, such as by calling the abort function (see Aborting a Program) or from a fatal signal (see Signal Handling), open streams might not be closed properly.