waj opened this issue 9 years ago
Leaning towards option 3 too. However, I remember Ary said he doesn't like warning messages as a solution because they're usually ignored anyway?
Also, option 2 means raising an exception after the second failure, correct?
I'm leaning towards 1 actually: not closing the file is a clear bug, and all languages that automatically close files still strongly recommend doing it manually as soon as possible, where "manually" can mean using the appropriate language construct, e.g. `with` in Python and the block form of `File.new` in Ruby/Crystal.
I actually prefer 2.
Imagine you want to return a String iterator over the lines of a file, but streaming from disk:
```crystal
def stream_lines
  File.open("...").each_line
end
```
Now there will be an open file that won't be closed, and it can't be closed from outside that method because the File isn't exposed. After running the program for a long time (if you invoke that method many times) you will hit that warning/error unless solution 2 is chosen.
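For contrast, a variant that takes a block keeps the file's lifetime inside the method and can close it deterministically. Just a sketch of the idea:

```crystal
# The method owns the file for the whole iteration, so the block form of
# File.open can close it as soon as the caller is done.
def stream_lines(path)
  File.open(path) do |file|
    file.each_line do |line|
      yield line
    end
  end
end

stream_lines("...") { |line| puts line }
```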
Maybe the above method isn't designed well. But I don't think the check in point 2 is very expensive: it's just checking whether the return value from `open` is less than zero (which we are already doing) and, in that exceptional case (which can't happen that often), running the GC.
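Roughly, the Ruby-style retry would look like this (`sys_open` stands in for the low-level call and is a hypothetical name, not the actual runtime internals):

```crystal
# Sketch of option 2: if the OS-level open fails, collect garbage so that
# finalizers release any leaked descriptors, then retry once before raising.
def open_fd(path, mode)
  fd = sys_open(path, mode) # hypothetical: returns a negative value on failure
  if fd < 0
    GC.collect              # finalizers close the leaked file descriptors
    fd = sys_open(path, mode)
  end
  raise Errno.new("Error opening file '#{path}'") if fd < 0
  fd
end
```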
Another problem is that you only get the error when trying to open the last file. You get that error and understand: "OK, there must be some file that isn't being closed." How do you find that file? Maybe you forgot to close it in your code. Maybe a library you are using forgot to close one. Wouldn't it be better for the program not to crash, and for the language to just take care of this? If you later find that the GC is being invoked a lot in `File.new` for no reason, you can search for the unclosed files and fix that, but in the meantime you have a working program.
Those never-closed files are like memory leaks: they might be hard to find. The example is just really bad API design, but if we do what Ruby does, the programmer might never realise there is a problem. In a long-running process the number of open files will actually hit the maximum quite often, and it's not the check that's expensive but the full GC call.
Also, my point is: if a file is being closed by the finalizer, there is definitely a bad design. If the language and the runtime are there to make the programmer's life easier, then it should warn as soon as possible about the problem. It has the ability to detect it, but if it just tries to keep the program running, it's like hiding the trash under the carpet.
One might argue that this is exactly like memory garbage collection, but for me it's not: files are a much more expensive and limited resource. And why am I only talking about files? It's the same, or even worse, with open sockets or pipes. Any IO should be explicitly closed; otherwise it's a bug.
@waj How do you detect the file that's leaking?
Well, nevermind. Probably just find all `File.new` or `File.open` calls and trace those :-)
We could print the path of the leaked file in the finalizer. That should help a lot.
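Something along these lines inside `File` (just a sketch, assuming the instance keeps its `@path` and a `@closed` flag):

```crystal
# Sketch of option 3: if the GC is the one closing the file, complain loudly
# so the leak can be tracked down.
def finalize
  unless @closed
    STDERR.puts "WARNING: file #{@path} was never closed, closing it in the finalizer"
    close
  end
end
```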
I'm not really much of a Ruby user, but I like the way Python solves it: close the file in the finalizer anyway, but add a mechanism for automatic closing:
```python
with open('abc', 'r') as f:
    print(f.read())
# The 'with' block closes 'f' afterwards
```
It's kind of like an implicit try-finally.
@kirbyfan64 In Crystal (and Ruby) the same mechanism is available with block methods:
```crystal
File.open(...) do |f|
  f.gets
  # ...
end
```
The file is automatically closed in this case.
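Under the hood the block form is essentially the implicit try-finally mentioned above; a simplified sketch of how such a helper can be written:

```crystal
# Simplified: open the file, hand it to the block, and guarantee the close
# even if the block raises.
def self.open(filename, mode = "r")
  file = new(filename, mode)
  begin
    yield file
  ensure
    file.close
  end
end
```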
In this trivial case, a simple static analysis would show that the file reference doesn't escape the loop and can be stack allocated, so its destructor can be run at the end of the basic block. Is this not currently done, or are destructors only run at the method boundary?
It's not currently done. Class instances are always allocated on the heap. In this case the local variable (or the implicit reference) lives on the stack, so ideally, with escape analysis, we could destroy the object and even return its memory to the heap before the GC runs.
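For example, in a method like this the reference clearly never escapes, so the object could in principle be closed and reclaimed at method exit rather than at some later GC cycle (illustrative only):

```crystal
# `file` is only reachable inside this method, so escape analysis could close
# it (and free its memory) at method exit instead of waiting for the GC.
def first_line(path)
  file = File.new(path)
  file.gets
end
```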
Still, that would be an optimisation, and I don't think we should rely on such a mechanism to release the resources.
> Wouldn't it be better for the program not to crash, and for the language to just take care of this? If you later find that the GC is being invoked a lot in `File.new` for no reason, you can search for the unclosed files and fix that, but in the meantime you have a working program.

I don't know, I'd rather have the compiler give me an error or warning instead. As @waj says:

> it should warn as soon as possible about the problem
This way I feel that the programmer's time is saved the most (overall).
@vyp Note that it won't be a compile error, it will be a runtime error.
Sorry, yes, waj mentioned that at the start, hence "leaning".
Oh right, you meant the Rust comment is useless. Correct, sorry.
Okay, so what about option 4? Except for jhass, it can do both of what @asterite and @waj want?
Oh, now that I read everything again, I think I prefer options 3 or 4. I'm not sure which one; having your program print warnings but continue running is strange.
Yes, exactly. I keep thinking 4 is just too strange/confusing too.
Actually yes, rereading, I don't think @waj would want 4. But just wait for him to respond.
@jhass, isn't 3 also in alignment with your opinion?
So far we're warning-free: either you're doing it wrong or you're doing it right in Crystal. I like that.
Ah, didn't know that, thanks.
I vote for 4 (2 & 3). If it can be caught at compile time, great! If not, save whatever can be saved at runtime: do what the programmer intended and expected, and let the application do what the user expects: work, not crash.
If this happened in some seldom-called routine in the heart-monitoring system your grandma is hooked up to at the hospital, you'd be glad it didn't crash just because some coders decided "it's not the right way, so the coder should be punished with a crash".
The warning should of course be raised, so that it can be fixed properly for the next release.
My vote goes to 4. Programs shouldn't crash from simple oversights like this when possible.
I've tested this on Crystal 0.19.4 with OS X 10.11, where `ulimit -n` is 7168. I ran it past 15 million iterations in 5 separate runs and it hasn't crashed once.
```crystal
index = 0
loop do
  index += 1
  puts index
  File.new(__FILE__)
end
```
```
$ crystal build --release file.cr && ./file
15277652
15277653
15277654
15277655
15277656
15277657
15277658
15277659
15277660
15277661
15277662
15277663
15277664
15277665
```
Using strace on the program, I see a lot of `open()` and `close()` syscalls. I'm guessing this is simply the finaliser running, so I don't think this issue is solved, because it's impossible to rely on the finaliser for programs with slow leaks.
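For anyone reproducing this, filtering the trace makes the pattern obvious (these flags are just one way to do it, on Linux):

```
strace -f -e trace=open,close ./file
```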
I'd vote for option 3, but I wouldn't mind 2 & 3. Printing a warning is good, because explicit closing is obviously the optimal solution. It all depends on how tricky 2 is to implement.
Just rename `new` to `open_returning`, so it will be rarely used :-)
With `ulimit -n` at 7168 it does loop forever. Even with the default ulimit (256?) the error message is `Unhandled exception: Error opening file '/path/to/bad.cr' with mode 'r': Too many open files (Errno)`, which is reasonably clear.
The backtrace gets a bit messed up, I presume since it can't open files:
```
Failed to raise an exception: END_OF_STACK
[0x105d6dd0b] *CallStack::print_backtrace:Int32 +107
[0x105d473fb] __crystal_raise +91
[0x105d477ed] *raise<Errno>:NoReturn +189
[0x105d7da85] *Crystal::System::File::open<String, String, File::Permissions>:Int32 +197
[0x105d7b78f] *File::new<String, String, File::Permissions, Nil, Nil>:File +63
[0x105d63e40] *CallStack::read_dwarf_sections:(Array(Tuple(UInt64, UInt64, String)) | Nil) +40256
[0x105d59f23] *CallStack::decode_line_number<UInt64>:Tuple(String, Int32, Int32) +51
[0x105d597a2] *CallStack#decode_backtrace:Array(String) +290
[0x105d59661] *CallStack#printable_backtrace:Array(String) +49
[0x105da5128] *Exception+@Exception#backtrace?:(Array(String) | Nil) +72
[0x105da4fb1] *Exception+@Exception#inspect_with_backtrace<IO::FileDescriptor>:Nil +113
[0x105da3840] *AtExitHandlers::run<Int32>:Int32 +432
[0x105da926e] *Crystal::main<Int32, Pointer(Pointer(UInt8))>:Int32 +126
[0x105d51c59] main +9
```
Maybe just spit a message to stderr if it's compiled without `--release`? How does Java handle this, I wonder...
Currently this crashes very quickly in Crystal, but it works in Ruby:
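For example, a loop along these lines (the same shape as the test later in the thread):

```crystal
loop do
  File.new(__FILE__) # opened but never closed: each iteration leaks a file descriptor
end
```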
The reason: the files are never closed, so the program eventually hits the system/user limit. In Ruby it doesn't fail because, once that happens, it performs a garbage collection and tries one more time before raising an exception (https://github.com/ruby/ruby/blob/trunk/io.c#L5451-L5461). The `File#finalizer` will close the file for us, and that frees the file descriptor resource.

We could do the very same thing in Crystal, but I'm not totally convinced by this approach: it could hide potential bugs in any application. In general, I think the `File#finalizer` should never be the point where the file is actually closed. I can see the following options:
1. Don't close the file in the finalizer at all, so a leaked file is a plain bug and the program eventually fails with "Too many open files".
2. Do the same as Ruby: when `open` fails, run the GC (letting finalizers release the leaked descriptors) and retry once before raising.
3. Close the file in the finalizer, but print a warning so the leak doesn't go unnoticed.

Running the GC just in case sounds overkill, so I'm more inclined towards option 3, but I wanted to hear others' opinions. Maybe I'm missing some other good reason for the Ruby implementation? (Besides convenience.)