Auto Closing - Githubissues

HaxeFoundation / haxe-evolution

Repository for maintaining proposal for changes to the Haxe programming language

111 stars 58 forks source link

Auto Closing #119

Open Aidan63 opened 7 months ago

Aidan63 commented 7 months ago

Many classes contain a close function responsible for cleaning up handles, native resources, or other things which are outside of the control of the GC or which may not want to be kept around until the GC runs. This proposal introduces a new autoclose keyword to variable declarations which ensures the close function is automatically called when execution of the current block ends.

autoclose final reader = File.read("input.txt", false);
autoclose final writer = File.write("output.txt", false);

while (!reader.eof()) {
    writer.writeLine(reader.readLine());
}

rendered

Aidan63 commented 7 months ago

~Hmm, don't know how I've buggered up my branches an have my int64 one in there as well... I blame thing github vscode thing I used, will try and clean it up.~

Fixed it.

Apprentice-Alchemist commented 7 months ago

Requiring a method specifically called close seems unflexible.

What about Go-style defer:

final foo = new Bar();
defer foo.destroy();

mutex.lock();
defer mutex.unlock();

Pros: not restricted to close method, can be used for more than just cleaning up resources Cons: slightly more verbose.

Python's with statement is also nice.

with(final foo = new Bar()) {

}
with(mutex) {

}

Pros: same pros as defer + more clearly defined scope and potential for implementers to have custom handling in case of exceptions (eg a database could roll back a transaction) Cons: nesting, requires explicit implementation (This one could be a macro if we implement some kind of trailing block support for macros)

Aidan63 commented 7 months ago

Not seen that go one before, I assume that defer keyword means it will run at the end of the block, not sure how I feel about the fact that you type it in one place and it gets executed in another. But I just might not be use to it.

I did think about also adding that with / C# using to the proposal, but since this is a relatively new thing in haxe I though I'd take it slow and see how people feel about the concept before adding more stuff, but I would be in favour of something like that.

Aidan63 commented 6 months ago

I've added an extra section to unresolved questions about types which don't have a close function but would benefit from this sort of transformation (a lot of stuff in the threading packge), maybe syntax like autoclose(release) final lock = new sys.thread.Lock(); if this is something we want to resolve.

Also added a part in opening possibilities about extra syntax similar to with / using for custom scopes.

Got carried away on the weekend had a go at making a proper implementation. I've got a new filter which applies the same sort of transformation as the linked macro but to the texpr. Handles classes, abstracts, forwarding, inheritance, etc, also checks the signature of found close functions to ensure they're suitable.

https://github.com/Aidan63/haxe/tree/autoclose

With that branch you can see nice code such as this.

class Foo {
    public function new() {}

    public function close() {
        trace('foo');
    }
}

abstract Bar(String) {
    public function new() {
        this = "";
    }

    public function close() {
        trace('bar');
    }
}

@:forward(close) abstract Baz(Bar) {
    public function new() {
        this = new Bar();
    }
}

function main() {
    autoclose final writer = sys.io.File.write("test.txt", false);
    autoclose final bar = new Bar();

    for (_ in 0...Std.random(10))
    {
        autoclose final foo = new Foo();
    }

    autoclose final baz = new Baz();
}

get transformed into nightmarish try catches looking like this.

private class _Main.Main_Fields_ {

    @:keep
    public static function main() {
        var writer = sys.io.File.write("test.txt", false);
        try {
            var bar = _Main.Bar_Impl_._new();
            try {
                {
                    var _g = 0;
                    var _g1 = Std.random(10);
                    while ((_g < _g1)) {
                        var _ = _g ++;
                        var foo = new Foo();
                        try {} catch (_hx_exn:Any) {
                            if ((foo != null)) foo.close();
                            throw _hx_exn;
                        };
                        if ((foo != null)) foo.close();
                    };
                };
                var baz = _Main.Baz_Impl_._new();
                try {} catch (_hx_exn:Any) {
                    if ((baz != null)) _Main.Bar_Impl_.close(baz);
                    throw _hx_exn;
                };
                if ((baz != null)) _Main.Bar_Impl_.close(baz);
            } catch (_hx_exn:Any) {
                if ((bar != null)) _Main.Bar_Impl_.close(bar);
                throw _hx_exn;
            };
            if ((bar != null)) _Main.Bar_Impl_.close(bar);
        } catch (_hx_exn:Any) {
            if ((writer != null)) writer.close();
            throw _hx_exn;
        };
        if ((writer != null)) writer.close();
    }
}

Consecutive autoclose variables could be reduced down to a single try catch but are not at the moment.

IPv6 commented 6 months ago

+1 to "defer ...", this is helpful for many cases - and not limited to classes (with very specific requirements)

back2dos commented 6 months ago

Defer would be a superb way to tackle this and also related issues, like the absence of finally. In discussions about adding the latter, concerns were raised about how easily it could be implemented across the various targets. Not sure those still hold though.

Aidan63 commented 5 months ago

Putting aide my dislike of the defer for a minute (reading the go docs certaintly didn't help, I hadn't even though of cases where the defered function might accept arguments...), I think a big advantage of restricting to a specific function or interface is that it assists with self documenting code and opens up tooling possibilities.

Restricting to something like close indicates to the user if a class or abstract has a public close it should be called when you're finished with it to ensure any resources are cleaned up in a timely manner. While defer is more flexible for existing code it doesn't really drive the eco system towards a single method for cleaning up resources, instead everyone continutes to use their own slightly different terminology and methods and its left to the user to hope libraries document what you need to call to clean things up, and I don't think its unfair to say this leads to pretty leaky code as this documentation is almost always missing.

In the C# world visual studio will run analysers to detect if a IDisposable is not in a using statement and offer code actions to wrap in a using statement. In my experience this works really well, vshaxe providing similar functionality would be really useful but thats hard to do without unifying around a single way for disposing resources.

Areas of the standard library where autoclose / with would be useful but close is not used (such as the mutex case mentioned above), it seems like it would be pretty easy to add new functions which return wrapper abstracts or similar which provide a close function.

IPv6 commented 5 months ago

To be honest the idea of having "single method for cleaning up resources" is controversial. HAXE already have established the grounds here - there is no examples of fixed naming of class methods, compiler keywords are the only place with "restrictive" naming so far. Partially this is due keywords can be translated into different implementations for different targets - while class methods are locked to have same name in every target (can be accessed via reflection, etc).

Some other considerations:

generic name like "close" will definitely break a lot of existing code. such generic name can also complicate producing code on some targets afaik
having feature linked to class is not convenient by itself, there is a lot of cases where classes are not used at all. Forcing users to wrap functionality into classes - just for closing extras - seems redundant. And definitely not good for critical code (boxing/unboxing etc)

Haxe already used as a bridge language to other local domain frameworks. So flexibility and non-restrictive implementation is the key of wide adoption, imho

back2dos commented 5 months ago

While defer is more flexible for existing code it doesn't really drive the eco system towards a single method for cleaning up resources

That's an ambitious goal, especially in light of the fact that Haxe's ecosystem includes the ecosystems of all the compiler targets.

And why of all things call it close? What about some big chunk of memory you wish to release, like a texture or what not? I mean, "disposable" is at least general enough for most cases. Yet a mutex is not really even being disposed. You may have other cases too, like this:

var something = pool.alloc();
something.thatThrows();
pool.release(something)

Here the object/resource that one wants disposed cannot dispose itself.

You raise one good point: having some explicit way to denote objects as requiring manual cleanup by the developer, so that the compiler/IDE can detect failure to do so. I would probably suggest having something like @:dispose to annotate a type (or specific method) to establish that contract, as it is far more practical for externs and 3rd party code (users can add meta data to 3rd party types via compiler arguments or macros).

Aidan63 commented 5 months ago

HAXE already have established the grounds here - there is no examples of fixed naming of class methods

There are plenty of places in haxe where certain syntax features can only be used if certain functions are provided. For loops can only be used with types which unify with any of the iterator / iterable structures, they're the reason why I proposed anything which provides a Void->Void close function (i.e. unifies with { function close() }), the precedent has been established for structural unification.

To be honest the idea of having "single method for cleaning up resources" is controversial

It's a "single recommended way of cleaning up resources that provides guarentees arounds execution", library authors are free to continue to use whatever functions they want and users can still attempt to clean up manually, but providing quality of life features around a single method means people are probably going to coverge to using it over time.

compiler keywords are the only place with "restrictive" naming so far. Partially this is due keywords can be translated into different implementations for different targets - while class methods are locked to have same name in every target (can be accessed via reflection, etc).

Keyword naming has nothing to do with target languages nor are function names locked, most generators maintain their own list of target specific keywords and will mangle any haxe code which conflicts with this. This also works fine with reflection, I can create a field called template and when compiled with hxcpp will get mangled to _hx_template since template is a c++ keyword. Using reflection to get the field with a "template" string works fine. I'm really not sure what you're on about here, but it also seems completely irrelevant to this proposal and all discussions surrounding it.

Your other comments make me thing you haven't quite grasped whats going on here or even read the proposal. I have not suggested it be restricted to a class, anything that provides a Void->Void close function can be used. So an abstract around an int with such a function would be fully usable.

generic name like "close" will definitely break a lot of existing code. such generic name can also complicate producing code on some targets afaik

I don't know what you mean by this, but again I don't think you quite understand whats happening, all this proposal does is generate a series of try catches and adds those close calls where ever the execution scope ends. I've posted a macro with this proposal which does exactly this and I'm not aware of any target specific problems. The generated code would also assumably be very similar if defer was implemented, it would still generate the try catches and insert the calls where ever ever the execution scope ends. The only breaking change is code which uses "autoclose" as an identifier, but thats also true for "with" and "defer" as none of those three are currently keywords and has nothing to do with "close" being the proposed function name.

That's an ambitious goal

I really don't think its that ambitious a goal, It's the norm for custom containers to provide iterator support to allow use with for loops since its much nicer than manually writing while loops. Having functionality that automatically calls a close function is much nicer than having to manually wrap in try catches and making sure you've accounted for all scope exits, so I'd imagine library authors would start providing close functions to allow this if they don't already.

And why of all things call it close?

Yes, there will be some cases where close is not he best terminology but I chose that because it covers all of the file, socket, io, and asys api out of the box. All wording is going to have cases where it doesn't exactly make sense, be it close, dispose, cleanup, bin_all_the_rubbish, or anything else. If people really feel that strongly about it then lets choose another one, but it feels like an exercise in pedentry more than anything.

Here the object/resource that one wants disposed cannot dispose itself.

Yes, there will be existing APIs which don't map exactly onto that but its easy enough for library authors or end users to provide wrappers around this. Again, over time I'd imagine this will solve itself and become a non issue. An exact analogue of the above code snippet could be done in C# with its memory buffers and pools which support IDisposable and there are no problems there.

using IMemoryOwner<byte> something = MemoryPool<byte>.Shared.Rent(1024);
something.thatThrows();

The memory owners dispose function will be called and the memory returned to the pool.

I still think there's good value in providing a feature based on a fixed method of cleaning resources, it makes code easier to reason about, assists with self documentation of types, allows consistent use of structural typing to represent any type managing resources, and over time API oddities will be resolved as they were in the dotnet and jvm ecosystems when they started to provided similar functionality around dispose and close functions respectively.

back2dos commented 5 months ago

All wording is going to have cases where it doesn't exactly make sense, be it close, dispose, cleanup, bin_all_the_rubbish, or anything else. If people really feel that strongly about it then lets choose another one, but it feels like an exercise in pedentry more than anything.

Sorry you feel that way. You may find that there are some developers who think that good naming goes a long way. This is your proposal, and a proclaimed goal is to unify resource handling across the eco system. I'm a bit skeptical about bin_all_the_rubbish being the thing that'll drive adoption.

I did suggested relying on metadata, because then you can shift the naming problem to the user and still have well established semantics (which is what this is really about, no?), but you can't both say that the name doesn't really matter and that there should be one name to rule them all.

And again, if you hope to standardize things posthoc, it's not something that is so easy to pull off. I mean, you're trying to address a pretty common problem, but an integral part of your solution appears to me to be that you also want everyone to solve it in the same way. Then that one way must be easy to follow and yet flexible enough to work most of the time.

Also, the design seems to skip over the fact that close may have significant side effects. What if it blocks? Should the calling thread block? Is that a good default behavior (it may well be, but that question should be examined properly)? And what if close throws? Is there any way to have the error handling in the block containing the autoclose? With defer one could easily defer attemptToClose(someFile), where attemptToClose is a user defined function that handles errors the way the user sees fit (retry/logging/etc.). It's perfectly reasonable to want something less loose than defer, but it's not enough to just brush it off. If you want more constrained semantics, you'll have to actually flesh them out.

IPv6 commented 5 months ago

Your other comments make me thing you haven't quite grasped whats going on here or even read the proposal. I have not suggested it be restricted to a class, anything that provides a Void->Void close function can be used. So an abstract around an int with such a function would be fully usable.

sorry, you are right. i got confused with "class" examples and missed the point regarding autoclose keyword.

i am still think "defer" have better fit for a problem since it is more "self-describing". User have a method for a task right here where it needed while autoclose hides the real "closing code" somewhere else. Not a problem if user the author of the codebase, but for reading code of other users "defer" is more straightforward. But i got your points, thanks for clarification

Aidan63 commented 5 months ago

Sorry you feel that way. You may find that there are some developers who think that good naming goes a long way.

Of course naming which reflects whats happening is something to be strived for and I obviously wasn't saying otherwise (dry bin_all_the_rubbish comment was needlessly dismissive though, apologies). Again, I think the long term benefits of unifying around a single cleanup functionality out weighs any slight naming oddities and API addition to use it.

I also don't think the eco system as it is right now is particually fragmented, so I don't think its quite the ordeal you seem to be implying it will be. I did a quick look through some of the most downloaded and well known haxelibs to see what they provide in cases where the user is able to clean resources.

lime & openfl, Has many media loader classes which fetch resources from file or network streams, close is used to clean these up.
haxe-ws & forks, Tries to be similar to the standard library socket types, so has a close.
format, Its readers and writers use close.
haxe-concurrent, Matches haxe threading std library so release for releasing locks and the likes.
heaps, Uses dispose a lot, but also has quite a few cases where they've opted for close instead.
tink-core, Has a disposable interface which contains a dispose function, used by what seems to be several core types.
haxe-files, Uses close for its file system watchers.

I gave these repos and clone and searched for close, dispose, clean, release, and other similar terminology. I ignored classes which were externs and ignored cases where sockets, i/o streams, or other classes from the standard library were being directly subclassed, so the authour had the choice to not use the close terminology if they so wished.

While a very small and basic sample and there are definitely outliers I don't think it paints an initial picture of a highly fragmented eco system of everyone doing things wildly differently. If the standard library uses a consistent naming for cleaning resources (which it mostly does) then users are likely to follow those conventions when building on top of it, when in Rome and all that, and that for the most part seems to have naturally occured. If extra closable related functions were added to the standard library threading types I'm sure it would be an easy and welcome addition to haxe-concurrent, but conversely Heaps might not want to move away from dispose, either because they don't want to update all their code or because they might not see any of the proposed automatic disposable techniques disussed providing much value with many of those classes holding GPU resources where their lifetime might not be limited to a particular function or block scope.

Even the cases where close was not the used terminology it was still all seemingly bog standard Void->Void functions. It's very possible there are big libraries I've missed which go completely off piste with their resource management and make extensive use of different mechanisms, and it would be good to know about them so there are concrete examples to look at.

I debated about putting notes on exceptions from close but didn't in the end didn't as I don't think there's anything special to do. If a close function blocks or throws, then it blocks of throws. Not to say I don't think thats an issue, I very much think it is, but one which exists today even if the user is manually calling close and would equally be a problem with any of the proposed mechanisms. What we should probably do though is issue advice similar to the dotnet guidelines, Where ideally close should not throw and be idempotent. I think this is good advice but ideally is also doing some heavy lifting there and I'm certain many of the existing standard library implimentations have issues with this. This is something I've brought up quite a lot recently with asys (the unknowable state of objects if closure fails) so hopefully it will be defined there going forward but I would also be keen to see effort to have some of that "back ported" to the existing std library, again I think all of the above is equally problematic both today with manual close calls and any of the mentioned automatic methods.

I'll update the proposal with some of these details.

So looking at that attemptToClose example, no, its not usable and thats by design. With the above advice in mind if the authour wanted it to be usable from autoclose / with hopefully they'd recognise they've overloaded close. Close for a file stream should be for closing file descriptors and other such cleanup which shouldn't fail. Any extra functionality around logging or retry attempts should be put in a separate function where failure is acceptable.

Apprentice-Alchemist commented 5 months ago

Close for a file stream should be for closing file descriptors and other such cleanup which shouldn't fail

Actually closing a file descriptor can return an error (though on Linux it'll still end up closed), there is a section in the man page about this.

Where ideally close should not throw and be idempotent. I think this is good advice but ideally is also doing some heavy lifting there and I'm certain many of the existing standard library implimentations have issues with this.

That would require keeping track of the closed state, since the underlying system calls are not idempotent.

Aidan63 commented 5 months ago

Yes the posix close can "fail", but as the man page states it should not be recalled and only used for diagnostics purposes as linux does not respect the unix EINTR stuff. The same is true of Win32s CloseHandle, it only really fails when passed an invalid handle. There are lots of early 2000s internet forums with people trying to figure out if it actually fails with little success. This is how libuv, dotnet, and others have closing apis which don't raise errors (i.e. doing a repo search of libuv for CloseHandle shows they ignore the returned value).

It would require modification to the existing implementations and may be difficult / impossible to completely do it given all of the targets, but any sort of consistency improvements in that area would only be a good thing. I'd definitely want to see this sort of stuff clearly defined for asys and have mentioned as such.

This is all probably getting a bit off topic now anyway, improvements to close consistency would be good even if none of the three discussed ideas are ultimately adopted and we still have to manually call it.

IPv6 commented 5 months ago

Some aditional considerations in favor of defer.

Proposed autoclose behavior - var tagging at instance creation - implying creation is also always an initialisation. But in many cases resource initialisation - and corresponding cleanup - is not directly linked to creation process, since real (functional) initialisation can be split/branched/delayed/etc according to functional logic. This is a frequent case when class owns several types of resources, required for different aspect and/or those resource usage is conditional. In some cases resource ownership can be "blurred" between components and corresponding cleanup logic became "blurred" too

defer allows to inject cleanup at later point and avoid cleanup when its not needed. defer also allows different cleanups according to real logic, aligned to execution logic. with autoclose the only option for such case is to have many “init flags” and “universal close” method ends up with heavy branched cumbersome system with inverted logic structure (relative to initialisation). In edge cases this became unreadable and error prone - and this things are hard to debug.

With defer you just have small dedicated cleanup per "resource logic usage concept". And execution bloat adds only if (and when) it is really needed. no explicit “initialisation logic inversion” needed. defer really helps in such cases.

Btw, why not have both? 1) defer “function/lambda” - auto-close call of function(params) 2) defer close “var xxx declaration” - auto-close call of closing method for the type

compiler can silently unwrap (2) as “var xxx; defer xxx.close()”, so its basically the same, imho. in (2) even cleanup method can be taken directly from declaration: "defer finalise var xxx=..." -> "defer xxx.finalise();" with some defaults for "autoclose var ..."

hughsando commented 5 months ago

Yes, not a lot of difference between "autoclose" and "defer close" for the added flexibility. Not sure I like the work 'defer' though - seems 'deferred' would be better. But if the usage already exists, seem ok. Barring that, tagging the close function with meta-data solves a few problems too.

Apprentice-Alchemist commented 5 months ago

Here's a suggestion:

class Foo {
  public function new() {}
  @:cleanup public function myCustomDestructor() {}
}

class Bar {
  public function new() {}
  public function closeWithoutMeta() {}
}

@:cleanup var foo = new Foo();
@:cleanup(closeWithoutMeta) var bar = new Bar();

cleanup is a more general/flexible name than close/autoclose
using metadata avoids introducing yet another keyword
allowing the end user to specify which method to use avoids the "this library uses dispose instead of close" issue
boilerplate can be reduced when the library opts-in (and this opt-in can be done without any API changes)