Weak references - Githubissues

HurryStarfish commented 5 years ago

In bmx-ng/maxgui.mod/issues/27, @GWRon suggested that weak references could be a useful addition to the language. A weak reference is an object reference that doesn't prevent the garbage collector from collecting the object; this can be useful for things like event handling or caching. Apparently, BoehmGC makes this achievable quite easily and with little code, so here's an implementation:

weakref.bmx:

SuperStrict
Framework BRL.Blitz
Import "weakref.c"

'Private ' <- this doesn't work atm (bug with generics)

Extern
    Function bbRegisterWeakPtr(weakPtr:Byte Ptr Var, obj:Object)
    Function bbUnregisterWeakPtr(weakPtr:Byte Ptr Var)
    Function bbWeakPtrGetObject:Object(weakPtr:Byte Ptr Var)
End Extern

Public

Type TWeakReference<T> Where T Extends Object Final

    Private

    Field objPtr:Byte Ptr

    Method New() End Method

    Public

    Method New(obj:T)
        ? Debug
        If Not obj Then Throw New TNullObjectException
        ?
        bbRegisterWeakPtr objPtr, obj
    End Method

    Method Delete()
        bbUnregisterWeakPtr objPtr
    End Method

    Method Get:T()
        Return T(bbWeakPtrGetObject(objPtr))
    End Method

End Type

weakref.c:

#include <blitz.h>

void bbRegisterWeakPtr(BBOBJECT** weakPtr, BBOBJECT* obj) {
    *weakPtr = obj;
    switch (GC_general_register_disappearing_link(weakPtr, obj)) {
        case GC_SUCCESS:
            break;
        case GC_DUPLICATE:
        case GC_NO_MEMORY:
        case GC_UNIMPLEMENTED:
        default:
            bbExThrowCString("Weak pointer creation failed");
    }
}

void bbUnregisterWeakPtr(BBOBJECT** weakPtr) {
    GC_unregister_disappearing_link(weakPtr);
}

static BBOBJECT* GC_CALLBACK DerefWeakPtr(BBOBJECT** weakPtr) {
    return *weakPtr;
}

BBOBJECT* bbWeakPtrGetObject(BBOBJECT** weakPtr) {
    BBOBJECT* obj = GC_call_with_alloc_lock(DerefWeakPtr, weakPtr);
    return obj ? obj : &bbNullObject;
}

Usage: Create a weak reference to any object by creating an instance of TWeakReference<T> using the constructor New(obj:T). You can then use the Get:T() method at any time to get a regular (strong) reference to that object back. Weak references will not prevent garbage collection of the object they refer to. So when the object is about to get collected, the GC will disconnect the weak reference, meaning Get() will return Null (this happens before finalization, so Get() will only ever return either a fully intact, unfinalized object or Null). So the correct way to use it is by storing the result in a variable and checking it against Null before working with the object:

Local obj:TMyClass = myWeakReference.Get()
If obj Then
    ' do stuff with obj here
    ' since obj is a strong reference, the object can't get collected during this, so this is perfectly safe
Else
    ' the object has been garbage collected, thus obj is Null
End If

Take care not to do this:

If myWeakReference.Get() Then myWeakReference.Get().SomeMethod()

because even if the first Get() call returns an object, the second one might return Null.
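
As a further illustration of the store-then-check pattern, here is a minimal sketch of the caching use case mentioned at the top. TBigTable, cachedTable and GetTable are made-up names for this example, weakref.bmx is assumed to sit next to the file, and the sketch assumes the compiler's generics support accepts a Global of a generic type.

```blitzmax
SuperStrict
Framework BRL.StandardIO
Import "weakref.bmx"

' Hypothetical resource that is expensive to build
Type TBigTable
    Field data:Int[] = New Int[100000]
End Type

' Weakly cached instance; Null until the first call to GetTable
Global cachedTable:TWeakReference<TBigTable>

Function GetTable:TBigTable()
    Local table:TBigTable
    ' call Get() exactly once and keep the result in a strong local variable
    If cachedTable Then table = cachedTable.Get()
    If Not table Then
        ' never created, or already collected: rebuild and cache it again
        table = New TBigTable
        cachedTable = New TWeakReference<TBigTable>(table)
    End If
    Return table
End Function

Local t:TBigTable = GetTable()
Print "table size: " + t.data.length
```

The cache never keeps the table alive on its own: while some caller still holds the table, GetTable hands out the same instance, and once nobody does, the GC is free to reclaim it and the next call rebuilds it.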

Here is a little program to test this feature. On my machine, the TTest object referred to by test gets collected after about 20 iterations. Compile this in release mode; I've noticed that the object might never get collected in debug mode, presumably because debug data keeps holding a reference to it somewhere.

SuperStrict
Framework BRL.StandardIO
Import "weakref.bmx"

Local test:TWeakReference<TTest> = GetWeakTTest()

Repeat
    DoStuff test

    ' put some pressure on the GC to trigger collection of the TTest object after a while
    Local _:Object[10000]

    Delay 200
Forever

Function DoStuff(test:TWeakReference<TTest>)
    ' retrieve the TTest object and do something with it
    ' when the GC decides to collect it, Get() will start returning Null
    Local t:TTest = test.Get()
    If t Then
        t.f :+ 1
        Print "test.Get().f = " + t.f
    Else
        Print "test.Get() = Null"
    End If
End Function

Function GetWeakTTest:TWeakReference<TTest>()
    Local test:TTest = New TTest
    ' no strong reference to the TTest object is left after this function returns,
    ' so it can be garbage collected anytime
    Return New TWeakReference<TTest>(test)
End Function

Type TTest
    Field f:Int
    Method New() Print "TTest.New" End Method
    Method Delete() Print "TTest.Delete" End Method
End Type

I haven't made this a pull request and would hold off on including it in BRL.Blitz for now, mainly for two reasons:

1. I'm not yet sure whether this is safe to use in multithreaded environments. It should be, assuming that GC_general_register_disappearing_link and GC_unregister_disappearing_link are thread safe, but I haven't tested it.
2. Generics are still too unfinished to be part of the core modules imo. This (very simple) use case mostly works, but it still has its issues, such as the fact that it isn't currently possible to make the Extern C functions private.

DivineDominion commented 5 years ago

I agree with you on (2), but still think this is a valuable next step to create complex object graphs without retain cycles!

HurryStarfish commented 5 years ago

Though you almost never need to care about having cycles, so I wouldn't recommend using this unless you actually have a specific reason to. (The concrete motivation for adding it was to allow for writing weak event managers.)

DivineDominion commented 5 years ago

The exact same thing I want to do. I wrote macOS/iOS apps the past couple of years and got to like the pattern where e.g. controls weakly reference their delegate.
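
A minimal sketch of that delegate pattern, built on the TWeakReference type from the first post, might look like this. TButton, TButtonDelegate and their members are hypothetical names invented for the example, and weakref.bmx is assumed to sit next to the file.

```blitzmax
SuperStrict
Framework BRL.StandardIO
Import "weakref.bmx"

' Hypothetical delegate type (illustration only)
Type TButtonDelegate
    Method OnClick(buttonName:String)
        Print buttonName + " was clicked"
    End Method
End Type

' Hypothetical control that references its delegate weakly
Type TButton
    Field name:String
    Field weakDelegate:TWeakReference<TButtonDelegate>

    Method SetDelegate(target:TButtonDelegate)
        weakDelegate = New TWeakReference<TButtonDelegate>(target)
    End Method

    Method Click()
        If Not weakDelegate Then Return
        ' take a strong reference once, then check it against Null
        Local target:TButtonDelegate = weakDelegate.Get()
        If target Then target.OnClick(name)
    End Method
End Type

Local handler:TButtonDelegate = New TButtonDelegate
Local button:TButton = New TButton
button.name = "OK"
button.SetDelegate(handler)
button.Click()
```

The button does not extend the delegate's lifetime here. If the delegate has been collected by the time Click() runs, the call is simply skipped.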

GWRon commented 5 years ago

Funny how two Germans discuss in English :-)

BTW @DivineDominion I think your Mac knowledge could help NG on this side of the OS support. Brucey has a good knowledge of it (as he sits in front of a Mac) but your more in-depth knowledge could help with certain aspects (eg the fullscreen-mouse-inverted-issue at https://github.com/bmx-ng/brl.mod/issues/79 - which seems to have to do with OGL on Mac Mojave).

@HurryStarfish Do you think there is a more "beginner friendly" syntax/way-of-doing for weak pointers? Some automatism which fits into the BlitzMax style? All these "advanced" functionalities become "beasts" syntax wise - of course your POV might differ if you are used to such code lines.

Edit:

Local obj:TMyClass = myWeakReference.Get()
If obj Then
    ' do stuff with obj here
    ' since obj is a strong reference, the object can't get collected during this, so this is perfectly safe
Else
    ' the object has been garbage collected, thus obj is Null
End If

This is easy to read - yet it of course lacks the information whether the obj is a strong reference or a weak one. It is only important if you read someone else's code and need to know if the object referenced there is "safe" or if it is a weak reference and if nothing else holds a strong reference it gets collected then.

Function DoStuff(test:TWeakReference<TTest>)

This requires knowledge of generics which is a rather advanced feature for BlitzMaxers.

Many thanks for your participation in the project (and its discussions). It's nice that the project is getting more and more attention :-).

DivineDominion commented 5 years ago

I'm afraid I cannot help with the OGL issue as I have 0 experience with native game development on Mac, only regular apps. Maybe I find something else :)

I think it's fair to say this is intimidating: Function DoStuff(test:TWeakReference<TTest>)

But I'd like to argue that this is a good thing, because grasping weak references may go over the heads of newcomers for quite a while. This is not meant to be a patronizing argument: I don't want to encourage making complex concepts hard to read in code to create a barrier. But unless we change the BMX syntax to support a Weak keyword of sorts, what other option is there except generic wrappers?

Maybe extending the crosspiler/compiler or whatever is at work is feasible. The easiest thing is to add e.g. UnsafeUnretained (I borrow the terms from Objective-C and Swift here) which basically doesn't increase the retain count of objects when you reference them. If the retain count is <1, trying to access an object this way should raise a runtime exception. The Weak keyword would add a check and return Null instead of crashing.

HurryStarfish commented 5 years ago

@GWRon

Funny how two Germans discuss in English :-)

Well, it makes the conversation accessible to other people, so I think that's the best choice. After all, someone else like, say, Brucey, might want to be able to read this too. :)

Do you think there is a more "beginner friendly" syntax/way-of-doing for weak pointers? Some automatism which fits into the BlitzMax style?

Looking at other languages like C# or Java, they do it the same way: they provide a WeakReference class with methods you can use, but no extra "syntactic sugar" for it. This makes sense imo: weak references are a feature that's rarely needed, and beginners probably shouldn't be handling them. Providing special syntax for it would suggest that it is a commonly used basic feature.

This is easy to read - yet it of course lacks the information whether the obj is a strong reference or a weak one.

The code tells you that obj has the type TMyClass, which means it's a normal, strong reference. Only a TWeakReference<TMyClass> would be weak*. To use an object held by a TWeakReference, you need to call .Get on it, at which point you're automatically creating a normal reference, so as long as you check that for Null before using it, you're always safe. If you want, you also always have the option of prefixing your weak reference variable names with "weak" or the like. (It's similar to pointer variables - I usually put Ptr in their names, as you can see in the code above.)

(* Technically, that's still a strong reference, to an object representing the weak reference, but that detail doesn't really matter.)

This requires knowledge of generics which is a rather advanced feature for BlitzMaxers.

Hopefully not in the future :) Honestly, I think generics should come to be considered a basic feature in the future, and also known to beginners. In a sense, they already are: arrays, after all, are considered a pretty basic feature and are widely used. And "array" really is nothing else than a generic class. They do have a special syntax, but instead of Int[], the syntax could just as well have been Array<Int>, and in some languages it is. (Same goes for pointers - Int Ptr could as well have been Ptr<Int>.) I'd like to work on improving/completing generics in the future, and after that, BRL.Collections, and I'd hope/expect that eventually, when it's finished, using a TList<Int> is considered as basic and common as an Int[].

HurryStarfish commented 5 years ago

@DivineDominion

The easiest thing is to add e.g. UnsafeUnretained (I borrow the terms from Objective-C and Swift here) which basically doesn't increase the retain count of objects when you reference them. If the retain count is <1, trying to access an object this way should raise a runtime exception.

This is not really applicable here, since unlike Swift, BlitzMax-NG doesn't use any reference counting. (Legacy BlitzMax did, but only in single-threaded mode, and NG doesn't at all.) We use mark&sweep garbage collection, for which cycles between objects aren't a problem. So there's normally no reason to avoid them. You really only need weak references when, for some reason, you want to keep a "reference" to an object without influencing its lifetime - which is restricted to special cases like the aforementioned weak event pattern and is something that mainly library/module authors will come in contact with, rather than "regular" users.

GWRon commented 5 years ago

Removed ... Email replies do not support Markdown

GWRon commented 5 years ago

And "array" really is nothing else than a generic class. They do have a special syntax, but instead of Int[], the syntax could just as well have been Array<Int>, and in some languages it is. (Same goes for pointers - Int Ptr could as well have been Ptr<Int>.)

Hmm... as Blitzmax is a "basic" language thing it should maybe be a wordy syntax like Tlist of Int or array of TMyObject. Regarding arrays the syntax Int[size] is possible. Dunno how it is with generics.

Regarding basic and advanced feature... Just wanted to make you think about it with a different POV. Just to avoid seeing "known from other languages" syntax as the way to go too / default syntax for a feature. BlitzMax targets "average joe" developer while being powerful enough for advanced usage. As you already mentioned many of the advanced features are most probably hidden in modules/libraries. Nonetheless people should be able to "grasp" what is behind a certain syntax. BTW you are right with "arrays" being something every developer knows - but here it helps that an "array" is a word with an "imaginable meaning". You can't say that about "generics".

Local obj:TMyClass = myWeakReference.Get()

Forget about what I've said about that - I mixed something up in my mind (way too little sleep thanks to syntaxbomb.com's coding competition). I thought the obj:TMyClass is containing the weak link - while of course the myWeakReference is containing it. Sorry for the confusion.

HurryStarfish commented 5 years ago

Just wanted to make you think about it with a different POV. Just to avoid seeing "known from other languages" syntax as the way to go too / default syntax for a feature.

you are right with "arrays" being something every developer knows - but here it helps that an "array" is a word with an "imaginable meaning"

It is, but then again the word "array" isn't anywhere in the syntax Int[] either. So it's a syntax that people still just learn at the start, and I don't think TList<Int> would be much different in that regard. It's true that having a syntax similar to other languages won't help a total beginner. But it can help people that have seen the feature in other languages - or people that might learn another language afterwards. That said, I'd never want to copy a syntax to BlitzMax just "because other languages do it like this" - I always try to consider how something will fit in with the rest of the language (is it readable, can it be parsed without getting ambiguous, does it match existing syntax), but in this case, I find the common <> to be the best option.

as Blitzmax is a "basic" language thing it should maybe be a wordy syntax like Tlist of Int or array of TMyObject.

VB.Net does something like that, but it is more verbose and still needs parentheses to avoid ambiguity (especially with multiple type parameters). So imo it ends up as overall harder to read, and so I am not a fan of that syntax. Plus, my point of view is that BlitzMax's style is to walk a middle ground between classic basic-style verbosity/simplicity and certain other languages' terseness/power. Like how we use :Int (compared to VB.Net's As Integer) and we support pointers (which VB.Net doesn't).

The <> syntax also makes much more sense when you think of generics as "type functions". You can look at it like this:

Sqr, for example, is a "normal" function that works with values: when you call it, you pass in a value (some number) and it returns a different value (the square root of that number).
Pointers and arrays can also be considered a sort of function: again, you pass in a value (the index) and get out a different value (the corresponding element) - except that these "functions" are called with [] instead of ().
And the same goes for generics. Except they don't work with values, but with types: for example, a generic TList is like a function where you pass in a type (the one that you want a list of, let's say Int) and it returns a different type (the "list of Int" you wanted) - and this kind of function is called with neither () nor [], but with <>.

Personally I find this a useful way to explain and understand generics, and it shows that they're not really some scary and exotic weird thing - they're basically just a "higher level" of function that gets "called" by the compiler while it compiles your code, instead of by the program when you run it.
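
To make the comparison concrete, here is a small sketch that places the three kinds of "call" side by side. TBox is a hypothetical generic type defined only for this illustration, and the sketch assumes the compiler's generics support accepts a simple user-defined generic type like this one.

```blitzmax
SuperStrict
Framework BRL.StandardIO
Import BRL.Math

' Hypothetical generic type (illustration only)
Type TBox<T>
    Field value:T
End Type

' 1. value function, called with (): a number goes in, a number comes out
Local root:Double = Sqr(16.0)

' 2. array indexing, "called" with []: an index goes in, an element comes out
Local numbers:Int[] = [10, 20, 30]
Local second:Int = numbers[1]

' 3. type function, "called" with <>: a type goes in, a type comes out
Local box:TBox<String> = New TBox<String>
box.value = "hello"

Print "root = " + root
Print "second = " + second
Print "box.value = " + box.value
```

The first two calls are evaluated when the program runs, while TBox<String> is resolved during compilation.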

HurryStarfish commented 5 years ago

Oh, you know how the parameters of regular functions are declared with types, to restrict the kinds of values you can pass to them (for example, Sqr only accepts Doubles)? Generics can do that too - their type parameters can have "type types", so to speak. It just uses a different syntax. That's what Where clauses are :)

So to break it down: in the declaration Function Sqr:Double(x:Double)

Function is the keyword for the declaration
Sqr is the name of the function
x is the name of the parameter
:Double (the 2nd one) is the type of that parameter
and :Double (the 1st one) is the return type

And in the declaration Type TWeakReference<T> Where T Extends Object

Type is the keyword for the declaration
TWeakReference is the name of the "function"
T is the name of the parameter (aka the "type parameter")
Extends Object is the type of that parameter
and there is no syntax for a return type (because you can't store that in a variable anyway)
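
The two declarations can also be set next to each other in code. This is only a sketch: Half and TSlot are hypothetical names, and the Where clause simply mirrors the one used by TWeakReference above.

```blitzmax
SuperStrict
Framework BRL.StandardIO

' value level: the parameter x is declared to accept Doubles
Function Half:Double(x:Double)
    Return x / 2
End Function

' type level: the type parameter T is declared to accept Object types
' (TSlot is a hypothetical type, for illustration only)
Type TSlot<T> Where T Extends Object
    Field item:T
End Type

Print "Half(5.0) = " + Half(5.0)

' String is an Object type, so it satisfies the constraint on T
Local slot:TSlot<String> = New TSlot<String>
slot.item = "some text"
Print "slot.item = " + slot.item

' TSlot<Int> is not meant to satisfy the constraint, since Int is not an Object type
```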

DivineDominion commented 5 years ago

@HurryStarfish I didn't experiment with bmax pointers, yet. Couldn't you bypass the C code using those? If so, until proper backing is implemented, people (like me) could at least have a single copy & paste-able file for their projects :)

HurryStarfish commented 5 years ago

There are certain things you're not allowed to do in pure BlitzMax (like casting a pointer to Object), so C code is required for this. But even if you could implement the whole C part in BlitzMax, you'd have to Extern-import all the GC functions and you'd end up with uglier code. For now, you can use this by copying weakref.bmx and weakref.c into your project (and importing the former); later on it might get added to BRL.Blitz.

bmx-ng / brl.mod

Weak references #108