dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.26k stars 4.73k forks source link

Weakreference for many small objects #28431

Open MkazemAkhgary opened 5 years ago

MkazemAkhgary commented 5 years ago

What I need:

There is no way to have something like WeakReference<List<T>> where the list itself has strong reference but all of its content have weak references.

This would be useful for having many small objects with weak references. List<WeakReference<T>> provides little to no benefit and may even have bad outcomes since objects are too small and weak reference may take same or more space than the target itself.

In short, I want a list with strong reference on it self but weak reference for its contents.


What I'm trying to do in case you are wondering:

I want to implement flyweight pattern for many small immutable objects.

for interning objects, an object it is retrieved or added to the dictionary (using ConcurrentDictionary.GetOrAdd). but the problem is that these objects never get cleared.

I wish i could have a collection like WeakReference<ConcurrentDictionary<K, V>> where the dictionary itself has strong reference but its content (keys/values) have weak references so that entries get cleared by GC when there is no strong reference to those interned objects.


If there was a way to tell GC to not count references as strong references in a particular array, problems would be solved. for example it could be denoted by an attribute.

class MyClass
{
    [WeakReferenceContent]
    object[] pool; // pool has strong reference, but referenced items inside it are treated as weak reference.
}

Or better handled by special type.

Clockwork-Muse commented 5 years ago

... for many small immutable objects

How small? Would they be better implemented as structs? (In which case you wouldn't need a pool)

MkazemAkhgary commented 5 years ago

@Clockwork-Muse Thanks. I thought about that too but they are not fixed size. length can vary from 3 bytes to more so immutable array is used plus an integer that is pre-computed hashcode of that immutable array. how ever smaller objects will repeat a lot whereas larger objects are unlikely to be equal.

danmoseley commented 5 years ago

If I understand right, you want List<WeakReference<T>> but the WeakReference<T> type has too much overhead in memory, and since WeakReference<T> does not hold any extra state (at least on the managed side) you would like a way to indicate that everything in an array should be treated as weakly referenced? Yes, there is no way to do that I think. It would need a fair bit of runtime work.

In your scenario, could you group them in smaller pools, so you could have several WeakReference<List<T>> - when none of the entries in that pool have a strong reference, they and the list would be collected.

MkazemAkhgary commented 5 years ago

@danmosemsft Nice idea. never thought about it. seems like the best solution with current limitations. I think grouping them by fraction of the hashcode would work.

ConcurrentDictionary<int, WeakReference<ConcurrentDictionary<T, T>>>

where dictionary is mapped by 1 or 2 bytes of the hashcode of T. I would dedicate 1 bit of the hashcode to denote if T is interned or not. (and i must exclude this bit from GetHashCode as well)

finalizer of T will then check if finalizing object is the interned one or not. if it is interned, it will then remove itself from the pool.

danmoseley commented 5 years ago

You might have to deal with fragmentation somehow - do periodic passes and coalesce them. Depending on what you're doing, you might have generations, and do the passes over the groups that you handed out longest ago.

Another idea is a bounded set that are strongly referenced, and handed out in preference to the weakly referenced ones. Those at least would be stored more compactly.

I'm making all this up though, hard to say much without detail of your usage patterns and all kinds of things.

svick commented 5 years ago

What about using GCHandle with GCHandleType.Weak instead of WeakReference<T>?

MkazemAkhgary commented 5 years ago

Unfortunately there is no way to achieve this. smaller WeakReference<ConcurrentDictionary<T, T>> pools always get cleared. It actually makes sense because the collection has only a weak reference.

I also tried GCHandle.Alloc with GCHandleType.Weak and target collection gets cleared even without calling Free.

I don't know if WeakTrackResurrection would help if I somehow implement object resurrection via a custom collection finalizer.

svick commented 5 years ago

@MkazemAkhgary I meant having a collection of GCHandles instead of a collection of WeakReferences.

MkazemAkhgary commented 5 years ago

This is my current implementation that doesn't work. Target of GCHandle becomes null before calling Free

internal sealed class ObjectPool<T> where T : class
{
    private readonly ConcurrentDictionary<int, GCHandle> _pool;
    private readonly IEqualityComparer<T> _equalityComparer;
    private readonly object _sync;

    public ObjectPool(IEqualityComparer<T> equlityComparer = null)
    {
        _pool = new ConcurrentDictionary<int, GCHandle>();
        _equalityComparer = equlityComparer;
        _sync = new object();
    }

    private int GetKey(T item)
    {
        // combine 4 bytes to single byte.
        var hash = item.GetHashCode();
        return (((hash & 0xFF) ^ ((hash >> 16) & 0xFF)) ^ (((hash >> 8) & 0xFF) ^ ((hash >> 24) & 0xFF)));
    }

    private bool TryGetTarget(GCHandle handle, out ConcurrentDictionary<T, T> target)
    {
        lock (_sync)
        {
            target = null;
            if (handle.IsAllocated)
            {
                target = (ConcurrentDictionary<T, T>)handle.Target;
                Debug.Assert(target != null); // this will fail !
                return true;
            }
            return false;
        }
    }

    /// <summary>
    /// interns the given object and returns the interned reference.
    /// </summary>
    public T Intern(T item)
    {
        var handle =_pool.GetOrAdd(GetKey(item), 
            _ => GCHandle.Alloc(new ConcurrentDictionary<T, T>(_equalityComparer), GCHandleType.Weak));

        return TryGetTarget(handle, out var target)
            ? target.GetOrAdd(item, x => x)
            : Intern(item); // try to intern again.
    }

    /// <summary>
    /// release the given object if its interned.
    /// </summary>
    public void Release(T item)
    {
        var key = GetKey(item);
        if(_pool.TryGetValue(key, out var handle)
            && TryGetTarget(handle, out var target)
            && target.TryGetValue(item, out var x)
            && ReferenceEquals(item, x)
            && target.TryRemove(item, out _)
            && target.Count == 0)
        {
            lock (_sync)
            {
                _pool.TryRemove(key, out _);
                handle.Free();
            }
        }
    }
}

And this is the class that is being interned. finalizer of these objects are called with GCHandleType.Weak mode but fail miserably :(

public abstract class Message : IEquatable<Message>
{
    /// <summary>
    /// Gets entire content of message as immutable array.
    /// </summary>
    public ImmutableArray<byte> RawData { get; }

    /// <summary>
    /// precomputed hash code for immutable array. LSB is reserved for intern flag.
    /// if LSB is 1, object is most likely interned, if LSB is 0 object is definitely not interned.
    /// </summary>
    private int _hashCode;

    private protected Message(ImmutableArray<byte> rawData)
    {
        RawData = rawData;

        // calculate hashcode.
        _hashCode = ObjectExtensions.CombineHashCodes(
            GetType().GetHashCode(), // this will help increase distribution.
            ImmutableArrayEqualityComparer<byte>.Default.GetHashCode(RawData));

        // to increase distribution XOR MSB with LSB before reserving LSB for intern flag.
        _hashCode = (_hashCode & ~1) ^ (_hashCode << 31);
    }

    ~Message()
    {
        // if interned, clear flag and release this reference.
        if((Interlocked.CompareExchange(ref _hashCode, _hashCode & ~1, _hashCode | 1) & 1) == 1)
        {
            MessageFactory.Release(this);
        }
    }

    public Message Intern()
    {
        var x = MessageFactory.Intern(this);
        if (x == this) _hashCode |= 1; // set intern flag
        return x;
    }

    // ... equality comparers

    public override int GetHashCode()
    {
        // copy MSB to LSB because LSB is reserved for intern flag.
        return unchecked((_hashCode & ~1) | (int)((uint)_hashCode >> 31));
    }
}
MkazemAkhgary commented 5 years ago

Ok. I managed to get this working with this class

private sealed class X : ConcurrentDictionary<T, T>
{
    public bool resurrect = true;

    public X(IEqualityComparer<T> e) : base(e)
    {
    }

    ~X()
    {
        if(resurrect) GC.ReRegisterForFinalize(this);
    }
}

And with GCHandleType.WeakTrackResurrection.

Before i release the handle, i will set resurrect to false.

How ever 1 byte for grouping isn't good enough for my usecase because there will be maximum of 256 collections and there are about 10 thousand interned objects. I guess grouping by 10 bit would be good.

MkazemAkhgary commented 5 years ago

With object resurrection i can really have one WeakReference<Collection<T>> collection and it does exactly what I wanted. although not by the same meaning, here items of the collection have weak reference, but the collection is forced to be alive. assuming that true is passed to the WeakReference constructor.

The only problem with this is the collection goes in LOH very quickly and GC wont collect it anytime soon. it wont collect the containing items either until GC starts to collect generation 2.

Splitting the pool into smaller pools will indeed help to solve this problem and get rid of items more rapidly. but it will complicate the process very much.

glenn-slayden commented 5 years ago

I also tried GCHandle.Alloc with GCHandleType.Weak and target collection gets cleared even without calling Free.

This is indeed the (poorly documented) behavior of GCHandleType.Weak. The main problem with rolling your own weak reference approach with GCHandle is that there's no obvious opportunity for you to free unwanted handles themselves, especially absent any way to be notified when a respective Target gets zeroed out. The latter happens elsewhere, actually, so not a single bit changes in the 32- or 64-bit value-type GCHandle type you're looking at. Perversely enough, it follows that unless your code wants to periodically visit all the handles you've scattered about "round-robin style" to check up on them, you won't discover a vacuous handle until... the exact moment that you actually need to once again restore its usefulness.

So again, yes, don't let the zeroing fool you, even when they are created in GCHandleType.Weak mode, every GCHandle should indeed have Free() explicitly called on it--if you're able to figure out how to programmatically decide that it won't become relevant or useful again before some certain period of time, that is.

The upside is that you can re-use a zeroed entry, by setting some new Object reference into its Target, thus recycling an existing handle for some savings perhaps.

But back to the downside, however, there was not enough thought put into the concurrency requirements for conveniently deploying this re-use scenario. Basically there's no (squeaky-clean, universally lauded and sanctioned) way to either (a.) atomically swap the entire GCHandle using Interlocked.CompareExchange, or (b.) atomically swap just the Target on an existing GCHandle, in-situ. The race that's implied by the lack of help here is quite a wet-blanket on what should be a very simple lock-free recycling pattern, because in the end it becomes simpler to just use a multi-phase race of first zeroing-out any orphaned GCHandle instance you encounter, and then immediately race again to install an update. Boo.

I mention the above for the benefit of the esteemed rule-followers, because in fact the following trivial IL has suited all my uses perfectly enough:

.method public static GCHandle CmpXchg(GCHandle& addr, GCHandle val, GCHandle expect) aggressiveinlining
{
    ldarg addr
    ldarg val
    ldarg expect
    call native int Interlocked::CompareExchange(native int&, native int, native int)
    ret
}