MikePopoloski / SharpBgfx

C# bindings for the bgfx graphics library
MIT License

General memory management concerns #22

Closed Alan-FGR closed 6 years ago

Alan-FGR commented 6 years ago

Hello there :smile_cat:. I was taking a look at the code and I really like it; it looks top quality! Great work! However, I noticed something a bit weird here: https://github.com/MikePopoloski/SharpBgfx/blob/4fba84e527b43a7d6f240b3df5c6be3504e7e86d/SharpBgfx/MemoryBlock.cs#L66-L75 Can't you use `fixed` instead of pinning the managed array there? Unless I'm misunderstanding, that method doesn't use the Release callback, so there's no need to pin the managed object with a GCHandle; you could simply use `fixed`, which I believe is much faster.
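To illustrate the suggestion, here is a minimal sketch of the `fixed` pattern being proposed (the helper name is illustrative, not SharpBgfx API):

```csharp
using System;

class FixedPinExample
{
    // Hypothetical helper: access a managed array through a raw pointer
    // without an explicit GCHandle. The fixed statement pins 'data' only
    // for the duration of the block.
    static unsafe int SumViaPointer(int[] data)
    {
        int sum = 0;
        fixed (int* p = data)   // array is pinned here...
        {
            for (int i = 0; i < data.Length; i++)
                sum += p[i];
        }                       // ...and automatically unpinned here
        return sum;
    }

    static void Main()
    {
        Console.WriteLine(SumViaPointer(new[] { 1, 2, 3 })); // prints 6
    }
}
```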

Another point is regarding the MakeRef method, in which managed memory is pinned: https://github.com/MikePopoloski/SharpBgfx/blob/4fba84e527b43a7d6f240b3df5c6be3504e7e86d/SharpBgfx/MemoryBlock.cs#L86-L92 and then released on a callback from bgfx: https://github.com/MikePopoloski/SharpBgfx/blob/4fba84e527b43a7d6f240b3df5c6be3504e7e86d/SharpBgfx/MemoryBlock.cs#L185-L189 That's potentially a lot of managed memory that's going to stay pinned for a while, hindering overall GC performance (possibly very considerably). I don't have any numbers to show you, but I'm pretty confident that in this situation it would be much better to allocate memory on the unmanaged heap and copy the data there instead of pinning the managed memory. That can be achieved several ways, including with the C# standard library (Marshal.AllocHGlobal, plus Marshal.Copy or Buffer.MemoryCopy), or maybe even using NativeMethods.bgfx_copy(); (which I assume allocates and copies using bgfx native code and returns a pointer to the memory. Btw, have you compared this copier to the one in the C# Buffer class?).
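For concreteness, a minimal sketch of the copy-to-unmanaged alternative using only the standard library (the data and sizes here are illustrative):

```csharp
using System;
using System.Runtime.InteropServices;

class CopyToUnmanagedExample
{
    static void Main()
    {
        float[] managed = { 1f, 2f, 3f, 4f };
        int byteCount = managed.Length * sizeof(float);

        // Allocate on the unmanaged heap; the GC never tracks this block.
        IntPtr unmanaged = Marshal.AllocHGlobal(byteCount);
        try
        {
            // Marshal.Copy pins the source only for the duration of the copy,
            // so no long-lived pin remains on the managed array.
            Marshal.Copy(managed, 0, unmanaged, managed.Length);

            // 'unmanaged' could now be handed to native code (e.g. bgfx).
            // Copy back just to show the round trip worked:
            float[] roundTrip = new float[managed.Length];
            Marshal.Copy(unmanaged, roundTrip, 0, roundTrip.Length);
            Console.WriteLine(roundTrip[0]); // prints 1
        }
        finally
        {
            Marshal.FreeHGlobal(unmanaged);
        }
    }
}
```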

In any case, pinning managed memory should be avoided imho, especially for potentially large objects like arrays.

Please take this as constructive and lightheartedly, I can't stress enough that overall SharpBgfx looks really impressive!

raizam commented 6 years ago

Using unmanaged memory is not as easy as it looks. First, it requires all the types to be blittable, which basically means only C primitives are allowed; you cannot store managed classes, only blittable structs. Then these structs can only be referenced using pointers, which would turn the public API itself unsafe.

In C# you just can't bypass the GC in a reasonable way.

Alan-FGR commented 6 years ago

@raizam I don't understand what you're saying... I'm talking about avoiding having objects in the managed heap, and avoiding pinning them, because that slows down the GC. I'm talking about pointers here. In C# it's perfectly possible to allocate memory on the unmanaged heap, either by using Marshal or just P/Invoking some allocator (like std malloc). Using unmanaged memory in unsafe C# is just as easy as in C. Of course I assume the built-in facilities for that (Marshal) have some overhead because of checks and such, and there's always the interop overhead anyway, so for many small temp objects I don't recommend going unsafe; but for long-lived objects you can perfectly well allocate unmanaged memory and pass the pointer to C++. The public API already is unsafe, but not because of that; you can internally malloc and hold the pointer in a managed class, and it doesn't have to be exposed to the user, just as the MemoryBlock class already does, btw.

In that first snippet there's no reason to pin via a GCHandle (think of the perf impact on multithreading, for example) just to immediately release it; for that it's better to use a fixed block. When you need to unpin that memory later on, then you do have to make a handle for it, of course, and in that case it would make sense to have the handle and pin the memory. But as I said, in most cases that's not a good idea: if you really need to convert from a managed array, it's probably better for performance to copy the data to unmanaged memory inside a fixed block, again never leaving the managed object pinned.

Better yet would be if, for stuff like vertex buffers, we could have something like a List that internally uses an unmanaged memory block, much akin to the original List but not backed by a managed array. That way we would have the memory ready to pass as a pointer to the native side, but unfortunately you can't take pointers to generic types in C# afaik :cry:.

MikePopoloski commented 6 years ago

For your first question: a fixed statement also pins the memory. Of course it must, otherwise the memory might move while we're calling into unmanaged code. The fixed statement is a bit faster to execute than manually invoking a GCHandle method call, but the underlying operation in the GC / JIT is essentially the same. You've already noted though that you can't use pointers to generic types, so fixed cannot be used in this case.

For your second question, yes, the memory must be pinned if we want native code to have a reference to it. If you do not need native code to have a reference, then feel free to call some other method that makes a copy instead. There are other methods for making copies, MakeRef is for making references.

Alan-FGR commented 6 years ago

@MikePopoloski regarding the first question: why do you need pointers in that case? Can't you just copy the entire memory block using the size of the managed collection, with something like this (untested, kinda pseudo):

//where myManagedCollection is an IList of blittable structs

//get size of the managed struct and calc the entire memory block size
var elementSize = Marshal.SizeOf<T>(); //T = element type of the collection
var memBlockSize = myManagedCollection.Count * elementSize;

IntPtr unmanagedBlock = SomeAllocator(memBlockSize); //allocs unmanaged
var managedRef = __makeref(myManagedCollection); //makes ref to managed collection

fixed (void* managedPtr = &managedRef) //fixes managed collection ref
{
    //Buffer.MemoryCopy(source, destination, destinationSizeInBytes, sourceBytesToCopy)
    Buffer.MemoryCopy(managedPtr, (void*)unmanagedBlock, memBlockSize, memBlockSize);
}

On the second question, you can of course currently use it like this:

var bgfxMemPtr = Bgfx.allocMem(size); //hypothetical bgfx-side allocation
fixed (MyStruct* ptr = &managedMemBlock)
{
    //Buffer.MemoryCopy(source, destination, destinationSizeInBytes, sourceBytesToCopy)
    Buffer.MemoryCopy(ptr, (void*)bgfxMemPtr, size, size);
}

but I was thinking that maybe something like that could be provided in the API. Maybe it could even be the default, but that would of course require some more thought and some tests.

In any case, thanks for the reply and keep up the good work, you're doing an amazing job here! :+1:

raizam commented 6 years ago

@Alan-FGR While allocating unmanaged memory is easy using Marshal.AllocHGlobal, you cannot store managed objects there, only blittable structs. So you cannot store a .NET array in there.

Now, given that the MemoryBlock class is mostly used to store binary resources such as vertices and images, these could live in the unmanaged heap in this particular use case. That would require a change in the API to remove the use of T[], plus custom iteration/write methods.

Alan-FGR commented 6 years ago

@raizam yes, I'm aware that you can't store reference types in the C# unmanaged heap, but we don't use such objects in this case anyway. As you pointed out, there's no problem storing value-type "arrays". Yes, it would require changes, but they could be transparent to the user. The point here is that anything involving the GC is terrible for performance, especially on mobile; I don't know if that's just because the hardware is weak or because the implementation is slow too. And since RAM isn't as scarce these days, we could trade some extra RAM usage (copying managed to unmanaged means twice the memory) for a higher framerate.

Alan-FGR commented 6 years ago

@MikePopoloski actually, apparently fixing a ref like that doesn't work, but there's got to be a way to fix it, and you only need the void* there.

raizam commented 6 years ago

The problem with keeping the current API is that accessing the data will require a copy operation to the managed side, just like in the code you posted, hence taxing the GC and using more memory.

Anyway, any unmanaged heap block will be referenced by some managed instance, so in the end the GC will still track a managed instance; but it does avoid pinning a large memory block. Not sure it's worth it.

Alan-FGR commented 6 years ago

Yeah, I'm not 100% sure it's worth it either. Btw, apparently they did implement an 'unmanaged' constraint recently that allows you to get pointers to generic types: https://github.com/dotnet/roslyn/issues/27474 I don't know which version exactly has that. Any ideas where I can find that info?
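For what it's worth, with that constraint the generic-pointer pattern looks roughly like this (a sketch, not SharpBgfx code; the helper name is made up):

```csharp
using System;

static class UnmanagedConstraintExample
{
    // The 'where T : unmanaged' constraint permits sizeof(T) and T*
    // on a type parameter, which earlier language versions rejected.
    static unsafe long SumBytes<T>(T[] items) where T : unmanaged
    {
        long total = 0;
        fixed (T* p = items)             // fixed now works on a generic T[]
        {
            byte* bytes = (byte*)p;
            for (int i = 0; i < items.Length * sizeof(T); i++)
                total += bytes[i];
        }
        return total;
    }

    static void Main()
    {
        // bytes of int 1 sum to 1, bytes of int 2 sum to 2 (any endianness)
        Console.WriteLine(SumBytes(new[] { 1, 2 })); // prints 3
    }
}
```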

MikePopoloski commented 6 years ago

There has been a bunch of work lately in .NET Core / C# to enable better support for unmanaged memory; things like stackalloc of Span types and things like that. You should already be able to make use of that by calling the MemoryBlock constructor that takes a pointer and a length; that will make a single copy into bgfx memory and you won't need any managed allocations tracking it.
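To sketch that usage pattern: the snippet below stands in for the pointer-and-length path with a fake hypothetical sink function (a real caller would pass the pointer to SharpBgfx's MemoryBlock constructor instead):

```csharp
using System;

class StackallocSketch
{
    // Hypothetical stand-in for a MemoryBlock(pointer, length) style call;
    // here we just checksum the bytes to show the data arrived intact.
    static unsafe int ConsumeBlock(void* data, int size)
    {
        int sum = 0;
        byte* p = (byte*)data;
        for (int i = 0; i < size; i++) sum += p[i];
        return sum;
    }

    static unsafe void Main()
    {
        // stackalloc: no managed heap allocation, nothing for the GC to track.
        Span<byte> vertices = stackalloc byte[4] { 1, 2, 3, 4 };
        fixed (byte* p = vertices)
        {
            Console.WriteLine(ConsumeBlock(p, vertices.Length)); // prints 10
        }
    }
}
```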

Alan-FGR commented 6 years ago

@MikePopoloski yes, I did venture a bit into the wild lands of C# 7.3. I managed to make a class to replace my vertices accumulator (the old one used a T[] internally); here's how it looks so far:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Runtime.InteropServices;

public unsafe class NativeMemMan<T> : ICollection<T> where T : unmanaged 
{
    private readonly float overflowMult_;
    private readonly int elementSize_;
    private int dataSizeInElements_;
    private T* data_;

    public int Count { get; private set; } = 0;

    public bool IsReadOnly => false;
    public int DataSizeInBytes => dataSizeInElements_*elementSize_;
    public IntPtr Data => (IntPtr)data_;

    public NativeMemMan(int startBufferSize = 16, float overflowMult = 1.25f)
    {
        if((int)(startBufferSize*overflowMult) < startBufferSize+1)
            throw new ArgumentOutOfRangeException("Overflow multiplier doesn't increment");
        overflowMult_ = overflowMult;
        elementSize_ = Marshal.SizeOf<T>();
        dataSizeInElements_ = startBufferSize;
        data_ = (T*)Marshal.AllocHGlobal(DataSizeInBytes);
    }

    ~NativeMemMan()
    {
        Marshal.FreeHGlobal((IntPtr)data_);
    }

    public void Add(T item)
    {
        if (Count+1 >= dataSizeInElements_)
        {
            var newElementCount = (int)(dataSizeInElements_*overflowMult_);
            var newDataSize = elementSize_*newElementCount;
            var newData = (T*)Marshal.AllocHGlobal(newDataSize);
            Buffer.MemoryCopy(data_, newData, DataSizeInBytes, DataSizeInBytes);
            Marshal.FreeHGlobal((IntPtr)data_);
            dataSizeInElements_ = newElementCount;
            data_ = newData;
        }
        data_[Count] = item;
        Count++;
    }

    public void Clear()
    {
        Count = 0;
    }

    public T GetUnsafe(int index)
    {
        return data_[index];
    }

    public IEnumerator<T> GetEnumerator()
    {
        for (int i = 0; i < Count; i++)
        {
            yield return GetUnsafe(i);
        }
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }

    public bool Remove(T item)
    {
        throw new NotImplementedException();
    }

    public bool Contains(T item)
    {
        throw new NotImplementedException();
    }

    public void CopyTo(T[] array, int arrayIndex)
    {
        throw new NotImplementedException();
    }
}

It does work, but unfortunately the gains in frame time were noticeable while the gains in insertion speed were very marginal. To use it with the memory block I'm doing this:

MemoryBlock.MakeRef(lastLines.Data,lastLines.DataSizeInBytes,IntPtr.Zero,(userData)=>{ });

The P/Invoke call it uses is bgfx_make_ref_release, while it could be a simple bgfx_make_ref, but I doubt that is the culprit.

It's pretty cool that they implemented that feature, but there are some complaints nevertheless.

In any case, thank you @MikePopoloski for your support.

Alan-FGR commented 6 years ago

btw, if you guys are interested in some numbers, here are some results (first number is insertion):

Managed T[]
time: 00:00:00.0317812, frame: 0.03848301
time: 00:00:00.0317703, frame: 0.04052956
time: 00:00:00.0320193, frame: 0.04377967
time: 00:00:00.0322605, frame: 0.03911525
time: 00:00:00.0318247, frame: 0.03805126
time: 00:00:00.0314744, frame: 0.03787199

Unmanaged
time: 00:00:00.0314162, frame: 0.03132832
time: 00:00:00.0311709, frame: 0.03147236
time: 00:00:00.0320741, frame: 0.03126743
time: 00:00:00.0312701, frame: 0.03137964
time: 00:00:00.0308288, frame: 0.03155378
time: 00:00:00.0317833, frame: 0.03142719

Those are numbers from a debug drawer where I insert shapes to be drawn as lines. As you can see, the perf increase wasn't bad at all... the first tests were worse because it was a Debug build. I didn't average/plot them, but you can see frame time went from ~0.039 to ~0.031 s, which works out to roughly a 25% speedup, not too bad imho.

Alan-FGR commented 6 years ago

Not to mention it's much more stable too... the old way, frame times ranged from 0.038 to 0.043.