Batch Plugin - Struct access corruption in batch struct systems

EcsRx / ecsrx

A reactive take on the ECS pattern for .net game developers

MIT License

520 stars 36 forks source link

Batch Plugin - Struct access corruption in batch struct systems #17

Closed grofit closed 3 years ago

grofit commented 4 years ago

For some reason when dealing with struct based batched systems the batches seem to corrupt over time, its currently unknown why but @erinloy has made a reproduction here:

https://github.com/erinloy/EcsRxStructTest

It takes a while to happen and has been observed in .net core 2 and 3, it seems to happen anywhere from 30 seconds to 3 minutes in. It is easier to see if you make it trigger on every update (60 times a second), and then after a certain period the structs start to read garbage data rather than the correct data, as shown here:

Ref 7 7348
Ref 8 7348
Str 12 1084
Str 13 140714692425167  <-- These numbers should be around 7000-8000 
Ref 9 7348
Str 14 1015
Str 15 1084
Str 16 140714692432655
Str 17 140714692432655
Str 18 1084

After a certain point it just blows up with a Fatal error. Internal CLR error. (0x80131506) which seems to indicate that memory is being corrupted at some point.

Behind the scenes structs are stored as arrays of components (i.e 1 array for each component type) and the batches effectively pins the memory for the arrays and then creates lookups for each component needed in the batch.

public Batch<T1, T2>[] Build(IReadOnlyList<IEntity> entities)
{
    var componentArray1 = ComponentDatabase.GetComponents<T1>(_componentTypeId1);
    var componentArray2 = ComponentDatabase.GetComponents<T2>(_componentTypeId2);

    var batches = new Batch<T1, T2>[entities.Count];

    fixed (T1* component1Handle = componentArray1)
    fixed (T2* component2Handle = componentArray2)
    {
        for (var i = 0; i < entities.Count; i++)
        {
            if (entities.Count != batches.Length)
            { return new Batch<T1, T2>[0]; }

            var entity = entities[i];
            var component1Allocation = entity.ComponentAllocations[_componentTypeId1];
            var component2Allocation = entity.ComponentAllocations[_componentTypeId2];
            batches[i] = new Batch<T1, T2>(entity.Id, &component1Handle[component1Allocation], &component2Handle[component2Allocation]);
        }
    }

    return batches;
}

Once this has been done until the related entities/components change in any way it will keep re-using this batch so I am not sure if it is at some point moving the memory around after the lookup has been created so its now reading incorrect memory or if its something else.

I dont have much time to look into it at the moment but if anyone else has experience with unmanaged scenarios would be great to get some advice.

Doraku commented 4 years ago

Your intuition is correct, you have no guaranty that the GC will not move around the array of components in memory, making your previously saved pointers in Batch items invalid. Using a pointer outside of its fixed scope creation leads to undefined behaviors, you are just lucky that it seemed to work for a while.

You could fix this issue by manually pinning the array without relying on the fixed scope so you keep it pinned longer.

component1Handle  = GCHandle.Alloc(componentArray1, GCHandleType.Pinned);

// to get your pointer
component1P = (T1*)component1Handle .AddrOfPinnedObject().ToPointer();

// to release the handle
if (component1Handle.IsAllocated)
{
    component1Handle.Free();
}

Maybe wrap your Batch array in a custom type to also store the handles so you can release them correctly in a dispose pattern when you need to refresh the Batchs?

Note that your component arrays won't be able to be moved around by the GC because of this, leaving you with a fragmented memory, it can be a double edge sword. Hopefully you already refresh your Batchs when the component array is resized for more entities too.

grofit commented 4 years ago

hmm ok so I have made a basic implementation using this approach but whenever I try to do the GCHandle.Alloc it just blows up saying there is non blittable data:

Unhandled Exception: System.ArgumentException: Object contains non-primitive or non-blittable data.
   at System.Runtime.InteropServices.GCHandle.InternalAlloc(Object value, GCHandleType type)
   at System.Runtime.InteropServices.GCHandle.Alloc(Object value, GCHandleType type)
   at EcsRx.Plugins.Batching.Builders.BatchBuilder`2.Build(IReadOnlyList`1 entities) in E:\Code\open-source\ecsrx\ecsrx\src\Ecs
Rx.Plugins.Batching\Builders\BatchBuilder.cs:line 34
   at EcsRx.Examples.ExampleApps.Playground.StructBased.Struct4Application.SetupEntities() in E:\Code\open-source\ecsrx\ecsrx\s
rc\EcsRx.Examples\ExampleApps\Playground\StructBased\Struct4Application.cs:line 20
   at EcsRx.Examples.ExampleApps.Playground.BasicLoopApplication.ApplicationStarted() in E:\Code\open-source\ecsrx\ecsrx\src\Ec
sRx.Examples\ExampleApps\Playground\BasicLoopApplication.cs:line 45
   at EcsRx.Infrastructure.EcsRxApplication.StartApplication() in E:\Code\open-source\ecsrx\ecsrx\src\EcsRx.Infrastructure\EcsR
xApplication.cs:line 43
   at EcsRx.Examples.Program.Main(String[] args) in E:\Code\open-source\ecsrx\ecsrx\src\EcsRx.Examples\Program.cs:line 40

Process finished with exit code -532,462,766.

The object being used is from the example app:

[StructLayout(LayoutKind.Sequential)]
    public struct StructComponent : IComponent
    {
        public Vector3 Position { get; set; }
        public float Something { get; set; }
    }

I assume its the System.Numerics.Vector3 as float should be blittable, but if I look into the Vector3 source its just 3 floats too, so not sure if there is something else going awry.

Any advice @Doraku ? I have pushed up a branch which shows the latest changes for this in use-pinning-for-structs incase you want to check over what I have done.

Doraku commented 4 years ago

That's really strange, it seems to pin just fine the first type StructComponent but not the second one StructComponent2... If you remove the bool property it works O_o... I will try to look more into it later today.

Doraku commented 4 years ago

So apparently, you can't pin an array of struct containing either a bool or a char because they are not blittable, despite an array of bool or char directly being pinnable... Why would that be, I have no clue, especially when unmanaged is supposed to be just that, blittable (I get that they don't have the same representation in managed/unmanaged interop context but we don't care about that here). You can make char works by setting CharSet = CharSet.Unicode in the StructLayoutAttribute but the bool is more problematic. You could use a private byte to act as the backing field of a bool property but this limitation seems so stupid :/ a bool has much more usability for a game and this workaround is a pain to do.

A stupid idea that come to mind is just to say screw you to the memory management and use a byte[] as storage (sizeof(T) size when you want to make it grow) for your unmanaged components and cast the pointer as T to do your read/write operations. Since you are already pinning the array in the batch builder, might as well pin it in ComponentDataBase level and nothing stops you of pinning a byte array (and casting its content as something else entirely) but I really worry for what the memory will look like after a while...

A safer approach could be to, instead of storing the GCHandles in the pinned batch, store the arrays directly and instead of the pointers in the Batch item store the indexes of the components. Size wise it would be the same (everything is just an address), performance wise you would suffer the bound check of the arrays but at least the GC could do its job and you would have no unsafe code. I think it is worth a benchmark.

grofit commented 4 years ago

The worry here is that if we are unable to make it act like one big chunk of contiguous memory being accessed we wont get the benefits of pre-fetch and cpu cache, so if thats the case you will be suffering all the negative aspects of structs without gaining any of the performance benefits.

I struggled to actually profile how much it was hitting the cache and prefetching using the existing approach but given it was faster than the class based ones I assumed it was doing ok in terms of hits etc. So ideally I think we need to find a way to keep it as a large blob of sequential memory its just how we do it, if we end up having to just store indexes which at runtime have to be looked up we lose most of the benefit of even using structs :(

(As an idea further down the line was to potentially look at allowing people to use SIMD style interactions which would require all the data be available this way).

Doraku commented 4 years ago

Although you would have to jump in memory because of indices fetching, the accesses should be predictable to the cpu and you should see some gains. If you want to use pointers to the end I see no other choice than to use a byte[] as storage for your unmanaged types internally (just because of bool and char >_>). I managed to go almost to the end but some concession had to be made (IComponentPool<> lost its out modifier, because I changed Components to a Span, but the extension AddComponents(this IEntity entity, params IComponent[] components) does not work anymore because of this, added reference to System.Memory and unsafe compilation to EcsRx). Not sure how you feel about that, and I didn't want to change too much ^^" I can send you a PR if you want to take a look.

While all the examples do not run this is what I get on my machine for the batch ones (you should setup a project with BenchmarkDotNet for performance measurement btw):

Class4Application - Uses auto batching to allow the components to be clustered better in memory
Class4Application - Setting up 200000 entities in 1159ms
Class4Application - Simulating 100 updates - Processing 200000 entities in 603ms
Struct4Application - Uses auto batching to group components for cached lookups and quicker reads/writes
Struct4Application - Setting up 200000 entities in 1217ms
Struct4Application - Simulating 100 updates - Processing 200000 entities in 239ms

Struct4BApplication - Uses auto batching to group components mixed with multithreading
Struct4BApplication - Setting up 200000 entities in 1477ms
Struct4BApplication - Simulating 100 updates - Processing 200000 entities in 182ms

Full disclosure, I have my own ecs framework but I enjoy looking around at others implementation to see what good features I could add, so I hope you don't mind me lurking here ^^"

grofit commented 4 years ago

I did used to have a suite of benchmarkdotnet tests in at some point, I cannot remember why they were removed, I think it was because we needed more visibility of certain timings and the runner kept blowing up depending on core/net builds, I ended up moving to use dotTrace and dotMemory for profiling and the examples you posted above were made that way to quickly run via that.

I am kinda hoping that this gets legs https://github.com/dotnet/csharplang/issues/1147

As I remember way back trying to see if I could use ref structs but its missing the bit asked for in there, as that would solve all issues and also keep the surface API clean and not have to do much crazy stuff under the hood.

I think it may be worth wrapping up what you have done as a separate plugin if it can be expressed that way, then the default implementations can be left as is, but you could load the plugin to override the underlying component databases etc.

The current batching thing was an experimental plugin based off some conversations with a guy working on a 2d game at the time who wanted more throughput and looking how unity gets a lot of its performance wins (outside of burst/jobs). So I am a bit hesitant to change to much of the surface API or underlying architecture (unless it benefits everything in a good way) just to sort this issue, but it is a big issue as it does render the struct batching unusable :(

By all means lurk around, that's all I do these days, I have no where near as much time to dedicate to open source stuff as I used to.

Doraku commented 4 years ago

Oh I know this issue, I found it too when I was on a quest to make a components pre-fetcher for my own framework >_> never found an alternative with a sizeable gain and stopped working on this feature for now. Then I saw your issue with an interesting approach, you can see my changes #18. In my framework the classes handling components are internal so I can be a little exotic if I need to but since yours exposes all your api with the possibility to inject user custom implementations I get why you would not want to complicate things too much.

grofit commented 4 years ago

In most cases I still use class based components, and the batching for them works fine and is still very performant so in most cases if people are wanting to get the fastest possible speed out of this framework they would probably end up scrapping it or replacing chunks of the innards with their own bits (hence the plugin approach so you can replace anything with your own implementations).

The slowest part of the entire system is group resolving, that takes a hefty chunk of time when you have lots of entities in same collections with varying groups (can be mitigated to some extent) so although struct based batching is effectively broken, its only a relatively minor subset of people who would possibly use it, especially given unity ECS exists now, so most people would probably opt for that as its so much faster, just architecturally more of a pain :(

Doraku commented 4 years ago

I am probably a little crazy but where would be the fun in just using Unity's ECS :p?

grofit commented 3 years ago

Right after much deliberation I have decided to fix this issue for now by just pinning the data which means anyone who uses structs and batching needs to make sure all data in the components is blittable:

https://docs.microsoft.com/en-us/dotnet/framework/interop/blittable-and-non-blittable-types

While I really dislike having to do this, right now there is no other alternative and struct batching seems to be a decent performance win for some use cases (especially if it is read only and can be multithreaded too), so while I may revist this at a later date as there was discussion of ref struct refs which may allow you to do what we need without pinning, right now there seems to be no alternative.

Just to clarify if you are NOT USING THE BATCHING PLUGIN your structs can be whatever you want, the blittable requirement is only to allow them to be used in the batching plugin.

As a quick performance check on it all here is the readouts from the example apps (by no means a real benchmark):

Class Batching - No Multithreading

Class4Application - Uses auto batching to allow the components to be clustered better in memory
Class4Application - Setting up 200000 entities in 1378ms
Class4Application - Simulating 100 updates - Processing 200000 entities in 1316ms

Struct Batching - No Multithreading

Struct4Application - Uses auto batching to group components for cached lookups and quicker reads/writes
Struct4Application - Setting up 200000 entities in 1260ms
Struct4Application - Simulating 100 updates - Processing 200000 entities in 438ms

Struct Batching - Multithreading

Struct4BApplication - Uses auto batching to group components mixed with multithreading
Struct4BApplication - Setting up 200000 entities in 1271ms
Struct4BApplication - Simulating 100 updates - Processing 200000 entities in 96ms

So while im not entirely happy with this, it at least makes things usable and performance is still reasonable.

Will push up the change later.