Closed. grofit closed this issue 3 years ago.
Your intuition is correct: you have no guarantee that the GC will not move the array of components around in memory, invalidating the pointers you previously saved in the Batch items. Using a pointer outside of the fixed scope that created it leads to undefined behavior; you were just lucky that it seemed to work for a while.
You could fix this issue by manually pinning the array without relying on a fixed scope, so you can keep it pinned longer:
component1Handle = GCHandle.Alloc(componentArray1, GCHandleType.Pinned);
// to get your pointer
component1P = (T1*)component1Handle.AddrOfPinnedObject().ToPointer();
// to release the handle
if (component1Handle.IsAllocated)
{
    component1Handle.Free();
}
Maybe wrap your Batch array in a custom type that also stores the handles, so you can release them correctly via a dispose pattern when you need to refresh the Batches?
Note that your component arrays won't be able to be moved around by the GC because of this, leaving you with fragmented memory; it can be a double-edged sword. Hopefully you already refresh your Batches when the component array is resized for more entities too.
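The wrapper suggested above could look something like this. This is only a sketch; the type name PinnedComponentArray is illustrative and not part of EcsRx:

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper: keeps a component array pinned for the lifetime of a
// batch and releases the GCHandle when the batch is refreshed or disposed.
public sealed unsafe class PinnedComponentArray<T> : IDisposable where T : unmanaged
{
    private GCHandle _handle;

    public T[] Array { get; }
    public T* Pointer { get; private set; }

    public PinnedComponentArray(T[] array)
    {
        Array = array;
        _handle = GCHandle.Alloc(array, GCHandleType.Pinned);
        Pointer = (T*)_handle.AddrOfPinnedObject();
    }

    public void Dispose()
    {
        if (_handle.IsAllocated)
        {
            _handle.Free();
            Pointer = null;
        }
    }
}
```

The batch builder would dispose the old wrapper and create a fresh one whenever the batch is rebuilt, so the array is never accessed through a stale pointer.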
Hmm, ok, so I have made a basic implementation using this approach, but whenever I try to do the GCHandle.Alloc it just blows up saying there is non-blittable data:
Unhandled Exception: System.ArgumentException: Object contains non-primitive or non-blittable data.
   at System.Runtime.InteropServices.GCHandle.InternalAlloc(Object value, GCHandleType type)
   at System.Runtime.InteropServices.GCHandle.Alloc(Object value, GCHandleType type)
   at EcsRx.Plugins.Batching.Builders.BatchBuilder`2.Build(IReadOnlyList`1 entities) in E:\Code\open-source\ecsrx\ecsrx\src\EcsRx.Plugins.Batching\Builders\BatchBuilder.cs:line 34
   at EcsRx.Examples.ExampleApps.Playground.StructBased.Struct4Application.SetupEntities() in E:\Code\open-source\ecsrx\ecsrx\src\EcsRx.Examples\ExampleApps\Playground\StructBased\Struct4Application.cs:line 20
   at EcsRx.Examples.ExampleApps.Playground.BasicLoopApplication.ApplicationStarted() in E:\Code\open-source\ecsrx\ecsrx\src\EcsRx.Examples\ExampleApps\Playground\BasicLoopApplication.cs:line 45
   at EcsRx.Infrastructure.EcsRxApplication.StartApplication() in E:\Code\open-source\ecsrx\ecsrx\src\EcsRx.Infrastructure\EcsRxApplication.cs:line 43
   at EcsRx.Examples.Program.Main(String[] args) in E:\Code\open-source\ecsrx\ecsrx\src\EcsRx.Examples\Program.cs:line 40
Process finished with exit code -532462766.
The object being used is from the example app:
[StructLayout(LayoutKind.Sequential)]
public struct StructComponent : IComponent
{
    public Vector3 Position { get; set; }
    public float Something { get; set; }
}
I assume it's the System.Numerics.Vector3, as float should be blittable, but if I look into the Vector3 source it's just 3 floats too, so I'm not sure if there is something else going awry.
Any advice @Doraku? I have pushed up a branch, use-pinning-for-structs, which shows the latest changes for this in use, in case you want to check over what I have done.
That's really strange; it seems to pin the first type, StructComponent, just fine but not the second one, StructComponent2... If you remove the bool property it works O_o... I will try to look into it more later today.
So apparently you can't pin an array of structs containing either a bool or a char, because they are not blittable, despite an array of bool or char directly being pinnable... Why that would be, I have no clue, especially when unmanaged is supposed to be just that, blittable (I get that they don't have the same representation in a managed/unmanaged interop context, but we don't care about that here).
You can make char work by setting CharSet = CharSet.Unicode in the StructLayoutAttribute, but the bool is more problematic. You could use a private byte to act as the backing field of a bool property, but this limitation seems so stupid :/ a bool has much more usability for a game, and this workaround is a pain to do.
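The byte-backed bool workaround mentioned above could look like this (a sketch based on the StructComponent2 type discussed in the thread; the exact members are assumed):

```csharp
using System.Runtime.InteropServices;

// The struct stays blittable because only a byte is stored; the bool is
// exposed as a property over the backing field, so an array of these can
// still be pinned with GCHandleType.Pinned.
[StructLayout(LayoutKind.Sequential)]
public struct StructComponent2
{
    public float Something;

    private byte _flag; // blittable backing field for the bool below

    public bool Flag
    {
        get => _flag != 0;
        set => _flag = value ? (byte)1 : (byte)0;
    }
}
```

The property access compiles down to a simple byte comparison, so the cost is negligible; the pain is purely having to write this boilerplate for every bool member.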
A stupid idea that comes to mind is just to say screw you to the memory management and use a byte[] as storage (sizeof(T) * size when you want to make it grow) for your unmanaged components, and cast the pointer as T to do your read/write operations. Since you are already pinning the array in the batch builder, you might as well pin it at the ComponentDataBase level, and nothing stops you from pinning a byte array (and casting its content as something else entirely), but I really worry about what the memory will look like after a while...
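As a rough sketch of that byte[] idea (type and member names here are illustrative, not EcsRx's actual API):

```csharp
using System;
using System.Runtime.InteropServices;

// Raw storage sketch: one pinned byte[] holds the components, and reads/writes
// go through a typed pointer into it. Because only the byte[] itself is pinned,
// bool/char members inside T no longer trip the blittability check.
public sealed unsafe class RawComponentStore<T> where T : unmanaged
{
    private readonly byte[] _buffer;
    private GCHandle _handle;
    private readonly T* _items;

    public RawComponentStore(int capacity)
    {
        _buffer = new byte[capacity * sizeof(T)];
        _handle = GCHandle.Alloc(_buffer, GCHandleType.Pinned);
        _items = (T*)_handle.AddrOfPinnedObject();
    }

    // Reinterpret the pinned bytes as T for read/write access.
    public ref T this[int index] => ref _items[index];

    public void Release()
    {
        if (_handle.IsAllocated) _handle.Free();
    }
}
```

Growing would mean allocating a bigger byte[], copying, re-pinning, and invalidating every outstanding pointer, which is exactly where the long-term fragmentation worry comes from.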
A safer approach could be to store the arrays directly in the batch instead of the GCHandles, and to store the indexes of the components in the Batch items instead of the pointers. Size-wise it would be the same (everything is just an address); performance-wise you would suffer the bounds checks of the arrays, but at least the GC could do its job and you would have no unsafe code. I think it is worth a benchmark.
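A minimal sketch of that index-based alternative (names are illustrative, not EcsRx's actual batch types):

```csharp
// The batch stores the component arrays plus per-entity indexes instead of raw
// pointers, so the GC remains free to move the arrays and nothing is pinned.
public readonly struct IndexBatchItem
{
    public readonly int Component1Index;
    public readonly int Component2Index;

    public IndexBatchItem(int c1, int c2)
    {
        Component1Index = c1;
        Component2Index = c2;
    }
}

public sealed class IndexBatch<T1, T2>
    where T1 : struct
    where T2 : struct
{
    public T1[] Component1Array;
    public T2[] Component2Array;
    public IndexBatchItem[] Items;

    public void Process()
    {
        // Bounds checks apply on every access, but if the indexes are mostly
        // ascending the access pattern stays friendly to the CPU prefetcher.
        for (var i = 0; i < Items.Length; i++)
        {
            ref var c1 = ref Component1Array[Items[i].Component1Index];
            ref var c2 = ref Component2Array[Items[i].Component2Index];
            // ... system logic over c1/c2 here ...
        }
    }
}
```

The trade-off is exactly the one debated below: bounds checks and an indirection per component versus fully safe code and an unconstrained GC.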
The worry here is that if we are unable to make it act like one big chunk of contiguous memory being accessed, we won't get the benefits of prefetching and the CPU cache, so if that's the case you will be suffering all the negative aspects of structs without gaining any of the performance benefits.
I struggled to actually profile how much it was hitting the cache and prefetching with the existing approach, but given it was faster than the class-based ones I assumed it was doing ok in terms of hits etc. So ideally I think we need to find a way to keep it as one large blob of sequential memory; it's just a question of how we do it. If we end up having to store indexes which have to be looked up at runtime, we lose most of the benefit of even using structs :(
(As an idea further down the line was to potentially look at allowing people to use SIMD style interactions which would require all the data be available this way).
Although you would have to jump around in memory because of the index fetching, the accesses should be predictable to the CPU and you should still see some gains.
If you want to use pointers to the end, I see no other choice than to use a byte[] as storage for your unmanaged types internally (just because of bool and char >_>). I managed to go almost to the end, but some concessions had to be made: IComponentPool<> lost its out modifier because I changed Components to a Span; AddComponents(this IEntity entity, params IComponent[] components) does not work anymore because of this; and I added a reference to System.Memory and unsafe compilation to EcsRx. Not sure how you feel about that, and I didn't want to change too much ^^" I can send you a PR if you want to take a look.
While not all the examples run, this is what I get on my machine for the batch ones (you should set up a project with BenchmarkDotNet for performance measurement, btw):
Class4Application - Uses auto batching to allow the components to be clustered better in memory
Class4Application - Setting up 200000 entities in 1159ms
Class4Application - Simulating 100 updates - Processing 200000 entities in 603ms
Struct4Application - Uses auto batching to group components for cached lookups and quicker reads/writes
Struct4Application - Setting up 200000 entities in 1217ms
Struct4Application - Simulating 100 updates - Processing 200000 entities in 239ms
Struct4BApplication - Uses auto batching to group components mixed with multithreading
Struct4BApplication - Setting up 200000 entities in 1477ms
Struct4BApplication - Simulating 100 updates - Processing 200000 entities in 182ms
Full disclosure: I have my own ECS framework, but I enjoy looking around at others' implementations to see what good features I could add, so I hope you don't mind me lurking here ^^"
I did use to have a suite of BenchmarkDotNet tests in at some point; I cannot remember why they were removed. I think it was because we needed more visibility of certain timings and the runner kept blowing up depending on Core/.NET builds. I ended up moving to dotTrace and dotMemory for profiling, and the examples you posted above were made that way so they could be run quickly via those tools.
I am kinda hoping that this gets legs: https://github.com/dotnet/csharplang/issues/1147
I remember way back trying to see if I could use ref structs, but they are missing the bit asked for in there; that would solve all the issues and also keep the surface API clean, without having to do much crazy stuff under the hood.
I think it may be worth wrapping up what you have done as a separate plugin if it can be expressed that way; then the default implementations can be left as-is, and you could load the plugin to override the underlying component databases etc.
The current batching thing was an experimental plugin based off some conversations with a guy working on a 2D game at the time, who wanted more throughput, and off looking at how Unity gets a lot of its performance wins (outside of Burst/Jobs). So I am a bit hesitant to change too much of the surface API or underlying architecture (unless it benefits everything in a good way) just to sort this issue, but it is a big issue, as it does render the struct batching unusable :(
By all means lurk around; that's all I do these days, as I have nowhere near as much time to dedicate to open source stuff as I used to.
Oh, I know this issue; I found it too when I was on a quest to make a component pre-fetcher for my own framework >_> I never found an alternative with a sizeable gain and stopped working on that feature for now. Then I saw your issue with an interesting approach; you can see my changes in #18. In my framework the classes handling components are internal, so I can be a little exotic if I need to, but since yours exposes all your API with the possibility to inject custom user implementations, I get why you would not want to complicate things too much.
In most cases I still use class-based components, and the batching for them works fine and is still very performant. If people want to get the fastest possible speed out of this framework, they would probably end up scrapping it or replacing chunks of the innards with their own bits (hence the plugin approach, so you can replace anything with your own implementations).
The slowest part of the entire system is group resolving, which takes a hefty chunk of time when you have lots of entities in the same collections with varying groups (it can be mitigated to some extent). So although struct-based batching is effectively broken, it's only a relatively minor subset of people who would possibly use it, especially given Unity ECS exists now; most people would probably opt for that, as it's so much faster, just architecturally more of a pain :(
I am probably a little crazy but where would be the fun in just using Unity's ECS :p?
Right, after much deliberation I have decided to fix this issue for now by just pinning the data, which means anyone who uses structs and batching needs to make sure all data in the components is blittable:
https://docs.microsoft.com/en-us/dotnet/framework/interop/blittable-and-non-blittable-types
While I really dislike having to do this, struct batching seems to be a decent performance win for some use cases (especially if it is read-only and can be multithreaded too), so while I may revisit this at a later date (there was discussion of ref struct refs which may allow us to do what we need without pinning), right now there seems to be no alternative.
Just to clarify: if you are NOT USING THE BATCHING PLUGIN your structs can be whatever you want; the blittable requirement only applies when they are used with the batching plugin.
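Given that requirement, one possible startup guard (an assumption on my part, not part of EcsRx) is to try pinning a one-element array of the component type and fail fast with a clear message, instead of hitting the ArgumentException deep inside BatchBuilder:

```csharp
using System;
using System.Runtime.InteropServices;

// Probe whether an array of T can be pinned, i.e. whether T is blittable
// on runtimes where GCHandle.Alloc enforces blittability for pinning.
public static class BlittableCheck
{
    public static bool CanBePinned<T>() where T : struct
    {
        try
        {
            var handle = GCHandle.Alloc(new T[1], GCHandleType.Pinned);
            handle.Free();
            return true;
        }
        catch (ArgumentException)
        {
            // Contains non-blittable members (e.g. bool or char properties).
            return false;
        }
    }
}
```

The batch builder could call this once per component type and throw a descriptive exception naming the offending type, which is much friendlier than the raw stack trace earlier in this thread.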
As a quick performance check, here are the readouts from the example apps (by no means a real benchmark):
Class4Application - Uses auto batching to allow the components to be clustered better in memory
Class4Application - Setting up 200000 entities in 1378ms
Class4Application - Simulating 100 updates - Processing 200000 entities in 1316ms
Struct4Application - Uses auto batching to group components for cached lookups and quicker reads/writes
Struct4Application - Setting up 200000 entities in 1260ms
Struct4Application - Simulating 100 updates - Processing 200000 entities in 438ms
Struct4BApplication - Uses auto batching to group components mixed with multithreading
Struct4BApplication - Setting up 200000 entities in 1271ms
Struct4BApplication - Simulating 100 updates - Processing 200000 entities in 96ms
So while I'm not entirely happy with this, it at least makes things usable and performance is still reasonable.
Will push up the change later.
For some reason, when dealing with struct-based batched systems, the batches seem to corrupt over time. It is currently unknown why, but @erinloy has made a reproduction here:
https://github.com/erinloy/EcsRxStructTest
It takes a while to happen and has been observed in .NET Core 2 and 3; it seems to happen anywhere from 30 seconds to 3 minutes in. It is easier to see if you make it trigger on every update (60 times a second); after a certain period the structs start to read garbage data rather than the correct data.
After a certain point it just blows up with a
Fatal error. Internal CLR error. (0x80131506)
which seems to indicate that memory is being corrupted at some point. Behind the scenes, structs are stored as arrays of components (i.e. one array per component type), and the batches effectively pin the memory for the arrays and then create lookups for each component needed in the batch.
Once this has been done, the batch keeps being re-used until the related entities/components change in any way, so I am not sure if the memory is at some point being moved around after the lookup has been created (meaning it is now reading incorrect memory), or if it is something else.
I don't have much time to look into it at the moment, but if anyone else has experience with unmanaged scenarios it would be great to get some advice.