Genbox / VelcroPhysics

High performance 2D collision detection system with realistic physics responses.
MIT License

Zero-Copy optimization #3

Open Genbox opened 7 years ago

Genbox commented 7 years ago

Currently Velcro Physics makes use of a zero-copy optimization when using the MonoGame framework. It is best illustrated with an example:

When you want to use Velcro in your MonoGame game, you have an update loop like this:

public override void Update()
{
    // Vector2 from MonoGame
    Vector2 position = new Vector2(5, 10);

    Vec2 physicsPosition = new Vec2(position.X, position.Y);
    _world.Add(new Body(physicsPosition));
}

As you can see, we have to copy Vector2 into Vec2 to use it in the engine, and in the process, we just wasted a bit of CPU and RAM.

The optimization works as follows: Velcro uses a copy of MonoGame's Vector2 type internally (the source code is copied in, so there is no dependency). I also build a MonoGame version of the library that does take a dependency on MonoGame and excludes our local Vector2 type from the project with a compiler constant. This way, when people are using MonoGame, we can zero-copy like this:

public override void Update()
{
    // Vector2 from MonoGame
    Vector2 position = new Vector2(5, 10);

    _world.Add(new Body(position));
}
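
The compiler-constant switch described above could look roughly like the sketch below. It is only an illustration under stated assumptions: the constant name `MONOGAME`, the namespace `VelcroSketch`, the stand-in type `LocalVector2`, and this simplified `Body` are all hypothetical, and a `using` alias is just one possible way to swap the type.

```csharp
// Hypothetical sketch: MONOGAME is an assumed compiler constant, not a confirmed Velcro symbol.
#if MONOGAME
using Vector2 = Microsoft.Xna.Framework.Vector2;   // MonoGame build: reuse the framework type
#else
using Vector2 = VelcroSketch.LocalVector2;         // standalone build: use the bundled copy
#endif

namespace VelcroSketch
{
#if !MONOGAME
    // Stand-in for the vector type copied into the engine; compiled out in the MonoGame build.
    public struct LocalVector2
    {
        public float X;
        public float Y;

        public LocalVector2(float x, float y)
        {
            X = x;
            Y = y;
        }
    }
#endif

    // Simplified body that stores whichever Vector2 the build selected.
    public sealed class Body
    {
        public Body(Vector2 position)
        {
            Position = position;
        }

        public Vector2 Position { get; }
    }
}
```

With the constant defined, game code can hand its own MonoGame vectors straight to `Body`; without it, callers construct the bundled type instead.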

This used to be a major optimization in Farseer Physics Engine, since it used pixels as its unit. However, we now use the Meter-Kilogram-Second (MKS) system instead, which means values have to be converted, and therefore copied, between pixels and meters anyway. This issue is meant to start a discussion on the relevance of this optimization and to collect proposals for a better system now that we use MKS.
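
To make the copying concrete, here is a minimal sketch of a pixel/meter conversion. It does not reproduce Velcro's ConvertUnits class: the class name `UnitConversionSketch`, the method names, and the ratio of 64 pixels per meter are assumptions for illustration, and System.Numerics.Vector2 is used only to keep the example self-contained.

```csharp
using System.Numerics;

// Hypothetical helper, not Velcro's ConvertUnits API.
public static class UnitConversionSketch
{
    // Assumed display scale: 64 pixels correspond to 1 meter.
    public const float PixelsPerMeter = 64f;

    // Pixels -> meters. A new vector is produced, so the value is copied regardless of type sharing.
    public static Vector2 ToSimUnits(Vector2 pixels)
    {
        return pixels / PixelsPerMeter;
    }

    // Meters -> pixels, used when drawing the simulation state.
    public static Vector2 ToDisplayUnits(Vector2 meters)
    {
        return meters * PixelsPerMeter;
    }
}
```

Every position crossing the boundary goes through one of these calls, which is why sharing the vector type no longer saves a copy.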

Genbox commented 7 years ago

Proposals so far:

  1. Use System.Numerics namespace from .NET since it has Vector2 and Matrix (4x4) classes needed by Velcro.

    • Good 1: No external dependencies.
    • Good 2: Don't have to maintain a copy of Vector2 myself
    • Bad 1: You have to juggle with 2 types, perhaps more if you use a drawing framework that also uses its own Vector2 class.
  2. No dependency at all. Use simple types such as float and double

    • Good 1: Simple struct free APIs such as new Body(x, y)
    • Good 2: I can utilize a trick to define float as double, thereby easily improving the precision of the engine when needed: using Single = System.Double;
    • Bad 1: The code is less portable due to the use of non-standard Single instead of float
    • Bad 2: It might have an impact on performance as Vector2 is a struct and moved as a 64bit value instead of 2x 32bit values. I have yet to quantify this on Roslyn.
  3. Soft dependency on MonoGame (this is the current solution)

    • Good 1: One less type to juggle with when you use MonoGame (Velcro Vector2 == MonoGame Vector2). We don't take a hard dependency on MonoGame, so Velcro can still be used with other frameworks.
    • Bad 1: This used to provide an optimization, but it no longer does since we have to convert from pixels to MKS and then from MKS to pixels again.
  4. Use an interface for Vector2 internally in the engine such that we don't depend on any particular implementation.

    • Good 1: You as a user provide the implementation, as such, you provide one that supports SIMD or zero-copy if you like (unlikely though)
    • Good 2: Velcro can provide a Vector2 implementation for the most popular frameworks. Internally the engine would use IVector2, but all factories and ConvertUnits could have a switchable lightweight API that takes in the specific vector implementation of your favorite framework.
    • Bad 1: Interfaces have a tendency to put off new programmers, who often don't understand the concept well.
    • Bad 2: Lots of wrapper classes.
  5. Template engine to replace the implementation at compile time.

    • Good 1: No wrapper classes like in solution 4
    • Good 2: It is much like the current solution, but made generic to support more frameworks with a template engine.
    • Bad 1: Makes the project a lot more complex as the templates would have to be updated instead of the actual generated code.
    • Bad 2: The cost seems too high for the benefit on this one.
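
The alias trick from proposal 2 can be sketched as follows. The alias name `Real` (used here instead of `Single` to avoid confusion with System.Single), the constant `DOUBLE_PRECISION`, and the class `IntegratorSketch` are hypothetical names chosen for this example.

```csharp
// Hypothetical sketch: DOUBLE_PRECISION and Real are assumed names, not Velcro symbols.
#if DOUBLE_PRECISION
using Real = System.Double;   // high-precision build
#else
using Real = System.Single;   // default build
#endif

public static class IntegratorSketch
{
    // Engine code is written against the alias, so the same source compiles at either precision.
    public static Real Advance(Real position, Real velocity, Real dt)
    {
        return position + velocity * dt;
    }
}
```

The alias only applies within the file that declares it, so each source file in the engine would need the same header.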

Other things to consider:

  1. It might be possible to do the unit conversion on the GPU nowadays. I provide the ConvertUnits class to simplify MKS to Pixel conversion, but in reality, some frameworks might have a good Camera class that does the conversion using shaders.
  2. With solution 4, we limit ourselves to type-less serialization since you can't serialize interfaces. This is not necessarily a bad thing nowadays, but a thing to consider.

Genbox commented 7 years ago

The trick with float = double can also be done in solutions where Velcro provides a Vector2 class.

ilexp commented 7 years ago

So far, I think the existing solution (3) is probably the best one - but if Velcro defined its own vector and math types or used System.Numerics it probably wouldn't be that bad either. Copying a bit of data does have a performance impact, but in my experience the hot paths are not in the interfacing between physics and game, but in the simulation or the game itself. Once bodies are set up, there isn't really much to talk about between physics and game unless something happens - it's no longer per-frame business.

You have to juggle with 2 types, perhaps more if you use a drawing framework that also uses its own Vector2 class.

Chances are, the games or game engines that use Velcro have their own vector implementation anyway, or use one that doesn't happen to be MonoGame's. That's not necessarily a bad thing - Velcro can use whatever vector math fits best for its purposes and the higher-level game or game engine can do the same, as they may have different requirements and API surfaces to satisfy.

Use an interface for Vector2 internally in the engine such that we don't depend on any particular implementation.

This has potential to be a major performance sink, as that would mean virtual method calls for any IVector2 API invocation.
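
As a rough illustration of that concern, the sketch below shows the same computation written against an interface-typed parameter and against a constrained generic parameter. `IVector2` is the name used in this thread, but its members here, along with `SimpleVector2` and `LengthHelpers`, are assumptions made up for the example. How much each form costs depends on the runtime and JIT, so it would need to be measured.

```csharp
// Hypothetical shape for the IVector2 interface discussed in the thread.
public interface IVector2
{
    float X { get; }
    float Y { get; }
}

// Example value-type implementation.
public readonly struct SimpleVector2 : IVector2
{
    public SimpleVector2(float x, float y)
    {
        X = x;
        Y = y;
    }

    public float X { get; }
    public float Y { get; }
}

public static class LengthHelpers
{
    // Interface-typed parameter: a struct argument is boxed, and X/Y go through interface dispatch.
    public static float LengthSquaredViaInterface(IVector2 v)
    {
        return v.X * v.X + v.Y * v.Y;
    }

    // Constrained generic: the struct is passed by value without boxing,
    // and the runtime can generate code specific to the concrete value type.
    public static float LengthSquaredGeneric<T>(T v) where T : struct, IVector2
    {
        return v.X * v.X + v.Y * v.Y;
    }
}
```

Both methods return the same result; the difference is only in how the call reaches the implementation.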

craftworkgames commented 7 years ago

Hey guys.

I'm the author of MonoGame.Extended and I'd just like to say I'm super excited to see this project being revived. I've been a big fan of Farseer for years. We don't currently have a physics package in our library so when people ask I always point them at Farseer (or now Velcro).

In my experience, almost all of the C# game engines I've run across in recent years sit on top of MonoGame in one way or another. So I agree that the existing solution (3) (Soft dependency on MonoGame) is the right way to go here. Even with performance issues aside, I think this provides the best user experience because copying Vector2s around can be a real pain in the backside.

That said, I can understand the hesitation to go this way. I've been there before. There are a handful of non-MonoGame based game engines around (Duality if I recall correctly) and it's certainly nice to have a dependency free version as well.

It's unfortunate that MonoGame is such a monolithic dependency when you only really need a handful of low level types. When I first started Extended they didn't have a PCL version and I had a hard time convincing them that carrying the full weight of each platform dependency was really difficult for library developers. Originally I pushed to split the low level types out of MonoGame into a separate "core" package and I think I used Farseer as a good example of how this would be useful. That never really happened though, but eventually we got the official PCL version and everything got a whole lot easier.

At some point I'd really like to integrate MonoGame.Extended with Velcro somehow. I'm not really sure what that means yet. It could be a handful of helper methods to make it easier to convert between MonoGame and Velcro types or it could be something more like an entire fork of Velcro as one of Extended's packages. Who knows.

Anyway, I'm looking forward to seeing where this project goes and collaborating with you guys where it makes sense.

Genbox commented 7 years ago

@ilexp

This has potential to be a major performance sink, as that would mean virtual method calls for any IVector2 API invocation.

In the good old .NET Framework days when I did performance analysis on the CLR, I found that 'callvirt' instructions were ~30x slower than their 'call' counterpart (not a micro-benchmark, but real-world code). It also disabled some compiler optimizations, since you could not inline across callvirt boundaries when the actual implementation was not known at compile time. It is for this reason I put solution 5 on the list.

However, now we are using Roslyn, and I've read into how it internally handles things, and it seems like their cache mechanism for looking up the interface implementation is a tad better than the .NET 2.0 CLR. I'd say the overhead is probably down to 1/3 of what it used to be, so I might consider using it.

ilexp commented 7 years ago

However, now we are using Roslyn, and I've read into how it internally handles things, and it seems like their cache mechanism for looking up the interface implementation is a tad better than the .NET 2.0 CLR. I'd say the overhead is probably down to 1/3 of what it used to be, so I might consider using it.

That probably depends on the runtime, and especially for a no-dependency physics library like Velcro, which might be used on consoles, mobile and who-knows-where, I wouldn't count on the runtime to be at least X efficient with Y when it was known not to be in the past.

As far as the caching mechanism for interfaces goes, do you have a source? Even though I wouldn't count on it, I would be interested in that as well :)

Genbox commented 7 years ago

@craftworkgames XNA was the same way. I think the whole .NET Core movement is the right way to go though, so I'm pushing for a .NET Core based MonoGame. .NET Standard is an improvement over PCL in the sense that tooling now fully supports targeting in the foundation. Currently, Velcro is a .NET Standard 1.4 library, but the MonoGame version (VelcroPhysics.MonoGame) only has net40 as the target, which kind of limits things.

Genbox commented 7 years ago

@ilexp You are completely right. I used to target Xbox and Windows Phone and factored optimizations on those platforms into it as well. The fact is that we are talking micro-optimizations here, and in total they yielded 20% better performance across the engine on average. However, since the compiler changed to Roslyn, I'm not even sure if it is any faster now, as everything was built on CLR 2.0 assumptions.

I'm sure that pooling, caching and a better compiler will yield better performance than the micro-optimizations altogether - more so on mobile platforms. Having been out of touch for quite some years, I actually have no idea how C# is run on mobile platforms nowadays.

As far as the caching mechanism for interfaces goes, do you have a source?

Not really. It is mostly based on research projects I did over the past few years. The Roslyn code emitter is located here and the lookup cache implementation is somewhere in the CoreCLR project. There are several good discussions on the subject on both projects' issue trackers. I love the fact MS went open source with .NET/C# a couple of years back :)

craftworkgames commented 7 years ago

I think the whole .NET Core movement is the right way to go though, so I'm pushing for a .NET Core based MonoGame. .NET Standard is an improvement over PCL in the sense that tooling now fully supports targeting in the foundation.

Oh wow. Last I checked there seemed to be no chance MonoGame would ever be on .NET Standard, but it looks like there's some good progress on that issue. I'd certainly be in favor of going that way if it pans out.

On the other hand, the PCL bait and switch has been working well for us so far and I think it could work quite well for you too if the .NET Standard thing doesn't work out. Either way I'm sure there's a viable solution.

Genbox commented 7 years ago

.NET Standard is just a formalization of PCL, so now we at least have a standard process for expressing framework interfaces across libraries and platforms. At least, that is the idea. As always, Microsoft will find a way to completely destroy their own efforts some way or another.

For now I'm going with .NET Standard, with .NET Framework 4.5 and .NET Core as the main platforms. It is just simple C# and math after all, so it is very portable to other platforms, but I don't want to bother with them other than to make sure we don't use any APIs that do not work on those platforms (as used to be the case with Stopwatch).

craftworkgames commented 7 years ago

That seems like a reasonable choice.

At some point I've got it on my to-do list to get Velcro working with MonoGame. If I get to it before you do I'll be sure to let you know how it goes. I have no doubt there'll be one or more solutions, the only question is how complicated it will be.

Oh, btw.. I just remembered I actually did get Farseer working as a PCL with MonoGame once before. It wasn't too difficult if I recall correctly. But anyway, there's that.

Genbox commented 7 years ago

Do you mean with .NET Standard as the target? Velcro is already targeting MonoGame, just only on .NET Framework 4.5 for now. I have not tried the PCL, but it should be straightforward.

craftworkgames commented 7 years ago

Oh right. It already works. Sorry, I should have paid more attention. I was under the impression that it couldn't be referenced from all platforms yet but I just tested it with Android and a PCL project. They both work as expected. I didn't realize .NET Standard was PCL compatible. 👍

Genbox commented 7 years ago

.NET Standard and .NET Core are very confusing technologies for most people. The thousands of questions on Stack Overflow make that evident. When .NET Core 1.0 was first released, it had really bad tooling, which caused further confusion since you could not reference a .NET Framework project and a .NET Core project, even if the .NET Core one was .NET Standard compliant and net45 was a target.

However, it is much better now and it is still being improved. It is only a matter of time before .NET Framework is phased out and we have truly cross-platform .NET.

Genbox commented 7 years ago

I have some numbers from solution 1.

Using Velcro's own Vector2 classes provided from MonoGame: old

Using System.Numerics.Vector2 instead: new

Note that System.Numerics uses hardware acceleration if it is present. As you can see, it speeds up the engine in some places, but in others it destroys performance. There is no 'ref' or 'out' API on Vector2 from System.Numerics, so there is a lot of memory copying.
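
The difference in API style can be sketched like this. `RefVector2` is a hypothetical stand-in that mimics the XNA/MonoGame-style `Add(ref, ref, out)` overload so the example stays self-contained; the by-value path uses the real System.Numerics.Vector2 operator. This is not a benchmark and makes no claim about which is faster in a given case.

```csharp
using System.Numerics;

// Hypothetical stand-in for a vector type with an XNA/MonoGame-style ref/out API.
public struct RefVector2
{
    public float X;
    public float Y;

    public RefVector2(float x, float y)
    {
        X = x;
        Y = y;
    }

    // Operands are read through references and the result is written into the caller's storage.
    public static void Add(ref RefVector2 a, ref RefVector2 b, out RefVector2 result)
    {
        result.X = a.X + b.X;
        result.Y = a.Y + b.Y;
    }
}

public static class AddComparison
{
    // ref/out style: no operand copies are made for the Add call itself.
    public static RefVector2 AddByRef(RefVector2 a, RefVector2 b)
    {
        RefVector2.Add(ref a, ref b, out RefVector2 sum);
        return sum;
    }

    // System.Numerics style: operands and result are passed by value.
    public static Vector2 AddByValue(Vector2 a, Vector2 b)
    {
        return a + b;
    }
}
```

Engine code written around the ref/out form has to be rewritten to the by-value form when switching types, which is where the extra copies come from.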

Genbox commented 7 years ago

To confirm that we are talking about memory pressure and cache coherence, I did some micro-benchmarks on both MonoGame Vector2 and System.Numerics Vector2. In theory, the System.Numerics one would be faster due to hardware acceleration (SIMD), but not by much, since we are using Vector2 and not Matrix operations.

There are 3 cases:

I also did a special case for operators to see how they compare.

image

Genbox commented 7 years ago

A note regarding virtual calls: RyuJIT x64 supports devirtualization, but 32bit code still used JIT32, so many of the optimizations were left out. With .NET Core 2.0, RyuJIT will replace JIT32 as the default compiler, making optimizations available to both 32bit and 64bit.

I have yet to run performance tests for virtual calls on RyuJIT, with and without known interface implementations, just to see what kind of performance hit converting to IVector2 would be.

Genbox commented 7 years ago

A really big drawback with solution 3 is that pretty much everything has to be located inside a few projects (optimally 1). Otherwise there would be:

It is just easier to target different frameworks and make the zero-copy 'hack' with one large library.

roy-t commented 7 years ago

I know this issue is a bit older, but I just stumbled onto Velcro Physics today after searching for a NuGet package for good old trusty Farseer.

I'd prefer using System.Numerics or internal Vector/Matrix structs. Personally I already use different world coordinate systems for physics, drawing, and sometimes even the logical world. So there is no benefit for me in tying MonoGame to Velcro (even though I usually use MonoGame!).

I also do not see how an interface like IVector2 would work. There is no way MonoGame or any other engine could implement that without tying itself to Velcro so you will need to create your own type then anyway :).

Great to see that you're still so active in the 2D physics world. Can't wait to see an official release and NuGet package for Velcro!

Genbox commented 7 years ago

Regarding the IVector2 implementation, building several compatibility layers is the idea. The goal was not to escape the fact that you have to convert between different Vector2 implementations, but to build the engine against an abstraction (a Vector2 interface) and then have an implementation for MonoGame, System.Numerics etc. The idea might not be viable at all, but it is great that we have the discussion about it.
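
One such compatibility layer might look like the sketch below, which wraps System.Numerics.Vector2 behind the interface. The members of `IVector2`, the adapter `NumericsVector2Adapter`, and `EngineSketch.Dot` are all assumptions for illustration; nothing like this is confirmed to exist in Velcro.

```csharp
using System.Numerics;

// Hypothetical shape for the engine-facing abstraction.
public interface IVector2
{
    float X { get; }
    float Y { get; }
}

// Hypothetical adapter that lives in a compatibility package, not in System.Numerics itself.
public readonly struct NumericsVector2Adapter : IVector2
{
    private readonly Vector2 _value;

    public NumericsVector2Adapter(Vector2 value)
    {
        _value = value;
    }

    public float X => _value.X;
    public float Y => _value.Y;

    // Gives callers the wrapped framework value back without conversion.
    public Vector2 Value => _value;
}

public static class EngineSketch
{
    // Engine code only sees the abstraction, never the concrete framework type.
    public static float Dot(IVector2 a, IVector2 b)
    {
        return a.X * b.X + a.Y * b.Y;
    }
}
```

A similar adapter would be needed per framework, which matches the "lots of wrapper classes" drawback listed for solution 4.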

roy-t commented 7 years ago

Ah, of course. Sometimes I wish C# had an "extension interface" feature that, similar to extension methods, let you add an interface implementation to an existing class.