craftworkgames / MonoGame.Extended

Extensions to make MonoGame more awesome
http://www.monogameextended.net/
Other
1.44k stars 324 forks source link

Swtich TiledTile from class to struct to raise performance #132

Closed kreynes closed 8 years ago

kreynes commented 8 years ago

There was a discussion held in the MonoGame.Extended gitter regarding the topic of switching TiledTile from a class to a struct as it may, or may not increase the performance, as, generally, it fits the criteria of a struct. I've attached a 500x500 tiles map, each tile is 16x16 pixels. Without any culling, this map has brought my CPU usage up to 96%, and it could be a good subject to testing whether it'll actually be efficient enough to be noticeable, and worth the effort of converting to a struct.

MSDN / Choosing between a class and a struct if anyone is interested on the subject.

test_map.zip

The map is pretty plain and only contains 2 layers with some terrain and a small area of trees. If there would be a need to compile the same sized map, but with more layers and objects (closer to a real-deal map that would be used in a game) I'd gladly do so.

lithiumtoast commented 8 years ago

It's just worth mentioning that the "struct vs class" topic is slightly different when developing games instead of business applications. For games, it is OK (not great) to make structs mutable for performance reasons if the design merits it. Example: if Matrix was immutable and you want to change a few values in your Matrix, you would need to copy the whole Matrix (copying all the values), edit the values, then copy the modified Matrix back to where you got it from to have the values persist. This is pretty bad cause a Matrix is a 4x4 of floats, 64 bytes and is effectively being copied around twice to change at least one value (4 bytes). For this type of situation is favourably to pass a struct by reference instead of by value by using the ref keyword.

EDIT: Note that mutable structs are very dangerous to use when hashing. If the hash code of the struct changes while it is in a hash container such as a Dictionary<TKey, TValue> there will be problems when trying to retrieve the mutated struct; it will report back saying it is not in the Dictionary<TKey, TValue> anymore because the hash code is different. My preferred way to deal with this is to prevent hashing a mutable struct out right by throwing a Exception in GetHashCode override.

craftworkgames commented 8 years ago

@kreynes I just tried downloading your map and it doesn't contain the texture. Can you please provide the texture or include some instructions on what we need to replace it.

craftworkgames commented 8 years ago

@LithiumToast These are some excellent points. In this case I think we are okay because it's going to be immutable anyway. The main issue we could run into is that it may be boxed frequently.

kreynes commented 8 years ago

@craftworkgames My bad. Updated the link to the map and tileset.

craftworkgames commented 8 years ago

Okay guys. I finally had some time to sit down and really think about this. Let's start by looking at the Microsoft guidelines.

CONSIDER defining a struct instead of a class if instances of the type are small and commonly short-lived or are commonly embedded in other objects. AVOID defining a struct unless the type has all of the following characteristics:

  • It logically represents a single value, similar to primitive types (int, double, etc.).
  • It has an instance size under 16 bytes.
  • It is immutable.
  • It will not have to be boxed frequently. In all other cases, you should define your types as classes.

The class currently looks like this (with constructor and irrelevant code removed):

public class TiledTile
{
    public int Id { get; }
    public int X { get; }
    public int Y { get; }
}

It's clear this is definitely a case to become a struct. It seems to fit all the recommended criteria.

At first glance I thought boxing might be an issue, because currently TiledTileLayer.GetTile can return null. My initial reaction was to change it to return a nullable type (TiledTile?) which would cause boxing. Thinking about it a little more I think it would be better to define a TiledTile.Empty with an Id of zero. This should avoid the boxing altogether.

So I went ahead and refactored the code. Here's what I came up with:

public struct TiledTile : IEquatable<TiledTile>
{
    public TiledTile (int id, int x, int y)
    {
        Id = id;
        X = x;
        Y = y;
    }

    public static TiledTile Empty { get; } = new TiledTile(0, 0, 0);

    public int Id { get; }
    public int X { get; }
    public int Y { get; }

    public override bool Equals(object obj)
    {
        if (obj is TiledTile)
            return Equals((TiledTile) obj);

        return false;
    }

    public bool Equals(TiledTile other)
    {
        return other.Id == Id && other.X == X && other.Y == Y;
    }

    public override int GetHashCode()
    {
        unchecked
        {
            var hashCode = Id;
            hashCode = (hashCode * 397) ^ X;
            hashCode = (hashCode * 397) ^ Y;
            return hashCode;
        }
    }

    public static bool operator ==(TiledTile a, TiledTile b)
    {
        return Equals(a, b);
    }

    public static bool operator !=(TiledTile a, TiledTile b)
    {
        return !(a == b);
    }

    public override string ToString()
    {
        return $"{Id} ({X}, {Y})";
    }
}

Everything appears to work fine. I would appreciate a quick code review to make sure the implementation looks sound.

I haven't run a benchmark over it yet. Frankly, bench-marking this is a real pain and I just don't have the time to do it tonight without access to the large test map.

Given that it appears to make perfect sense as a struct my gut says we should just make the change and move on.

kreynes commented 8 years ago

@craftworkgames Looks splendid. About benchmarking. We could skip that, but I'm really curious to know if the change to a struct is actually beneficial. IMHO, the test (benchmark) cases should be as followed:

craftworkgames commented 8 years ago

I just did some basic bench-marking with some surprising results.

Struct

With Culling: 33.5 fps W/out Culling: 5.5 fps

Class

With Culling: 36 fps W/out Culling: 7 fps

Both tests are in source control. The class one in the develop branch and the struct one in the tiled-struct branch if you want to run them yourself.

I tried to keep everything as close as possible to identical between the tests. I let the frame-rate stabilize before recording and maximized the window. Moved the mouse outside the window bounds, etc.

I realize this isn't a perfect benchmark just using the existing demo but initial indicators show that using a class is actually slightly better performance.

More interestingly there's still room for improvement even with culling on. I'll run a profiler over it and see if I can identify hot areas.

craftworkgames commented 8 years ago

Oh, in the first round of tests I ran them in Debug mode. I thought it would be worth running them in Release mode also.

Release Mode

Struct

With Culling: 33.6 fps W/out Culling: 9.8 fps

Class

With Culling: 37 fps W/out Culling: 10 fps

craftworkgames commented 8 years ago

Okay, so I have some good news.

image

It turns out that drawing to a render target every frame is really slow. With some minor tweaks I managed to get the frame rate up a lot.

Now, there's still a few issues to solve before I can commit these changes.

  1. We can't get rid of the render target completely. If we do that it'll re-introduce an old bug that causes gaps between the tiles.
  2. The optimization I implemented tonight is a bit of a hack, I simply rendered the entire map to a render target once and drew that. It works, but means we have a really large texture loaded into the GPU. (Trading GPU memory for frame rate performance). This works on a PC but won't fly on a mobile device.
  3. Only rendering the visible part of the map isn't trivial either. I tried doing this by re-rendering to the render target every time the visible rectangle changes (each time the camera moves one whole tile) and unfortunately the rendering becomes very jumpy.

What we really need is a more balanced approach. Possibly rendering the map in several large chunks to keep the number of render target draws down and the total GPU memory at a reasonable level.

I'm done for tonight but I'll keep thinking about this and try get a proper solution in soon.

craftworkgames commented 8 years ago

Both tests are in source control. The class one in the develop branch and the struct one in the tiled-struct branch if you want to run them yourself.

I just realized I committed these changes to my local repo on my home computer but forgot to push them to github. Sorry about that. I'll continue to work on this tonight.

craftworkgames commented 8 years ago

I've been thinking.

The way the map is drawn is always going to be a trade off between memory and frame rate. I don't think there's ever going to be one solution that fits all situations.

I propose that we provide a way to override the DrawStrategy so that people can render the map different ways in different situations. This way we can always provide a default strategy that works pretty well in most situations but can be overridden when people have different requirements.

I suspect the way you draw small maps is going to be quite different for large maps. Also, the constraints on each platform is going to be different. On a PC with a decent graphics card it seems reasonable to simply render the entire map all at once. This gives a great frame rate at the cost of some GPU memory. On a phone, the map could be split up into smaller chunks, etc.

There's also animated tiles to consider. We haven't implemented them yet, but drawing a large static map is going to be very different from drawing one with animated tiles.

lithiumtoast commented 8 years ago

@craftworkgames

Possibly rendering the map in several large chunks to keep the number of render target draws down and the total GPU memory at a reasonable level.

Seamless maps where you enter the "chunk" size during the content pipeline phase and it just automatically splits up a map into "chunks"? As long as the "chunk" size is big enough, for 2D there should only be a maximum of 4 "chunks" be drawn at once (current "chunk" + neighbours). This would also be an effective culling technique since only the "chunk" size amount of tiles will ever be candidates for drawing. Further culling could be checking if the tile's screen position is with in the camera visible screen space.

We can't get rid of the render target completely. If we do that it'll re-introduce an old bug that causes gaps between the tiles.

I wonder if the render target technique can be avoided by locking the tile positions to the nearest integer when zooming in and out with the camera.

craftworkgames commented 8 years ago

I wonder if the render target technique can be avoided by locking the tile positions to the nearest integer when zooming in and out with the camera.

I tried this before implementing the render target approach. It's not as easy as it sounds. I don't remember the exact details but I spent a good few nights on it. I really do feel that render targets are the right approach here. Have a read of this.

That said, there may be other strategies we can use. Perhaps the PrimitiveBatch will help. We could render the tiles in a single polygon mesh to make sure the triangles are properly stitched together. The same way a 3D game renders a terrain.

I didn't get a lot of time to work on this tonight but I'm confident I can get the render target approach really fast. I'll implement a very basic draw strategy and then we can talk about implementing more sophisticated ones.

lithiumtoast commented 8 years ago

@craftworkgames

See http://stackoverflow.com/questions/2226812/xna-tearing-when-doing-2d-camera-transformations and http://gamedev.stackexchange.com/questions/25117/why-are-there-lines-in-between-my-tiles and http://gamedev.stackexchange.com/questions/25063/how-do-i-clear-up-artifacts-between-aligned-faces-when-using-aa-in-xna-4-0 One suggestion is to use sample state PointClamp or LinearClamp. I could probably look into it this weekend. I have MonoGame 3.5 installed and can't run the MG.Ex demos currently.

We could render the tiles in a single polygon mesh to make sure the triangles are properly stitched together.

PrimitiveBatch is not the best way to draw static meshes; it's meant more for dynamic geometry. Using a static VertexBuffer is probably the right way to draw tiles. If tiles are to change frequently, like say a character cutting grass in a field, perhaps an object in the object layer is more suitable.

craftworkgames commented 8 years ago

One suggestion is to use sample state PointClamp or LinearClamp. I could probably look into it this weekend.

Yep. I've already tried different clamping modes. It's not as simple as that. Besides, the sprite batch is passed into the map draw function, we don't control these parameters within the map drawing code itself (unless we are drawing to a render target).

I have MonoGame 3.5 installed and can't run the MG.Ex demos currently.

I'm hoping to upgrade everything to 3.5 in the very near future. Waiting to see if 3.5 is officially released in the next week or so.

PrimitiveBatch is not the best way to draw static meshes;

Maybe it's not the PrimitiveBatch then, but something like it. The point is that SpriteBatch might not be the right way to go here.

craftworkgames commented 8 years ago

This turned out to be a lot harder to implement than I expected so instead I've implemented the most basic thing we can do to improve frame rate performance at the cost of memory for large maps. You can see it in this commit.

This is gives us an immediate performance boost and is no worse than what we had before (There was actually a bug in the old code causing the render target to be the wrong size anyway).

I've raised a new issue so this one can be closed.

lithiumtoast commented 8 years ago

Wait, so we are sticking with using class for tiles? Be great if that performance test for comparing struct vs class could replicated on different architectures and for x86 and x64 before we ignore this completely.

EDIT: Like I could test on Mac, iOS, Windows virtual machine, Linux virtual machine, etc.

craftworkgames commented 8 years ago

Struct

With Culling: 33.6 fps W/out Culling: 9.8 fps

Class

With Culling: 37 fps W/out Culling: 10 fps

Yes, we are sticking with using a class for now. The performance tests I ran indicate that it's the better way to go. I agree this is a surprise, but we can't argue with the numbers.

Be great if that performance test for comparing struct vs class could replicated on different architectures and for x86 and x64

You're quite welcome to run more tests if you think it's worth it. I personally think our time and energy is better spent elsewhere.

before we ignore this completely.

We are not ignoring it completely. We've run some tests and made some improvements to the performance based on the evidence.

There's always more we can do of course, but I really don't think the class vs struct thing is where we are going to see the big improvements. Improving the algorithms on the other hand should be which is why I raised #136.

lithiumtoast commented 8 years ago

Alright; I'm just thinking that maybe using the ref keyboard where possible for the struct might change the outcome of the results. I'll take a look at this later.

craftworkgames commented 8 years ago

Good point.