TurningWheel / Barony

Barony Open Source Release
http://www.baronygame.com/
Other
489 stars 128 forks source link

Steamworks causing choppy performance #395

Open solenum opened 6 years ago

solenum commented 6 years ago

I've been discussing an issue about performance with @WALLOFJUSTICE on the steam discussion boards, and I think I've finally made a breakthrough as to the cause (Credit to Aozaki Aoko on steam for suggesting it).

Playing the game via direct connect results in perfectly normal smooth gameplay, zero issues even with the host client having artificially stunted performance and latency.

Playing the game via steamworks results in choppy and varying performance, when I was playing with a friend who was on Windows, and I was on Linux the issue was slight, not bad enough to be game-breaking but bad enough to be annoying.

However, when I play with my friend on a less-powerful Macbook Pro via Steamworks (who also lives further away, thus more latency), the issues go far beyond just annoying and render the game unplayable. These issues are not present when we play via direct connect.

One last thing, my friend that was playing on Windows said he didn't suffer the same performance issues even when we where playing via Steamworks, however he does get the performance issues on Steamworks when playing on Linux. So it's looking like it might be specific to Unix platforms.

So, it seems to be specific to steamworks, and gets worse when there are clients connected that are either further away or (less likely) on worse hardware. Perhaps some blocking networking/listening for packets?

Edit; Because I always forget to note this, the framerate that steam displays appears to be very high in both situations, always sitting above 100fps regardless of how choppy it feels.

WALLOFJUSTICE commented 6 years ago

Before I look into any of the netcode, in game.cpp, line 2499 or so theres a SteamAPI_RunCallbacks(); which runs every game tick. Could you report back any changes if you limited this callback function? I know when I was playing around with some of the Steam API I would get a lot of junk responses about requesting too much from an API endpoint like leaderboards or achievements.

if ( ticks % 10 == 0 ) { SteamAPI_RunCallbacks(); } // runs every 10 ticks or 0.2 secs}

Will look into what keepalive packets we have floating around.

solenum commented 6 years ago

Is compiling with steamworks something I can do? If so where do I go about getting the libraries for that? (and are there any risks with replacing the binary of the steam-installed Barony as that's how I'll need to test the game if I'm using steamworks).

Edit; Access to the Steam SDK (and thus the steamworks api libraries) seems to be limited to users with developer accounts?

WALLOFJUSTICE commented 6 years ago

https://partner.steamgames.com/doc/sdk

Should be able to grab v142 SDK from there without special access?

Also no risk replacing binaries, can be run from anywhere as well.

solenum commented 6 years ago

I'm getting a link error, perhaps this version of the SDK is to old?

[ 16%] Linking CXX executable barony
CMakeFiles/barony.dir/src/game.cpp.o: In function `main':
game.cpp:(.text.startup+0x3cc): undefined reference to `g_SteamLeaderboards'
game.cpp:(.text.startup+0x3d6): undefined reference to `CSteamLeaderboards::ProcessLeaderboardUpload()'
game.cpp:(.text.startup+0x3db): undefined reference to `SteamAPI_RunCallbacks'
collect2: error: ld returned 1 exit status
CMakeFiles/barony.dir/build.make:3149: recipe for target 'barony' failed
make[2]: *** [barony] Error 1
CMakeFiles/Makefile2:104: recipe for target 'CMakeFiles/barony.dir/all' failed
make[1]: *** [CMakeFiles/barony.dir/all] Error 2
Makefile:129: recipe for target 'all' failed
make: *** [all] Error 2
WALLOFJUSTICE commented 6 years ago

Have you re-run cmake to configure for steamworks? Should display BUILDING WITH STEAMWORKS or something.

export STEAMWORKS_ROOT="../dependencies"
export STEAMWORKS_ENABLED=1

cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=./ -G "Unix Makefiles"
solenum commented 6 years ago

Alright, got it compiling.

So I tried the changes you suggested, and its really hard to notice much of a difference, it sort of feels like there's a hiccup in performance every second or so instead of being virtually constant like before. I tried the modified version and then the steam version immediately after and I do feel like the issue on the steam version was slightly worse.

Due to the actual framerates not really changing much it's certainly hard to tell, the difference between the two versions could just be placebo.

Edit; Hold on, something strange is going on. It seems the binary is failing to launch and instead steam is running the steam version. I'll replace the steam binary entirely and try again.

Double edit; Yeah it's not better with the changes :/

WALLOFJUSTICE commented 6 years ago

Cool, I guess next steps is to play with the packet handling in void clientHandleMessages() from net.cpp line 3390 and see what's being parsed...

while (packet = net_handler->getGamePacket())
        {
            memcpy(net_packet->data, packet->data(), packet->len());
            net_packet->len = packet->len();

            clientHandlePacket(); //Uses net_packet.

            delete packet;
        }

There's a loop here which might be causing a lot of wait time depending on what's going through.

WALLOFJUSTICE commented 6 years ago

Also later on in game.cpp there this

// frame rate limiter
            while ( frameRateLimit(fpsLimit) )
            {
                if ( !intro )
                {
                    // handle network messages
                    if ( multiplayer == CLIENT )
                    {
                        clientHandleMessages();
                    }
                    else if ( multiplayer == SERVER )
                    {
                        serverHandleMessages();
                    }
                }
            }

If there's a lot of packets to process during the frame rate limit it could cause the stutter while the frame rate is evaluating and maybe skip some intervals

solenum commented 6 years ago

So, I changed the while loop to be a simple if statement, that way it only handles a single packet per tick/iteration/whatever. The issue was still present though, so I don't suspect its down to that chunk of code.

There also appears to be nothing specific to steamworks inside of the clientHandlePacket() function so I would imagine that's not the source of the issue either. Can't rule it out entirely though as I'm unfamiliar with the codebase.

WALLOFJUSTICE commented 6 years ago

Correct the clientHandlePacket() just processes the data if it came from steam or direct IP. Anywhere in the code if you see !directconnect that means it's a Steam hosted game rather than direct IP

solenum commented 6 years ago

Alright, I removed the internal framelimit entirely and the game ran at about 300fps+, the issue was still present and if anything, worse. So that rules that out lol.

I just ran singleplayer without the framelimit as well, and it actually ran at about 600fps+, more than double what I was getting playing with one other person.

WALLOFJUSTICE commented 6 years ago

Fun times debugging :) I can only think of the actual multithreading net packet code then which is exclusive to steam lobbies as somewhere else to look.. that uses SDL multithreading as this was before the codebase was ported to C++ so no std::thread. No idea what impact that could have if any.

I'll look into it more this week.

And just for posterity can you list specs of all known machines that are encountering this performance decrease?

Also how are you verifying the above is this just singleplayer or did you have a buddy?

solenum commented 6 years ago

I have a few friends I keep bugging to join my game to test with me, so all of the cases where the issue is present I am playing with another friend online, who are located in Finland and Romania (performance seems worse with the one from Romania, who is geographically further away from me than the one in Finland, so higher latency..).

As for specs, these are mine: i7-7700k GTX 1070 8GB 16GB DDR4

The guy from Finland: i5-(4th gen?) GTX 770 8GB DDR3

The guy from Romania: (weakest rig, worst latency, worst performance caused when playing with) Macbook pro 2017 i5-something iGPU 8GB DDR4

Now that I have the game compiling with steamworks, I'll spend some time myself debugging it over the next few days. I'll fire up my own macbook pro and use it as a second client so I can debug this faster, currently getting someone to join my game after every code change is a bit of a bottleneck in the process.

To summarize:

WALLOFJUSTICE commented 6 years ago

Thanks for your time to look at this, let us know how it goes!

solenum commented 6 years ago

So, I have some new findings to report.

Playing via steamworks certainly tanks performance, today I'm seeing frame drops down to 30 and below, I think before the choppy performance was frame drops but just so fast that the fps display on screen wasn't catching it as it doesn't update that fast. I guess the steam servers are being a bit slow this morning and making the issue worse?

Whats more weird, if I play steamworks and then go back to singleplayer, the choppy performance and frame drops continue until I restart the game entirely. But if I start the game and go straight to singleplayer or direct connect (thus never play via steamworks) the performance is fine.

Whats even more weird, I tried to record this with simplescreenrecorder to show you just whats going on, and the second I hit record the terrible performance drops vanish and the game starts running well! I'm going to continue doing some digging but something really strange is going on here.

As an additional note, how you do the frame limit is pretty bad, SDL_GetTicks() is inaccurate and you shouldn't be using sleep for the rest of the frame time.

See here for a high resolution timer with SDL2

Fix your timestep :)

solenum commented 6 years ago

Success!

I've managed to find the exact source of the performance issues, and you where correct it is related to multi-threading.

This function is the cause of the performance drops.

net.cpp:4424
int steamPacketThread(void* data)

As for the reason recording a video results in smoother gameplay, I suspect its down to some crazy timing coincidence. Hell of a bug..

Still not sure why playing singleplayer after playing via steamplay results in continued degraded performance though. Perhaps some flag isn't being reset so it continues running the steamworks code even after you end the game and switch to singleplayer.

WALLOFJUSTICE commented 6 years ago

Hmm looking through menu.cpp I don't think closeNetworkInterfaces() gets called at all if you Save & Disconnect... That's the only point where the multithreading gets joined and quits. Is that how you were quitting?

solenum commented 6 years ago

That is how I was quitting to the menu, yes. Also removing the threading from the steamworks section fixes the performance issues entirely it seems.

WALLOFJUSTICE commented 6 years ago

Alrighty we're onto something here!

You mean just running steamPacketThread() every tick instead of handling it in a thread improves performance?

solenum commented 6 years ago

Just running steamPacketThread each tick breaks the networking entirely, I'm not sure how to run it without the multithreading, looks like it would take some work to strip out and when I do it, it just ceases to work.

I just know that steamPacketThread() is causing performance issues, do you have a copy of these methods that don't rely on the multi threading to work so I can test it?

solenum commented 6 years ago

Never mind, I got multi-threading stripped and performance is great, it's certainly down to that.

Just running steamPacketThread() each tick without any of the multi-threading fixes the performance issue entirely. Here's the diff if you want to try for yourself.

diff --git a/src/net.cpp b/src/net.cpp
index cdbf6eb..3ac11ee 100644
--- a/src/net.cpp
+++ b/src/net.cpp
@@ -3393,8 +3393,8 @@ void clientHandleMessages()
    if (!directConnect && !net_handler)
    {
        net_handler = new NetHandler();
-       net_handler->initializeMultithreadedPacketHandling();
-   }
+  }
+       steamPacketThread(static_cast<void* >(net_handler));
 #endif

    if (!directConnect)
@@ -4144,8 +4144,9 @@ void serverHandleMessages()
    if (!directConnect && !net_handler)
    {
        net_handler = new NetHandler();
-       net_handler->initializeMultithreadedPacketHandling();
-   }
+  }
+    steamPacketThread(static_cast<void* >(net_handler));
 #endif

    if (!directConnect)
@@ -4411,13 +4412,13 @@ void NetHandler::addGamePacket(SteamPacketWrapper* packet)
 SteamPacketWrapper* NetHandler::getGamePacket()
 {
    SteamPacketWrapper* packet = nullptr;
-   SDL_LockMutex(game_packets_lock);
+   // SDL_LockMutex(game_packets_lock);
    if (!game_packets.empty())
    {
        packet = game_packets.front();
        game_packets.pop();
    }
-   SDL_UnlockMutex(game_packets_lock);
+   // SDL_UnlockMutex(game_packets_lock);
    return packet;
 }

@@ -4480,9 +4481,10 @@ int steamPacketThread(void* data)
            handler.addGamePacket(packet);
        }

-       SDL_LockMutex(handler.continue_multithreading_steam_packets_lock);
-       run = handler.getContinueMultithreadingSteamPackets();
-       SDL_UnlockMutex(handler.continue_multithreading_steam_packets_lock);
+    run = false;
+       // SDL_LockMutex(handler.continue_multithreading_steam_packets_lock);
+       // run = handler.getContinueMultithreadingSteamPackets();
+       // SDL_UnlockMutex(handler.continue_multithreading_steam_packets_lock);
    }

 #endif
WALLOFJUSTICE commented 6 years ago

Looks good, out of curiousity how's the performance if you leave the multithreading in, but comment out all the mutex's? (Which is basically what you've done but not in the main thread.)

I can't really see where the locking of the thread is necessary when running since it modifies it's own local data only. stopMultithreadedPacketHandling() could still keep a mutex on the destructor.

solenum commented 6 years ago

If I comment out all mutex's the performance is still terrible, strange.

However, if in addition to commenting out all mutex's I set run to false at the bottom of steamPacketThread() instead of handler.getContinueMultithreadingSteamPackets(), it seems much better.

WALLOFJUSTICE commented 6 years ago

Ok, and last sanity check what happens if we use a std::thread instead of SDL_Thread* steam_packet_thread?

steam_packet_thread = SDL_CreateThread(steamPacketThread, "steamPacketThread", static_cast<void* >(this)); should be steam_packet_thread = std::thread(steam_packet_thread, static_cast<void*>this); or something SDL_WaitThread(steam_packet_thread, NULL); becomes currentThread.join();

solenum commented 6 years ago

I don't suppose you have the changes to make it functional with std::thread? I've made the changes but I guess I've missed something because its just crashing.

Edit; Lol nevermind, I got that working. Performance is just as bad with std::thread.

WALLOFJUSTICE commented 6 years ago

https://github.com/WALLOFJUSTICE/Barony/commit/640f06372e5ac580778b5407d1272fd5f8f81f22

Quick and dirty changeover. Give that a go, loads fine in solo steamworks lobby can't verify it any further.

solenum commented 6 years ago

Alright, I compiled and ran that and it still has terrible performance, I would even say the performance was worse (dropped down to 10fps at points) but the performance is so varied that it might just be exactly the same.

WALLOFJUSTICE commented 6 years ago

Oh you got it working, just saw the edit. I was hopeful for a second there! Have we explored all avenues then?

I guess in this case if there's nothing left I can provide an options menu to disable multithreading of network packets. It was necessary at one point in time to as they weren't originally multithreaded and the actual network performance could suffer compared to direct IP due to the amount of small packets Steam networking produced.

I guess you're free to distribute to your friends in the meantime and let us know if performance is OK during extended game sessions. Might cook up a patch this weekend.

solenum commented 6 years ago

Awesome! That seems like a good fix, I'll see about sending it to some friends and giving it a longer more intensive test over the next few days and I'll report any extra findings I have here. It's certainly a really strange issue that seems to be unique to Unix-based systems.

Should I also start an additional issue thread on the framerate limit and the fact that it uses sleep/usleep and the inaccurate SDL_GetTicks for timekeeping? There are much better ways to handle the games timestep, which should result in better performance/consistency and fix the awful input latency induced with vsync enabled.

WALLOFJUSTICE commented 6 years ago

Yeah please open up an issue. I locally tested using getPerformanceCounter(), getPerformanceFrequency and removing the sleep() and it hit 60fps as a target pretty well instead of minor overshooting/undershooting.

Haven't had a chance to try it on 144hz yet so I can't fully appreciate the difference :)

solenum commented 6 years ago

I've sent the binary to my friend who is also on Linux and it resolves the performance issue for him as well, and we've had some extended runs and its fine :)

I'm investigating the actual specific cause of the issue at the moment because it's bugging me, and I've found the exact culprit but why its causing an issue and how its behaving is some dark magic bullshit.

So, here's the exact culprit causing the performance issue.

net.cpp:4483
SDL_LockMutex(handler.continue_multithreading_steam_packets_lock);
run = handler.getContinueMultithreadingSteamPackets();
SDL_UnlockMutex(handler.continue_multithreading_steam_packets_lock);

I decided to time everything to figure out where the issue was coming from, and the act of timing the above 3 lines of code actually resolves the performance issue.....

std::chrono::high_resolution_clock::time_point t1 = std::chrono::high_resolution_clock::now();

SDL_LockMutex(handler.continue_multithreading_steam_packets_lock);
run = handler.getContinueMultithreadingSteamPackets();
SDL_UnlockMutex(handler.continue_multithreading_steam_packets_lock);

std::chrono::high_resolution_clock::time_point t2 = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::microseconds>( t2 - t1 ).count();
std::cout << duration << "\n";

To further confirm that the source of the issue is 100% the 3 lines above, changing run to false also resolves the performance issue.

SDL_LockMutex(handler.continue_multithreading_steam_packets_lock);
run = false; //handler.getContinueMultithreadingSteamPackets();
SDL_UnlockMutex(handler.continue_multithreading_steam_packets_lock);

The plot thickens...

solenum commented 6 years ago

A workaround is to simply call delay right before it. I guess some really strange timing stuff is going on, I suspect the performance benefit of multi-threading here is almost none even on Windows to be honest, probably better off without it entirely?

The fix/workaround...

SDL_Delay(10);
SDL_LockMutex(handler.continue_multithreading_steam_packets_lock);
run = handler.getContinueMultithreadingSteamPackets(); 
SDL_UnlockMutex(handler.continue_multithreading_steam_packets_lock);
WALLOFJUSTICE commented 6 years ago

Committed in branch here with some other junk https://github.com/WALLOFJUSTICE/Barony/commit/2d71af01cc07cf562ba9c3c9f8366f2eb8d3edf7

Option toggle is in the 'misc' tab at the bottom left, can you run in a steam lobby and test if toggling that in-game correctly fixes the stutter?

I could only confirm tests that network messages still continued when i used the toggle.

WALLOFJUSTICE commented 6 years ago

Also there's new entries in the Lang en.txt file so copy that over to see the new option text

solenum commented 6 years ago

I've compiled and run that branch, and the process starts but there's no game window for some reason. Log file showing no issues either.

The last part of the log before it completely stops doing anything:

[09-47-44] [PhysFS]: successfully initialized, returned: 0
[09-47-44] [PhysFS]: successfully mounted output /home/exezin/.barony folder
[09-47-44] [PhysFS]: successfully set write folder /home/exezin/.barony/mods
WALLOFJUSTICE commented 6 years ago

Huh that's strange, I got it running on Ubuntu just now.. failing there implies Steam had some issues, was Steam open or needs a restart?

solenum commented 6 years ago

Huh, restarting steam did the trick. That was odd.

It seems this fix works wonders, with multi-threading on I'm dropping to about 20fps or below today, with it off I'm getting a steady 140fps+ :)

WALLOFJUSTICE commented 6 years ago

Awesome to hear :) let me know if toggling it mid-game fixes the issue without having to reboot Barony.

Also note to self to closeNetworkInterfaces() when saving + quitting multiplayer lobby.

This also includes the SDL timing fixes so it's a mix of that as well.

solenum commented 6 years ago

It does! I can toggle it mid-game and the difference is like day and night. :+1:

Edit; I sent the binary to my friend, it's the same for him. With multi-threading today he's dropping to 30fps and below just like me (must be down to the steam servers and how they're performing, causing the long waits/blocking with threads etc), but with multi-threading disabled, performance is fantastic.

addictgamer commented 6 years ago

For anybody using Intel CPUs, can you guys give disabling hyperthreading a shot?

WALLOFJUSTICE commented 6 years ago

workaround is now live in v3.2.2