ValveSoftware / GameNetworkingSockets

Reliable & unreliable messages over UDP. Robust message fragmentation & reassembly. P2P networking / NAT traversal. Encryption.
BSD 3-Clause "New" or "Revised" License

Early Retry Mechanism for SteamNetworkingSockets Authentication #301

Open nicopaes opened 1 year ago

nicopaes commented 1 year ago

Hey GameNetworkingSockets maintainers!

As a developer working with the Steam API through the Steamworks.NET wrapper in Unity, I am reaching out to bring an issue to your attention and to ask for your advice on improving the authentication process in our integration.

Technical Info:
Unity Version -> LTS 2021.3.17f1
Steamworks.NET Version -> 20.2.0

Issue Description: Our users have been encountering authentication issues when trying to connect to the Steam backend.

Currently, we employ the SteamNetworkingSockets.InitAuthentication() function to initiate and, if necessary, retry the authentication process. The challenge we face is that, according to the documentation, this function allows unlimited retries, but only after the previous attempt has failed entirely.

The crux of the matter is the substantial delay before the authentication process is deemed a failure, resulting in unnecessary waiting periods. We're trying to figure out a method for introducing early retry attempts without the need to wait for a complete failure – much like a timeout mechanism.

We kindly request your expertise and advice on this matter. Are there any suggestions, ideas, or potential solutions you could recommend to help us implement retries without having to wait for the current process to fail entirely?

zpostfacto commented 1 year ago

Do you know which step is taking a long time and stalling the retry?

nicopaes commented 1 year ago

> Do you know which step is taking a long time and stalling the retry?

Yes. Calling SteamNetworkingSockets.GetAuthenticationStatus(out netAuthenticationStatusT) to check the current status returns the enum value k_ESteamNetworkingAvailability_Retrying.

While the authentication status is in this state, calling InitAuthentication() doesn't restart the process. We have to wait until the status becomes ESteamNetworkingAvailability.k_ESteamNetworkingAvailability_Failed. Sometimes this takes 5 to 10 seconds, depending on the user.

zpostfacto commented 1 year ago

Can you tell which step it is waiting on? Maybe the contents of SteamNetAuthenticationStatus_t::m_debugMsg will say what it's doing?

nicopaes commented 1 year ago

I print m_debugMsg whenever the status changes, via the callback.

It usually goes:

-> k_ESteamNetworkingAvailability_Retrying :::: Attempt #X to fetch config from https://api.steampowered.com/ISteamApps/GetSDRConfig/v1?appid=X

-> k_ESteamNetworkingAvailability_Failed :::: No response from server

zpostfacto commented 1 year ago

I would really like to debug this. How often is it happening? We designed that endpoint to have extremely high availability. It's actually served by Akamai, and we set all sorts of aggressive HTTP caching headers so that Akamai will serve stale data if Steam is down, etc.

I think the answer to your immediate question is that there isn't much more we can do to "kick" the API. You can listen for a callback when the authentication status changes and immediately retry if it fails. You can just sit in a loop and constantly ask it to initialize until you get back a success status. But if one step is stalling, it's just not working, and we cannot really retry before the previous attempt fails. (I don't want to change the code so that there can be more than one request in flight at a time.)
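The "retry as soon as the status reports failure" approach described above could be sketched roughly as follows. This is a minimal illustration, not library code: `Availability`, `AuthStatus`, and `AuthRetrier` are hypothetical stand-ins defined locally so the sketch compiles without the Steamworks headers, and `initAuth` stands for whatever calls `InitAuthentication()` in your integration. The attempt cap is also an assumption; the thread does not prescribe one.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Local stand-ins (assumptions): the names loosely mirror
// ESteamNetworkingAvailability, but this is not the real enum.
enum class Availability { Failed, Retrying, Attempting, Current };

struct AuthStatus {
    Availability avail;
    std::string debugMsg;  // plays the role of m_debugMsg
};

// Hypothetical helper: re-issues the init call only once the previous
// attempt has fully failed, since only one request can be in flight.
class AuthRetrier {
public:
    AuthRetrier(std::function<void()> initAuth, int maxAttempts)
        : initAuth_(std::move(initAuth)), maxAttempts_(maxAttempts) {
        assert(maxAttempts_ >= 1);
    }

    // Kick off the first attempt.
    void Start() {
        attempts_ = 1;
        initAuth_();
    }

    // Call this from the status-changed callback.
    // Returns true if a new attempt was started.
    bool OnStatusChanged(const AuthStatus& status) {
        if (status.avail != Availability::Failed) {
            return false;  // still in progress, or already succeeded
        }
        if (attempts_ >= maxAttempts_) {
            return false;  // give up and let the caller surface the error
        }
        ++attempts_;
        initAuth_();
        return true;
    }

    int attempts() const { return attempts_; }

private:
    std::function<void()> initAuth_;
    int maxAttempts_;
    int attempts_ = 0;
};
```

In use, the game would construct one `AuthRetrier`, call `Start()` once, and forward every authentication status callback to `OnStatusChanged()`. A `Retrying` status is deliberately ignored, which matches the constraint above that a new attempt cannot begin until the previous one has failed.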

One thing I can investigate is adjusting that WebAPI fetch to use a shorter timeout. It really should be nearly instant. But I think if we're waiting on that fetch, there isn't much more we can do if that isn't working.

Also - the very first fetch might fail legitimately. We use the only-if-cached header, so it will only check the local cache. That might fail immediately, and that's expected and normal. The idea here is that if we have cached data, we apply it immediately, and then we immediately issue a real request to check for an up-to-date version.

If you have any tools at your disposal to help me understand why that API fetch is failing, I would really appreciate it. It does seem to be failing more than I would expect given the significant measures we have taken to make it highly available. (I am looking into the same basic problem in CSGO.) Where are you in the world, when you do a DNS lookup on api.steampowered.com, what Akamai edge hosts will serve the request, have you noticed any patterns that cause it, etc?
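For anyone gathering the DNS information asked for above, one way to log which addresses a client resolves for a host is sketched below. It assumes a POSIX system (`getaddrinfo`, `inet_ntop`); Windows needs Winsock setup instead. `ResolveHost` is a hypothetical helper name, not part of any Steam API.

```cpp
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>

#include <string>
#include <vector>

// Resolve a host name to its IPv4/IPv6 addresses as printable strings.
// Returns an empty vector if the lookup fails.
std::vector<std::string> ResolveHost(const std::string& host) {
    std::vector<std::string> addresses;

    addrinfo hints{};
    hints.ai_family = AF_UNSPEC;      // accept both IPv4 and IPv6
    hints.ai_socktype = SOCK_STREAM;  // avoid one entry per socket type

    addrinfo* results = nullptr;
    if (getaddrinfo(host.c_str(), nullptr, &hints, &results) != 0) {
        return addresses;
    }

    for (addrinfo* p = results; p != nullptr; p = p->ai_next) {
        char buf[INET6_ADDRSTRLEN] = {};
        const void* addr = nullptr;
        if (p->ai_family == AF_INET) {
            addr = &reinterpret_cast<sockaddr_in*>(p->ai_addr)->sin_addr;
        } else if (p->ai_family == AF_INET6) {
            addr = &reinterpret_cast<sockaddr_in6*>(p->ai_addr)->sin6_addr;
        }
        if (addr != nullptr &&
            inet_ntop(p->ai_family, addr, buf, sizeof(buf)) != nullptr) {
            addresses.emplace_back(buf);
        }
    }

    freeaddrinfo(results);
    return addresses;
}
```

Calling `ResolveHost("api.steampowered.com")` on an affected machine and logging the result alongside the player's region would show which addresses that client is being directed to.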

nicopaes commented 1 year ago

> How often is it happening?

According to our tests, this is happening 90% of the time. The game we're working on has a pretty massive Chinese following, so players from this region are experiencing this issue, unlike the rest of the world.

> Where are you in the world, when you do a DNS lookup on api.steampowered.com, what Akamai edge hosts will serve the request, have you noticed any patterns that cause it, etc?

We are using a function from a library (Heathen Steamworks) to determine the host's country from their IP (it is highly likely that they use a Steam API call internally). We can see that the players experiencing this issue are from China, Hong Kong, Taiwan, and Singapore without using VPNs.

With the forced-retry method, authentication succeeds within a 10-minute window 85% of the time. Based on the logs, it takes an average of 9 calls to the "GetSDRConfig" endpoint. This is a good success rate, but waiting up to 10 minutes to connect is too much to ask of players, so an early-retry mechanism would help us shorten connection times.

We can conduct some additional testing and try to log more information as you requested to provide more context for you. For now, I thought it would be important to share this additional information.

zpostfacto commented 1 year ago

Got it, that makes sense. I have made some improvements for China specifically. I've shipped them in CSGO and will try to get them into the full Steam client ASAP.

nicopaes commented 1 year ago

> Got it, that makes sense. I have made some improvements for China specifically. I've shipped them in CSGO and will try to get them into the full Steam client ASAP.

That's great to hear, looking forward to it! Can you give us a heads up when it goes live so we can run more tests?

bravarda commented 12 months ago

Hey there, any news on this?