gamedig / rust-gamedig

Game Server Query Library.
https://crates.io/crates/gamedig
MIT License
38 stars 11 forks source link

[Protocol] Retry failed requests #95

Closed Douile closed 11 months ago

Douile commented 11 months ago

I added some logic to retry requests when failed, this seems to make

$ cargo run --example generic -- tf2 91.216.250.10

fail (less) often for me. (there are still sometimes malformed packet errors but I don't see WouldBlock errors anymore).

The problem with the specific query seems to be that the players and rules requests sometimes don't receive a response. Inspecting the packet captures the requests look valid so my assumption is that these responses are abnormally large for UDP packets and hence are occasionally dropped on the way to me.

For valve query protocol there might be a better way to solve this as I believe that it supports "paginated" multi-packet responses. But I think that having some retry logic could be useful as internally retrying the individual part of the request that fails might be faster than the caller retrying the entire query.

NOTE: Setting GatherOptions::{gather_player, gather_rules} to false avoids my original issue and causes query to succeed every time.

CosminPerRam commented 11 months ago

For valve query protocol there might be a better way to solve this as I believe that it supports "paginated" multi-packet responses.

Yep, and it's currently implemented, but I don't remember exactly when games do hand out these packets in multiple parts and when not (e.g. exceeding a maximum size, but some games have bigger packet sizes than others).

But I think that having some retry logic could be useful as internally retrying the individual part of the request that fails might be faster than the caller retrying the entire query.

Agreed, this could be done for every protocol that does multiple queries (I think only the Valve protocol for now).

Douile commented 11 months ago

Yep, and it's currently implemented

Ah yes I forgot its a completley server side implementation dependent thing.

Agreed, this could be done for every protocol that does multiple queries

The way I implemented here (retries field in TimeoutSettings), I think it might be confusing to users if protocols without multiple queries don't follow the same retry logic. Would probably need to document which protocols its implemented for as the multiple query part is invisible to a library consumer or just implement everywhere.

CosminPerRam commented 11 months ago

I think it might be confusing to users if protocols without multiple queries don't follow the same retry logic.

Then we could just treat this as a retry count, which not only applies to valve queries (where are multiple ones) but on every query (GS1 does just one, if it fails and we have 2 retries, try it again, and on those which do have multiple ones (valve), just retry the subqueries).

CosminPerRam commented 11 months ago

Retries also need to be done for games with proprietary queries. https://github.com/gamedig/rust-gamedig/blob/2eb1d12b3d873c8d13121d5073f8cf986bd234e2/src/protocols/types.rs#L13-L17 ... and i think this hints that retrying queries shouldn't be a top-level code in queries (duplicated stuff), but refactoring this should (?) be done in another pr.

Douile commented 11 months ago

and i think this hints that retrying queries shouldn't be a top-level code in queries (duplicated stuff), but refactoring this should (?) be done in another pr.

I agree, I just couldn't see an easy way to do it.

So I think we're kind of stuck in this layer, I guess syntactically might look nicer if added retry to the get_info like function, then query just calls through without needing retry code. But that still would involve a lot of duplicated code just in different places.

Douile commented 11 months ago

My mistake, I see what you mean now: FFOW and JC2MP call the internal query functions which do not implement retries themselves. So implementing the retries there is much preferred.

Douile commented 11 months ago

I have changed so all of the retry_on_timeouts now occur in wrappers around the method in each protocol that makes the query. This means that the proprietary games that call these functions directly don't need to implement retries themselves.

I will add some unit tests for retry_on_timeout and then assuming everyone else is happy this is ready to merge.