DarklightGames / DarkestHour

Darkest Hour: Europe '44-'45
https://store.steampowered.com/app/1280/Darkest_Hour_Europe_4445/

Artillery locking up & can't be used #589

Closed JaKHamer closed 7 years ago

JaKHamer commented 8 years ago

Soupy and I have been experiencing issues related to arty. On Bridgehead, one issue is that the voice-over does not match the nationality. However, the main issue, which seems to be getting worse, is the arty locking up and needing a server restart to become functional again. It is as if the previous arty strike has not stopped, even though it has, so it won't allow another arty call. We have found no way to free it up in game. Ask Matt to drop by sometime when he is free and we can discuss it further. Thanks guys.

AndrewTheel commented 8 years ago

Thanks for reporting the issue, assigning to Matt and Colin to debug. Eventually we will be rewriting the entire artillery system, but in the meantime we should debug the current one.

AndrewTheel commented 7 years ago

Did we fix the artillery locking up bug? I think we finally found the cause and fixed it, unless it was a dream. @cmbasnett

handsm commented 7 years ago

For info, I spoke with Soupy after this was reported and again recently. I haven't had the chance to ask JaKHamer about it (or really I didn't remember to ask him the times I've spoken with him!). But Soupy said the problem happened over a period of about a week, about once a day, but hasn't been seen since. This was about the time the Good Guys server had some technical problems that required a hardware replacement by the provider, which may or may not be relevant. This was on the current 7.0.2 release.

It was not map specific and affected maps with both the new and old style spawning systems.

Although a server re-start is mentioned in the report, I don't think that was required to clear the problem and a map re-start would do it.

Soupy has also mentioned other occasional arty problems. New target being marked with binocs and showing updated on officer's map, then arty falling at an old location. That was at the time the arty lock-ups were happening and I don't think it is current. Also Soupy recently mentioned that when a carried radio is used some yards away from a static radio (e.g. the ones on tables or whatever in some maps), sometimes the HQ acknowledgement voice message can be heard coming from the static radio as well as the radioman. That suggests a too large radius on the static radio, so both end up being used at the same time.

AndrewTheel commented 7 years ago

We need to make it so that isn't possible at all. Otherwise there may be some exploits involving calling arty with a radioman in a static radio radius.

handsm commented 7 years ago

@Soupy provided some extra information that has given me a good idea of why this bug is happening and a plausible possibility of how the situation can arise. Soupy noticed that when the arty 'jams' on a map and can't be used any more, the bright white arty strike icon remains stuck on the overhead map. I mean the marker for a 'live' arty strike, not the similar, pale icon just showing where an arty officer has marked some co-ordinates.

The arty jams because the game thinks an arty strike is still happening, and it won't let that team call another strike until it's over - but this situation lasts for the rest of the map. A live arty strike is signified by having a recorded ArtyStrikeLocation for the team, in the GRI actor. A null value means no live strike. So the strike icon remaining on the map is a symptom of the ArtyStrikeLocation failing to get cleared (zeroed) at the end of a strike, causing the team's arty to jam.
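To make the gating concrete, here is a minimal Python model of it (a sketch, not the actual UnrealScript). The class and function names are made up for illustration; the only thing taken from the game is the idea that a non-null ArtyStrikeLocation for a team blocks that team's next strike.

```python
from typing import List, Optional, Tuple

Vector = Tuple[float, float, float]


class GameReplicationInfo:
    """Toy stand-in for the GRI actor: one strike location slot per team."""

    def __init__(self, team_count: int = 2) -> None:
        # None models a null ArtyStrikeLocation, i.e. no live strike.
        self.arty_strike_location: List[Optional[Vector]] = [None] * team_count


def can_call_strike(gri: GameReplicationInfo, team: int) -> bool:
    """A team may call arty only while it has no live strike recorded."""
    return gri.arty_strike_location[team] is None


def begin_strike(gri: GameReplicationInfo, team: int, location: Vector) -> bool:
    """Record a live strike; refuse if one is already recorded for the team."""
    if not can_call_strike(gri, team):
        return False
    gri.arty_strike_location[team] = location
    return True


def end_strike(gri: GameReplicationInfo, team: int) -> None:
    """Clear the location. If this never runs, the team's arty stays jammed."""
    gri.arty_strike_location[team] = None
```

In this model the jam is simply `end_strike` never being called: `can_call_strike` keeps returning False for that team until the map restarts.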

I first suspected this would be a problem of the arty spawner actor not being able to spawn at the specified location, e.g. the height was under the terrain or inside world geometry or whatever. But when I looked into it, the only place the ArtyStrikeLocation gets set is when the arty spawner spawns, in its PostBeginPlay() event. And it gets cleared by the arty spawner when that actor is destroyed. So even if spawning it goes wrong, it either won't spawn (so no PostBeginPlay and no location set) or will immediately be destroyed (so location gets cleared). So it doesn't appear to be that.

The arty spawner lasts for as long as the arty strike, then it destroys itself. That is the critical thing, as it must destroy itself in order for the ArtyStrikeLocation to get cleared, which is essential. So I think the problem is something goes wrong during the strike, resulting in the arty spawner not destroying itself, leaving the strike forever live.

During its time, the arty spawner runs a series of timers, starting with the initial lead in period, then a delay between each shell in a salvo, then a delay until the next salvo, and so on. These timers are randomised within a range. Each time it keeps setting a new timer until the next shell/salvo, until the last shell of the last salvo is spawned, when the arty spawner destroys itself, triggering the vital clearing of the ArtyStrikeLocation.

Very occasionally it seems this sequence of timers somehow fails, breaking the chain and preventing the arty spawner from destroying itself. I don't see any flaws in the logic, i.e. at each stage there seems no chance of the code failing to either set a new timer or destroy the arty spawner. But I suspect the problem may be that when a randomised timer gets set, it could very occasionally be randomised to a nil timer delay, and setting a nil timer actually cancels any existing timer and does not set a new one. So if a nil delay is inadvertently selected, it will break the timer sequence and result in the team's arty being jammed for the rest of the round.
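The suspected failure can be reproduced in a small Python model of the timer chain (a sketch only; the class, the lead-in value and the delay lists are hypothetical). It assumes, as described above, that setting a timer with a zero delay cancels the timer instead of scheduling one.

```python
class ArtySpawner:
    """Toy model of the spawner's one-shot timer chain."""

    def __init__(self, shell_delays):
        # Delay picked after each shell except the last (hypothetical inputs).
        self.pending_delays = list(shell_delays)
        self.timer = None
        self.destroyed = False
        self.shells_fired = 0
        self.set_timer(1.0)  # lead-in period, arbitrary value

    def set_timer(self, delay):
        # Assumed engine behaviour: a zero delay cancels any timer and
        # does not schedule a new one.
        self.timer = delay if delay > 0 else None

    def on_timer(self):
        self.timer = None  # the one-shot timer has fired
        self.shells_fired += 1
        if self.pending_delays:
            self.set_timer(self.pending_delays.pop(0))
        else:
            self.destroyed = True  # last shell: the spawner destroys itself


def run(spawner):
    """Fire timers until none is pending; report whether the spawner ended."""
    while spawner.timer is not None and not spawner.destroyed:
        spawner.on_timer()
    return spawner.destroyed
```

With delays of `[0.5, 1.2, 0.3]` the model fires four shells and destroys itself. With `[0.5, 0.0, 0.3]` it stops after the second shell with no timer pending and is never destroyed, which is the jammed state.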

I think the culprit is perhaps the timer that is set between each shell being spawned. It is randomised to between 0 and 1.5 seconds, like this: SetTimer(FRand() * 1.5, false);

I guess that very occasionally the FRand() function, which produces a random fraction between 0 and 1, will throw up a zero. That means zero delay until the next arty shell, which would be ok in itself, but unfortunately it would result in setting a nil timer and jamming the arty. The chance must be very, very low each time, but it is rolled for every shell, and there has been a major increase in arty use on the Good Guys server, so it's possible the bug could appear rarely. Soupy said it can be a few weeks between incidents.
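As a rough feel for how a tiny per-shell chance adds up, here is the arithmetic in Python. Both numbers are assumptions for illustration, not values from the DH source: a 1 in 32768 chance of an exact zero (what you would get if FRand() were built on a 15-bit random integer) and 30 randomised shell timers per strike.

```python
# Assumed, illustrative numbers only; neither comes from the DH source.
P_ZERO = 1 / 32768       # assumed chance that one FRand() call returns exactly 0
TIMERS_PER_STRIKE = 30   # assumed number of randomised shell timers per strike


def jam_chance_per_strike(p_zero, timers_per_strike):
    """Chance that at least one timer in a strike draws an exact zero."""
    return 1.0 - (1.0 - p_zero) ** timers_per_strike


def expected_strikes_until_jam(p_zero, timers_per_strike):
    """Mean number of strikes before one jams (geometric distribution)."""
    return 1.0 / jam_chance_per_strike(p_zero, timers_per_strike)
```

Under those assumed numbers a jam would be expected on the order of once per thousand strikes, so rare enough to go unnoticed for weeks on a busy server.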

If this is the cause, it has been a latent bug since RO. It shows itself now because since DH v6, a team cannot call an arty strike until the previous arty strike is over, which was designed to reduce arty spam. In RO if this bug happened, the arty spawner would fail to destroy itself and the arty strike would show as remaining live, but the RO functionality allowed overlapping strikes, so the team could call another one even though the previous one was jammed live. The new arty spawner (for the new strike) would override the ArtyStrikeLocation and would clear it when the new strike ended, so resetting everything to how it should be.

handsm commented 7 years ago

Possible fix in commit https://github.com/DarklightGames/DarkestHour/commit/e44322944e83bf483bc08414a971f21f119c6232.

Made it so that it's impossible for the arty spawner actor to set a nil timer. This could fix the problem, but I'll add an extra belt n' braces fallback solution.
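One simple way to rule out a nil timer is to clamp the randomised delay to a small positive minimum. The Python below is a sketch of that idea, not necessarily what the commit does; the minimum value and function name are hypothetical.

```python
import random

MIN_TIMER_DELAY = 0.01  # hypothetical floor, in seconds


def next_shell_delay(rand=random.random, max_delay=1.5):
    """Random delay up to max_delay, never below MIN_TIMER_DELAY.

    `rand` is injectable so the zero case can be exercised directly.
    """
    return max(MIN_TIMER_DELAY, rand() * max_delay)
```

For example, `next_shell_delay(rand=lambda: 0.0)` gives the floor value rather than zero, so the chain always gets a real timer.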

handsm commented 7 years ago

Added an extra fallback safeguard in commit https://github.com/DarklightGames/DarkestHour/commit/1a0c2a89d769c6545c0bb47eb5dd6582476635cb.

When the arty spawner spawns, it now sets a LifeSpan so that after that number of seconds the actor is automatically destroyed by the engine. This is a fail-safe in case the sequence of timers somehow gets interrupted and we never get to the end of the arty strike. The spawner's LifeSpan is set to the maximum possible length of the strike, assuming the max random time between shells and salvoes. Typically this is 2 or 3 minutes. If a bug somehow occurs and the arty is jammed, this fallback fix will destroy the spawner about a minute after the jam occurs, which clears the jam.
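For illustration, the worst-case duration could be worked out along these lines. This Python sketch uses hypothetical parameter names and example values, and assumes one delay between consecutive shells in a salvo and one delay between consecutive salvoes; the real calculation in the commit may differ.

```python
def max_strike_duration(lead_in_max, salvo_count, shells_per_salvo,
                        shell_delay_max, salvo_delay_max):
    """Upper bound on strike length if every random delay hits its maximum."""
    # Gaps between shells inside each salvo.
    shell_gaps = salvo_count * (shells_per_salvo - 1) * shell_delay_max
    # Gaps between one salvo and the next.
    salvo_gaps = (salvo_count - 1) * salvo_delay_max
    return lead_in_max + shell_gaps + salvo_gaps


# Example values are made up: 30 s lead-in, 4 salvoes of 8 shells,
# up to 1.5 s between shells and up to 20 s between salvoes.
life_span = max_strike_duration(30.0, 4, 8, 1.5, 20.0)
```

With those made-up values the bound is 132 seconds, so a spawner that lost its timer chain would still be removed once that lifespan ran out.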

Will close this issue now as even if the 1st fix doesn't work, the fallback will work and is an adequate workaround. I made an arty debug mutator that is running on the Good Guys server, so if this problem ever crops up again I can get some debug info on it. But I think the problem should be gone now.