StalkR / Space-Engineers-Asteroid-Ores

A plugin to better control ores in asteroids
Apache License 2.0
1 stars 2 forks source link

server / client desync - unable to fully reload into world #11

Closed ShinkuShi closed 3 months ago

ShinkuShi commented 4 months ago

hello! we ran into an issue whilst using your plugin. its been working perfect for the past five weeks, though now this last week we've been having issues. on a fresh pc boot, we can get into the game server. if you leave, due to a restart or relog. it would get stuck on rejoining and not load fully in. you'll hear the character sounds and warnings + death, but your screen gets locked on the 50% loading.

The only 'fix' we found to get in, was either having a moderator remove their character or completely restarting their computer. restarting the server had no effect.

i've now had to remove the plugin due to this issue, and i was really enjoying the control of play on the server. i'm not able to add it back in for testing right now.

though if you can help give me some direction to look, i can try during our servers low player count times Keen-2024-04-24 - Server Stuck on Join.log Torch-2024-04-24 Server Stuck on Join.log

these two are for when i was testing - user Shinku_Shi in the logs for these ones.

StalkR commented 4 months ago

Thanks for the report, and details from discord:

  1. player A joins
  2. player B joins
  3. Player B leaves
  4. Player B tries to rejoin and gets stuck on loading

If exiting the game then starting it again, it sounds like something's not unloaded properly, so I'm guessing one of the events there: https://github.com/StalkR/Space-Engineers-Asteroid-Ores/blob/master/Mod/Data/Scripts/AsteroidOres/Mod.cs#L25

I'll dig.

StalkR commented 4 months ago

It could be because the mod doesn't clear its internal data (e.g. client), and it confuses the game on the next join? As a tentative fix I've added code to clear this data on unload: https://github.com/StalkR/Space-Engineers-Asteroid-Ores/commit/7834dbacf7a1801f96c7e6ca7530f9896856f42a (I'm also adding more logging).

Released an updated version of the mod if you want to try and report back. Since it requires 2 players and I only have 1 version of the game I'm afraid I can't test alone.

StalkR commented 4 months ago

I just tested on our server the steps above (https://github.com/StalkR/Space-Engineers-Asteroid-Ores/issues/11#issuecomment-2075463102), and it worked just fine: player B character was not removed, and player B was able to rejoin and not get stuck.

Unfortunately, I didn't test before the update to confirm the issue, so I can't conclude if https://github.com/StalkR/Space-Engineers-Asteroid-Ores/commit/7834dbacf7a1801f96c7e6ca7530f9896856f42a fixed it, or if our server never had that issue in the first place and it's something somehow specific to your server.

Let me know how it goes on your server (and if you can share the address I can connect and test myself).

Sardaukai commented 4 months ago

Problem is still present and we got some additional problems,

Player A connects and play - quit Player B connects and play - quits Player A wants to connect -> hang at 50% Player B wants to connect at the same time -> unable to connect, server does not realize the player

even after server restart player is not able to connect.

I am able to send you the whole log via dm if you want, is this line something from your mod?

MyVoxelReplicable.OnLoad - isUserCreated:False isFromPrefab:True contentChanged:False data?: True

Both had this entry, I am able to login and haven't that entry

There is no connecting issue without mod and plugin.

Question: Do you preload some asteroids? Got about 1500 with procedual density and 45 if density is 0

StalkR commented 4 months ago

Thanks for the report.

Problem is still present and we got some additional problems,

Player A connects and play - quit Player B connects and play - quits Player A wants to connect -> hang at 50% Player B wants to connect at the same time -> unable to connect, server does not realize the player

even after server restart player is not able to connect.

When you say quit, does it mean: a) the player disconnects and is back on the game menu (and then just clicks continue to connect again) b) or player exits game entirely (and then starts the game again then clicks continue to connect again)

After this happens, is the server still playable for players still on it? can anyone connect to the server or is the server somehow frozen/broken?


is this line something from your mod? MyVoxelReplicable.OnLoad - isUserCreated:False isFromPrefab:True contentChanged:False data?: True

No it's from the game itself, specifically from Sandbox.Game.dll inside namespace Sandbox.Game.Replication, class MyVoxelReplicable, method OnLoad:

            MyLog.Default.WriteLine(string.Concat(new string[]
            {
                "MyVoxelReplicable.OnLoad - isUserCreated:",
                flag.ToString(),
                " isFromPrefab:",
                flag2.ToString(),
                " contentChanged:",
                flag4.ToString(),
                " data?: ",
                (array != null).ToString()
            }));

Do you preload some asteroids? Got about 1500 with procedual density and 45 if density is 0

No, the plugin does not preload asteroids. When players move through space their game and the server generate asteroids as normal, and if the asteroids are left untouched (i.e. voxels not deformed, through collision or mining), then they disappear after player goes out of sync distance, and the voxel maps never stored.


Can you share what is your server, is it the same as ShinkuShi or a different one? Can we work together so I can connect to your server to reproduce the bug?

Yes please share the logs, plugin list, mod list and if possible the server save (the Torch\Instance\Saves\<save name> folder, without the Backup folder). You'll find me on discord under the same name.

Sardaukai commented 4 months ago

When you say quit, does it mean: a) the player disconnects and is back on the game menu (and then just clicks continue to connect again) b) or player exits game entirely (and then starts the game again then clicks continue to connect again)

I need to ask the player to verify but I think it is b)

After this happens, is the server still playable for players still on it? can anyone connect to the server or is the server somehow frozen/broken?

It is playable, I was able to connect normally.

Can you share what is your server, is it the same as ShinkuShi or a different one? Can we work together so I can connect to your server to reproduce the bug?

Its a different server on a dedicated Windows machine

Yes please share the logs, plugin list, mod list and if possible the server save (the Torch\Instance\Saves\<save name> folder, without the Backup folder). You'll find me on discord under the same name.

I will contact you via dm

StalkR commented 4 months ago

I was discussing this issue with other developers and server admins.

Some players on another server without my mod are having similar issues connecting, and they're able to work around it by installing PluginLoader and enabling the plugin named TimeoutFix (it's on my github, but I'm not the author, I'm just the publisher). Ask the player to try and see if it helps? let me know.

StalkR commented 4 months ago

In @Sardaukai 's server logs, when the issue happens, this is shown:

23:19:40.9685 [INFO]   DAC.AntiCheatPlugin: ValidationFailed detected from game Admin Info: ServerInvoked MyMultiplayerBase.OnScriptedChatMessageReceived ID: 7.....3 Name ?Sardaukai ADMIN-ACCESS

It comes from AntiCheat-DORI plugin by @dorimanx.

Also noteworthy: when that happens, other players can connect just fine, so server is not entirely broken. It seems somehow tied to this user. I have a theory it may be DAC somehow reacting to my plugin (sending too many msgs or too early) and 'banning' the player.

To confirm or rule out this theory, I proposed @Sardaukai runs for a couple days with:

If the connection problems disappear, that may be it. If not, then it's something else.

@ShinkuShi do you also have the AntiCheat-DORI plugin enabled? if so, do you want to also try disabling it for a couple days?

Sardaukai commented 4 months ago

Same issue and more problems, DAC disabled.

No one was able to connect, server needed a restart to solve it. We get this earlier when an affected player hang on login and does not quit.

I also realize that WeaponCore got an update. Maybe it is more related to WC?

@ShinkuShi you also use wc, or?

StalkR commented 4 months ago

Thanks for the information, so it's unlikely to be because DAC, and in fact maybe DAC is pointing us at the actual issue, something with a validation failing.

Our server run with WC as well, and we didn't notice the issue (yet), so probably not.

StalkR commented 4 months ago

I've done more testing on our server, we still can't reproduce.

Sardaukai commented 4 months ago

Maybe it is not directly related to this plugin & mod. Could be caused by another mod if this mod is in use. We should compare our settings in order to find mods/plugins you not use.

I simply start with our modlist, give me some time to make a comparable list

Also mod loading order could be a fact too! Asteroid ore is currently at #1 at our server

StalkR commented 4 months ago

here's some quick analysis

with open('StalkR/modlist.txt') as f: stalkr = {int(e.strip()) for e in f.readlines()}
with open('Sardaukai/modlist.txt') as f: sardaukai = {int(e.strip()) for e in f.readlines()}
with open('ShinkuShi/modlist.txt') as f: shinkushi = {int(e.strip()) for e in f.readlines()}
common_except_stalkr = {e for e in sardaukai.intersection(shinkushi) if e not in stalkr}
only_stalkr = {e for e in stalkr if e not in sardaukai and e not in shinkushi}

Common mods between Sardaukai and ShinkuShi, but not on StalkR:

`for e in common_except_stalkr: print(' - https://steamcommunity.com/workshop/filedetails/?id=%i' % e)` - https://steamcommunity.com/workshop/filedetails/?id=3154379105 - https://steamcommunity.com/workshop/filedetails/?id=1992435623 - https://steamcommunity.com/workshop/filedetails/?id=2966429095 - https://steamcommunity.com/workshop/filedetails/?id=525460808 - https://steamcommunity.com/workshop/filedetails/?id=2503712170 - https://steamcommunity.com/workshop/filedetails/?id=2880317963 - https://steamcommunity.com/workshop/filedetails/?id=500818376 - https://steamcommunity.com/workshop/filedetails/?id=2652227853 - https://steamcommunity.com/workshop/filedetails/?id=1736785358 - https://steamcommunity.com/workshop/filedetails/?id=2297024111 - https://steamcommunity.com/workshop/filedetails/?id=2756894170 - https://steamcommunity.com/workshop/filedetails/?id=973526550 - https://steamcommunity.com/workshop/filedetails/?id=2936367770 - https://steamcommunity.com/workshop/filedetails/?id=3017795356 - https://steamcommunity.com/workshop/filedetails/?id=1135484957

Mods only on StalkR:

`for e in only_stalkr: print(' - https://steamcommunity.com/workshop/filedetails/?id=%i' % e)` - https://steamcommunity.com/workshop/filedetails/?id=1743310853 - https://steamcommunity.com/workshop/filedetails/?id=1825460112 - https://steamcommunity.com/workshop/filedetails/?id=2831946773 - https://steamcommunity.com/workshop/filedetails/?id=2465669142 - https://steamcommunity.com/workshop/filedetails/?id=2870402223 - https://steamcommunity.com/workshop/filedetails/?id=2570427696 - https://steamcommunity.com/workshop/filedetails/?id=2206564401 - https://steamcommunity.com/workshop/filedetails/?id=2849070130 - https://steamcommunity.com/workshop/filedetails/?id=1864380341 - https://steamcommunity.com/workshop/filedetails/?id=2642594104 - https://steamcommunity.com/workshop/filedetails/?id=2349540537 - https://steamcommunity.com/workshop/filedetails/?id=2667547195 - https://steamcommunity.com/workshop/filedetails/?id=2747855549 - https://steamcommunity.com/workshop/filedetails/?id=2831924414 - https://steamcommunity.com/workshop/filedetails/?id=2340559167 - https://steamcommunity.com/workshop/filedetails/?id=2683816256 - https://steamcommunity.com/workshop/filedetails/?id=2189531971 - https://steamcommunity.com/workshop/filedetails/?id=2825871811 - https://steamcommunity.com/workshop/filedetails/?id=2831946436 - https://steamcommunity.com/workshop/filedetails/?id=2368413383 - https://steamcommunity.com/workshop/filedetails/?id=2780615880 - https://steamcommunity.com/workshop/filedetails/?id=2831947336 - https://steamcommunity.com/workshop/filedetails/?id=3046287818 - https://steamcommunity.com/workshop/filedetails/?id=2921890257 - https://steamcommunity.com/workshop/filedetails/?id=2831947604 - https://steamcommunity.com/workshop/filedetails/?id=2388326362 - https://steamcommunity.com/workshop/filedetails/?id=2831947867 - https://steamcommunity.com/workshop/filedetails/?id=2626756831 - https://steamcommunity.com/workshop/filedetails/?id=646089317 - https://steamcommunity.com/workshop/filedetails/?id=2678657382 - https://steamcommunity.com/workshop/filedetails/?id=2216602986 - https://steamcommunity.com/workshop/filedetails/?id=1976348140 - https://steamcommunity.com/workshop/filedetails/?id=801185519 - https://steamcommunity.com/workshop/filedetails/?id=1359618037 - https://steamcommunity.com/workshop/filedetails/?id=2493824757 - https://steamcommunity.com/workshop/filedetails/?id=2547246713 - https://steamcommunity.com/workshop/filedetails/?id=2443062653

Unfortunately, nothing from those lists stands out as potentially causing issues. We can try to remove one by one but that's tedious.

Mod load order doesn't matter for my mod, but might for another. You can try the two extremes (first / last) in case there's anything.

Sardaukai commented 3 months ago

New information's !!!!

A player quit server but not the game (client was running whole time) Then he was not able to connect ( 50% bug) I restarted the server -> same 50% bug! After another restart and a client restart he was able to connect.

During that time, the player seems to be "online" for the server. I saw that he also got a lootbox.

So I think there could be something with the mod which blocks loading!

Sardaukai commented 3 months ago

Pull request done, I'll do some more performance stuff this week.

At space with 1.0 density there are more than 1k voxels at a place (and needs to be processed after jumpdrive use) so a parallel.foreach is great to do the job. I made a pretest at my beta server and it works without issues, but I want to test it in order to nail all errors down.....

Why is 50% hanging? If the client receives a server message when booting up and the character doesn't exist, it won't be able to remove anything. This only happens in places where asteroids COULD form. So players on planets are not affected because when a message occurs there is nothing to delete (empty msg)

StalkR commented 3 months ago

Thanks for the investigation and the PR (#14)!

StalkR commented 3 months ago

Released new mod version on steam with the fix.

Happy to discuss the parallel.foreach optimizations in a different issue/PR when you're ready.

Sardaukai commented 3 months ago

Need to be reopened .... bug not gone....

Sardaukai commented 3 months ago

In order to give a small sum i'll post my results here.

!! I used no clientside pluginLoader to do bughunting !! (If you use it game does sometimes load, so error is not 100% reproduceable)

  1. There is no error at spaces where no asteroids and boulders are present (but could spawn)
  2. There is no error at spaces where asteroids or boulders are present and NOT drilled or "touched"
  3. There is a low chance of error when asteroids or boulder are drilled but their count is low (even one)
  4. There is a high chance of error when there are a lot (bigger than 5) roids or boulders are drilled or touched.
  5. Issue chance grow if there are big ships around.
  6. Sometimes, after a server restart, the game loads fine FOR THE FIRST TIME, then the issue occurs.
  7. Sometimes server needs a STOP instead of a restart

Why? Everytime the game runs without issues these messages does NOT occur:

MyVoxelReplicable.OnLoad - isUserCreated:False isFromPrefab:True contentChanged:False data?: True ?? COULD it be that another player produce this roid??

MyVoxelReplicable.OnLoad - isUserCreated:False isFromPrefab:True contentChanged:True data?: True This IS produced by drilling a roid.

My ideas: Could it be an time issue? I'll try parallel next Is it possible to set these roids to active? If they are present, their oremap should be correct....

You got another idea to solve that?

Sardaukai commented 3 months ago

Next thing:

  1. Started server with plugin and mod
  2. Client login -> all fine
  3. Drilled some roids and placed a big ship
  4. Client logout -> login 50% bug
  5. Server stop, remove mod
  6. Start server
  7. Client login, ok
  8. Client relog, 50% bug

So there is something else happen, plugin?

Sardaukai commented 3 months ago

BIG UPDATE

!!! Error reproduceable without plugin and mod !!!

EDIT: How to reproduce

  1. Drill one ore more asteroid's
  2. Bring a big ship with you (I need to test more if it is block related etc.)
  3. Have a seat in cryo or cockpit
  4. Relog
  5. 50% Error
StalkR commented 3 months ago

Wow ok, so good news that it's not the plugin/mod, but bad news because it means it's a Keen bug and we can't fix it ourselves.

You can try reporting on https://support.keenswh.com/spaceengineers/pc/ as you've already managed to narrow down the reproducible conditions a fair bit.

It almost sounds like it's a connection issue:

Questions:

  1. during your testing, did you feel like the plugin/mod is making it somehow worse or it has no incidence? I'm wondering if we're onto something with it, which could help find the problem in Keen code.
  2. did you try removing all plugins and mods, removing also Torch, and trying to load the world with the vanilla multiplayer server, and does it reproduce as well? (that's probably what Keen will ask, to be convinced the bug is theirs)

Thanks again for investigating, if there's nothing for us to fix in the plugin/mod then we can close again.

Sardaukai commented 3 months ago

.. Keen bug ... if it is a keen bug I'll write a plugin to patch the code by myself.... done that for a strange respawn issue. Server is so high modded.... keen will not help

"It almost sounds like it's a connection issue" actually I start the client on the dedicated server via rdp, not very playable with 30 fps but 100% connection

Answer 1 Not really, I managed to get a tuned client.cs with some performance boost but I dont think it is a big deal. I'll do a commit later (actually hunting that 50% bug)

Answer 2 50% bug is gone with no mods. (torch + plugins were present)

... still investigating....

StalkR commented 3 months ago

Possible workaround: https://wiki.sigmadraconis.games/en/spaceengies/enduur/playerguidesbugs#respawn-bug-and-infiniti-loading-screen

TL;DR: use pluginloader with

StalkR commented 3 months ago

Tested the new fix, works. Thanks @Sardaukai !

varkh42 commented 3 weeks ago

Hello, we're also facing this issue on our server (GURP) since July. We are using the latest version. It happens to players more often when they are in space than when they aren't, but it's been confirmed by players to happen on the main planet of the systel as well. The description of the issue fits perfectly what people who had the issue have been describing.