Mega Demo / Network: Predicting inputs jittering

fairhat commented 4 years ago

In your mega demo I noticed that jumping (and predicting it) works fine as long as no other button is pressed.

However, if you jump while moving with 2 keys (let's say W and D), the prediction seems to cause jittering. In my own game prototype I can see that replaying is triggered many times when the server has to process many inputs, while it is only called 1-2 times when I only press jump (and nothing else). Could this be related to move_and_slide?

You can see it in the attached video: Addons (DEBUG) 09.07.2020 14_12_59.mp4.zip

Kehom commented 4 years ago

This problem has been tormenting me for quite some time actually. But it becomes extremely difficult to pinpoint the actual problem because I can't consistently reproduce it. Sometimes I fire up the demo multiple times and everything goes super smooth. Then, without a single change, it happens once or twice, then stops again. I do have a suspicion though, and looking at your video makes me believe it even more. Take a look at the frame rate and notice how it's rather inconsistent. The problem happens more often (in the video) when it goes below the 60 FPS mark. This makes me believe that the problem happens because the computer is struggling to keep up with the amount of computation. Now, this is just a suspicion and I do have to investigate it more thoroughly. That said, once I finish working on the next addon and a few todos across the pack, I want to do a more detailed investigation of this problem.

fairhat commented 4 years ago

Actually I don't think this is related to the fps - I can reproduce it on 150 FPS or more. It just drops when recording the screen. I'm guessing this is likely a race condition between cached_input (for replaying unacknowledged inputs) and network inputs (those synchronized with server).

In my demo (attached) I am setting "jump" to true if the jump key was pressed (and jump is currently false); if the jump key is released I set jump to false. What's happening in my demo is that if I press jump and release it immediately (keeping it pressed for 1-3 ticks) it seems to work perfectly. However, if I keep jump pressed (for the whole jump duration) I see the jittering.

Example: I'm printing to the console whenever jump is set to true (meaning the last input had jump set to false). This gives me:

JUMP - REPLAY: False - signature: 1428
JUMP - REPLAY: True - signature: 1428
JUMP - REPLAY: True - signature: 1428
JUMP - REPLAY: True - signature: 1428
JUMP - REPLAY: True - signature: 1428
JUMP - REPLAY: True - signature: 1428
JUMP - REPLAY: True - signature: 1428
JUMP - REPLAY: True - signature: 1428
JUMP - REPLAY: True - signature: 1428
JUMP - REPLAY: True - signature: 1428
JUMP - REPLAY: True - signature: 1428
JUMP - REPLAY: True - signature: 1430

What I see is that jump should only be replayed on signature 1428. However, after input 1428 is synchronized, the game processes network inputs once (1429), where jump is released (set to false), and then replays 1430 (where jump is, for whatever reason, set to true again). I tried to draw this in Paint (attached); blue is cached input and purple is network input.

MinimalDemo.zip movement_bug

fairhat commented 4 years ago

Also: Other connected clients don't see jittering so this is definitely related to client prediction not server side

fairhat commented 4 years ago

JUMP - REPLAY: False - signature: 308
DONE - REPLAY: True - signature: 307
JUMP - REPLAY: True - signature: 308
DONE - REPLAY: False - signature: 328
JUMP - REPLAY: True - signature: 327
DONE - REPLAY: True - signature: 328

Replay means handle_input is called from cached inputs. I added a "DONE" print that prints whenever jump is set back to false. What I see is that signature 308 is already acknowledged by the server, but afterwards signature 307 is replayed on the client, even though it should no longer be in the cached inputs.
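To make the expected behaviour concrete, here is a minimal sketch of a client-side input cache that drops acknowledged inputs before a replay. It is not the addon's actual code: `cached_inputs`, `cache_input` and `acknowledge` are made-up names, and each input is reduced to a Dictionary holding a signature and the jump flag.

```gdscript
# Hypothetical client-side cache of inputs the server has not acknowledged yet.
# Each entry is a Dictionary: {"signature": int, "jump": bool}.
var cached_inputs: Array = []

# Remember an input that was sent to the server.
func cache_input(signature: int, jump: bool) -> void:
    cached_inputs.append({"signature": signature, "jump": jump})

# Drop every input the server has already processed and return the
# remaining ones (signature > acked_signature), in send order, for replay.
func acknowledge(acked_signature: int) -> Array:
    var pending: Array = []
    for entry in cached_inputs:
        if entry["signature"] > acked_signature:
            pending.append(entry)
    cached_inputs = pending
    return pending
```

With 307, 308 and 309 cached and 308 acknowledged, only 309 would be left to replay. Seeing 307 replayed after 308 was acknowledged suggests the pruning step is being skipped or is running against a stale signature.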

Kehom commented 4 years ago

Excellent data! Thank you very much. I will definitely use it when digging into the problem.

yuraj11 commented 3 years ago

I probably have a similar issue. I have a function like this:

func get_aim_pos(input: InputData) -> Vector2:
    var aim_pos: Vector2
    if network.is_id_local(player.meta_uid) && input.get_custom_vec2("aim_pos") == Vector2.ZERO:
        aim_pos = player.get_global_mouse_position()
        if (!network.has_authority()):
            input.set_custom_vec2("aim_pos", aim_pos)
    else:
        aim_pos = input.get_custom_vec2("aim_pos")
        if network.has_authority() && !network.is_id_local(player.meta_uid):
            print("SERVER %s" % aim_pos)
    return aim_pos

and in handle_input I have this:

if input.is_pressed("shoot"):
    player.update_aim_target_pos(get_aim_pos(input))

the server does not correctly load get_custom_vec2:

SERVER (375.885376, 736.281677)
SERVER (378.252563, 739.212219)
SERVER (379.519257, 744.551636)
SERVER (383.355347, 749.089783)
SERVER (0, 0)
SERVER (397.936462, 778.334717)
SERVER (0, 0)
SERVER (415.060699, 816.592102)
SERVER (418.820496, 831.391785)
SERVER (0, 0)
SERVER (424.939911, 854.044006)
SERVER (0, 0)
SERVER (426.708221, 860.483948)
SERVER (426.467438, 860.59613)
SERVER (427.544708, 860.974548)
SERVER (430.010193, 861.748169)
SERVER (0, 0)
SERVER (435.679749, 862.07251)
SERVER (438.343384, 862.682922)
SERVER (441.356079, 862.423462)
SERVER (445.40686, 862.388794)
SERVER (0, 0)
SERVER (444.175964, 862.749451)
SERVER (446.999359, 862.991272)
SERVER (448.716583, 862.330017)

Not sure why there are the (0, 0) values. They are causing jittering issues for me because they constantly trigger corrections.

fairhat commented 3 years ago

> Not sure why there are the (0, 0) values. They are causing jittering issues for me because they constantly trigger corrections.

I noticed that the server sets a custom input vec2 to (0, 0) if there was no input in that physics tick (likely a bug)

fairhat commented 3 years ago

@yuraj11 For now you could just ignore the (0, 0) values of your aim position:

func apply_snapshot(new_state):
    if new_state.aim_pos == Vector2.ZERO:
        # take last known position instead
        pass

yuraj11 commented 3 years ago

That's a possible workaround, but the correction count will still keep incrementing.

fairhat commented 3 years ago

@yuraj11 Even if you ignore Vector2.ZERO when creating the snapshot (on both client and server)? They should not be different in that case.

Kehom commented 3 years ago

> I noticed that the server sets a custom input vec2 to (0, 0) if there was no input in that physics tick (likely a bug)

This is not a bug. The server has to set something because input must be processed on its end. If there was no input, with current code it's rather difficult to determine if it was caused by data loss/extra delay or because the client is actually not sending any input data at all.

If the problem is caused by data loss or inconsistent packet delivery speed, then I will probably have to make the server wait a little before it starts processing input data, in order to build a "healthy buffer" to work with.

That said, I still have to find a way to consistently replicate this problem, because it happens ridiculously rarely here, making it rather difficult to even begin pinpointing the source. Now @fairhat, are you also correcting the state of the local snapshots after replaying the simulation following a correction? I added this in a recent update to the code and it seems the jitter became even rarer.

As for @yuraj11's problem, I believe it has to do with the bug where I was performing the wrong boolean operation to detect when to encode custom input data (issue #15), although I cannot be sure until the new code is tested.
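A rough sketch of what such a "healthy buffer" could look like on the server, for one client. Everything here is an assumption for illustration: the names, the warm-up size of 3, and especially the choice to reuse the previous input when the queue runs dry, which is one possible policy and not what the addon currently does.

```gdscript
# Hypothetical server-side buffer for one client's inputs.
const MIN_BUFFERED: int = 3   # assumed warm-up size, not a value from the addon

var queue: Array = []         # inputs received but not processed yet
var started: bool = false     # true once the warm-up buffer has filled
var last_input: Dictionary = {"aim_pos": Vector2.ZERO}

# Called whenever an input packet arrives from the client.
func receive(input: Dictionary) -> void:
    queue.push_back(input)

# Called once per physics tick. Returns the input to simulate with.
func next_input() -> Dictionary:
    if not started:
        if queue.size() < MIN_BUFFERED:
            return last_input   # still warming up, nothing is consumed
        started = true
    if queue.size() > 0:
        last_input = queue.pop_front()
    # On an empty queue the previous input is reused instead of a zeroed one.
    return last_input
```

The trade-off is extra latency: the server runs MIN_BUFFERED ticks behind the client in exchange for not having to invent input when a packet arrives late.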

yuraj11 commented 3 years ago

Yes, I have observed that the issue is more visible at startup and then somehow stabilizes, but I am still getting (0, 0). It's simple; on the client I have something like this (simplified):

var aim_position: Vector2

func _physics_process(delta: float) -> void:
    if corrected_state:
        aim_position = corrected_state.aim_position
        replay_input(delta) # as in the tutorial: calls handle_input & correct_in_snapshot

    var input: InputData = network.get_input(meta_uid)
    handle_input(input, delta)
    network.snapshot_entity(PlayerSnapshot.new(meta_uid, meta_chash).from_node(self))

func handle_input(input: InputData, delta: float) -> void:
    if (!input):
        return

    # ... movement: is_pressed("left") etc.

    # shooting logic
    if input.is_pressed("shoot"):
        aim_position = get_aim_pos(input)

func get_aim_pos(input: InputData) -> Vector2:
    var aim_pos: Vector2
    # a non-zero aim_pos means the input buffer is already set (correction): take that value and apply it
    if network.is_id_local(player.meta_uid) && input.get_custom_vec2("aim_pos") == Vector2.ZERO:
        aim_pos = player.get_global_mouse_position()
        if (!network.has_authority()):
            input.set_custom_vec2("aim_pos", aim_pos)
    else:
        aim_pos = input.get_custom_vec2("aim_pos")
    return aim_pos

Even though I am setting set_custom_vec2 on the client, the server sometimes returns (0, 0), mainly right after I start the game. It takes a while to stabilize, but it still happens.

yuraj11 commented 3 years ago

OK, this will sound weird, but when I shake the window (drag it and move it around a bit), most of the issues are gone (mainly when syncing the mouse position). This is probably somehow related to FPS/vsync or something like that.

The drop in FPS happens when I drag the window.

So it is definitely somehow related to FPS.

Kehom commented 3 years ago

That's indeed weird. So, before the FPS drop things are not working as desired, and after it things work correctly? I honestly have no clue what is going on here!

yuraj11 commented 3 years ago

When I spawn in the game and move the mouse around (with the mouse button pressed), the correction buffer keeps incrementing until I stop moving the mouse. If I do this again right afterwards, it still happens. But if I move the window around in between, the correction buffer no longer increments.

yuraj11 commented 3 years ago

It works perfectly after I shake the window a bit; corrections are very rare then :D

Kehom commented 3 years ago

Really, really bizarre! While I'm investigating the issue related to the custom property broadcast thing, I'm also trying to find a way to consistently replicate this jittering. Across all the tests I ran, I only saw the problem occur once, even with a bunch of corrections happening! As I have said, this problem has been tormenting me for way too long!

fairhat commented 3 years ago

> That said, I still have to find a way to consistently replicate this problem, because it happens ridiculously rarely here, making it rather difficult to even begin pinpointing the source. Now @fairhat, are you also correcting the state of the local snapshots after replaying the simulation following a correction? I added this in a recent update to the code and it seems the jitter became even rarer.

I have been using the correct_in_snapshot() function for a few days now and it works amazingly! No jittering at all (except at very high latency, which is what should happen in that case anyway).

For my project I can say that at first I had extreme jittering, before using snapshot correction. After I started using it I still had some jittering, but found out that it was actually my own code causing it. It takes some time to get used to the different kind of thinking needed when inputs are synced over the network.

Now I get jittering every 2 days or so, but it's almost always related to changes in my code. Will report if I find something related to the library again.
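For anyone following along, a minimal sketch of the idea behind correcting local snapshots after a replay. It is a 1D toy model and not the addon's implementation: `pos`, `pending`, `snapshots`, `predict` and `correct` are made-up names, and each input is just a velocity applied for one tick.

```gdscript
# Hypothetical 1D model: each input is a velocity applied for one tick.
var pos: float = 0.0
var pending: Array = []          # {"signature": int, "velocity": float}, not acked yet
var snapshots: Dictionary = {}   # signature -> predicted position after that input

# Client-side prediction: simulate locally and remember the result.
func predict(signature: int, velocity: float, delta: float) -> void:
    pos += velocity * delta
    pending.append({"signature": signature, "velocity": velocity})
    snapshots[signature] = pos

# Apply an authoritative state, replay what the server has not seen yet and
# overwrite the stored snapshots with the re-simulated results.
func correct(acked_signature: int, server_pos: float, delta: float) -> void:
    pos = server_pos
    var still_pending: Array = []
    for entry in pending:
        if entry["signature"] > acked_signature:
            pos += entry["velocity"] * delta
            snapshots[entry["signature"]] = pos
            still_pending.append(entry)
        else:
            snapshots.erase(entry["signature"])
    pending = still_pending
```

If the stored snapshots were left at their old predicted values, the next server state would most likely be compared against stale data and trigger yet another correction, which is presumably why the jitter dropped once the snapshots were corrected too.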

yuraj11 commented 3 years ago

I have pinpointed the issue and it is related to how set_custom_vec2/get_custom_vec2 work. I think something is missing regarding when the value should be set and when it should be retrieved. Normal actions work fine; only the custom ones cause issues. get_input in playernode calls _poll_input and _dispatch_input_data, so when you modify the custom input values later, it is too late, and that causes the issue.

EDIT: This issue is definitely related to custom input. I ran an experiment and hardcoded this in _poll_input:

if !network.has_authority():
    if Input.is_action_pressed("shoot") && player:
        retval.set_custom_vec2("aim_pos", player.get_global_mouse_position())

and later, in handle_input, I am only calling:

input.get_custom_vec2("aim_pos")

and it works fine now. I think there's some race condition, and that could explain the weird behavior when dragging the window.
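A simplified, deterministic model of the ordering problem, for illustration only. The real behaviour was intermittent, and none of these names come from the addon: `dispatch` stands in for whatever encodes the input into a packet, and the input is reduced to a Dictionary.

```gdscript
# Stand-in for the network layer: "dispatching" copies the input as it is
# at that moment, the way encoding it into a packet would.
var sent_packets: Array = []

func dispatch(input: Dictionary) -> void:
    sent_packets.push_back(input.duplicate())

# Problematic order: dispatch first, fill in the custom value afterwards.
func poll_then_set_late(mouse_pos: Vector2) -> void:
    var input: Dictionary = {"shoot": true, "aim_pos": Vector2.ZERO}
    dispatch(input)
    input["aim_pos"] = mouse_pos   # too late, the copy was already sent

# Working order: fill in the custom value while polling, then dispatch.
func poll_then_set_early(mouse_pos: Vector2) -> void:
    var input: Dictionary = {"shoot": true, "aim_pos": mouse_pos}
    dispatch(input)
```

In the first variant the server would receive (0, 0) while the client simulates with the real mouse position, so the two states diverge and a correction follows. Setting the value inside _poll_input corresponds to the second variant.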

Kehom commented 3 years ago

> Now I get jittering every 2 days or so, but it's almost always related to changes in my code. Will report if I find something related to the library again.

Nice! Thanks!

> get_input in playernode calls _poll_input and _dispatch_input_data, so when you modify the custom input values later, it is too late, and that causes the issue.

Ahh! Excellent! Will see what I can do to make things more consistent in this regard!

Kehom commented 3 years ago

OK. Just pushed a tiny change to delay input data dispatching. It should give enough time for the data to be properly set up.

yuraj11 commented 3 years ago

Thanks, it works correctly now :) I think you can close this issue.

Kehom commented 3 years ago

Excellent! Thank you guys for helping find bugs! :)

Kehom / GodotAddonPack

Mega Demo / Network: Predicting inputs jittering #8