Closed fairhat closed 3 years ago
This problem has been tormenting me for quite some time, actually, but it is extremely difficult to pinpoint because I can't reproduce it consistently. Sometimes I fire up the demo multiple times and everything runs smoothly. Then, without a single change, it happens once or twice and stops again. I do have a suspicion though, and your video reinforces it. Look at the frame rate and notice how inconsistent it is. In the video, the problem happens more often when the frame rate drops below the 60 FPS mark. This makes me believe the problem occurs because the computer is struggling to keep up with the amount of computation. This is just a suspicion and I still have to investigate it more thoroughly. Once I finish the next addon and a few todos across the pack, I want to do a more detailed investigation of this problem.
Actually, I don't think this is related to the FPS: I can reproduce it at 150 FPS or more. The frame rate only drops while I'm recording the screen. My guess is that this is a race condition between cached_input (used for replaying unacknowledged inputs) and network inputs (those synchronized with the server).
In my demo (attached) I set "jump" to true if the jump key was pressed (and jump is currently false), and I set it back to false when the jump key is released. If I press jump and release it immediately (keeping it pressed for 1-3 ticks), everything seems to work perfectly. However, if I keep jump pressed for the whole jump duration, I see the jittering.
Example: I'm printing to the console whenever jump is set to true (meaning the last input had jump set to false). This is the output:
JUMP - REPLAY: False - signature: 1428
JUMP - REPLAY: True - signature: 1428
JUMP - REPLAY: True - signature: 1428
JUMP - REPLAY: True - signature: 1428
JUMP - REPLAY: True - signature: 1428
JUMP - REPLAY: True - signature: 1428
JUMP - REPLAY: True - signature: 1428
JUMP - REPLAY: True - signature: 1428
JUMP - REPLAY: True - signature: 1428
JUMP - REPLAY: True - signature: 1428
JUMP - REPLAY: True - signature: 1428
JUMP - REPLAY: True - signature: 1430
Jump should only be replayed on signature 1428. However, after input 1428 is synchronized, the game processes network inputs once (1429), where jump is released (set to false), and then replays 1430, where jump is, for whatever reason, set to true again. I tried to draw this in Paint (attached): blue is cached input and purple is network input.
Also: other connected clients don't see the jittering, so this is definitely related to client prediction, not the server side.
JUMP - REPLAY: False - signature: 308
DONE - REPLAY: True - signature: 307
JUMP - REPLAY: True - signature: 308
DONE - REPLAY: False - signature: 328
JUMP - REPLAY: True - signature: 327
DONE - REPLAY: True - signature: 328
"REPLAY: True" means handle_input was called from the cached inputs. I added a "DONE" print that fires whenever jump is set back to false. Signature 308 has already been acknowledged by the server, yet signature 307 is replayed on the client afterwards, even though it should no longer be in the cached inputs.
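To make the expected behavior concrete, here is a minimal sketch (Godot 3.x GDScript) of pruning a cache of unacknowledged inputs once the server acknowledges a signature. It is not the addon's implementation: `cached_inputs`, `prune_acknowledged` and the dictionary layout are hypothetical names used only for illustration.

```gdscript
# Hypothetical stand-in for the client's cache of unacknowledged inputs.
# Each entry is a Dictionary of the form { "signature": int, "jump": bool }.
var cached_inputs: Array = []

# Drop every cached input the server has already acknowledged, so that only
# inputs newer than `acked_signature` remain available for replay.
func prune_acknowledged(acked_signature: int) -> void:
    var remaining: Array = []
    for entry in cached_inputs:
        if entry["signature"] > acked_signature:
            remaining.append(entry)
    cached_inputs = remaining
```

With this rule, a cache holding signatures 307, 308 and 309 keeps only 309 after 308 is acknowledged, so 307 could never be replayed afterwards.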
Excellent data! Thank you very much. I will definitely use it when digging into the problem.
I probably have a similar issue. I have a function like this:
func get_aim_pos(input: InputData) -> Vector2:
    var aim_pos: Vector2
    if network.is_id_local(player.meta_uid) && input.get_custom_vec2("aim_pos") == Vector2.ZERO:
        aim_pos = player.get_global_mouse_position()
        if (!network.has_authority()):
            input.set_custom_vec2("aim_pos", aim_pos)
    else:
        aim_pos = input.get_custom_vec2("aim_pos")
        if network.has_authority() && !network.is_id_local(player.meta_uid):
            print("SERVER %s" % aim_pos)
    return aim_pos
and in handle_input I have this:
if input.is_pressed("shoot"):
    player.update_aim_target_pos(get_aim_pos(input))
On the server, get_custom_vec2 does not always return the value that was sent:
SERVER (375.885376, 736.281677)
SERVER (378.252563, 739.212219)
SERVER (379.519257, 744.551636)
SERVER (383.355347, 749.089783)
SERVER (0, 0)
SERVER (397.936462, 778.334717)
SERVER (0, 0)
SERVER (415.060699, 816.592102)
SERVER (418.820496, 831.391785)
SERVER (0, 0)
SERVER (424.939911, 854.044006)
SERVER (0, 0)
SERVER (426.708221, 860.483948)
SERVER (426.467438, 860.59613)
SERVER (427.544708, 860.974548)
SERVER (430.010193, 861.748169)
SERVER (0, 0)
SERVER (435.679749, 862.07251)
SERVER (438.343384, 862.682922)
SERVER (441.356079, 862.423462)
SERVER (445.40686, 862.388794)
SERVER (0, 0)
SERVER (444.175964, 862.749451)
SERVER (446.999359, 862.991272)
SERVER (448.716583, 862.330017)
I'm not sure why the (0, 0) values are there. They cause jittering for me because they constantly trigger corrections.
> I'm not sure why the (0, 0) values are there. They cause jittering for me because they constantly trigger corrections.
I noticed that the server sets a custom input vec2 to (0, 0) if there was no input in that physics tick (likely a bug)
@yuraj11 For now you could just ignore the (0, 0) values of your aim position:
func apply_snapshot(new_state):
    # check the incoming state (the parameter), not a stale member
    if new_state.aim_pos == Vector2.ZERO:
        # take last known position instead
        pass
That's a possible workaround, but the correction count will still keep incrementing.
@yuraj11 Even if you ignore Vector2.ZERO when creating the snapshot (on both client and server)? The two snapshots should not differ in that case.
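A minimal sketch of that suggestion (Godot 3.x GDScript), assuming the same helper runs on both client and server before the aim position is written into a snapshot. `last_aim_pos` and `resolve_aim_pos` are hypothetical names, not part of the addon.

```gdscript
# Last aim position that carried real data (hypothetical member).
var last_aim_pos: Vector2 = Vector2.ZERO

# Returns the aim position to store in the snapshot. A value of (0, 0) is
# treated as "no data this tick", so the previous value is reused.
func resolve_aim_pos(incoming: Vector2) -> Vector2:
    if incoming != Vector2.ZERO:
        last_aim_pos = incoming
    return last_aim_pos
```

One trade-off of this approach: a legitimate aim position of exactly (0, 0) is indistinguishable from missing data and would be ignored too.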
> I noticed that the server sets a custom input vec2 to (0, 0) if there was no input in that physics tick (likely a bug)
This is not a bug. The server has to set something because input must be processed on its end. If there was no input, with current code it's rather difficult to determine if it was caused by data loss/extra delay or because the client is actually not sending any input data at all.
If the problem is caused by data loss or by inconsistent packet delivery speed, then I will probably have to make the server wait a little before it starts processing input data, in order to build a "healthy buffer" to work with.
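The "healthy buffer" idea could be sketched roughly as follows (Godot 3.x GDScript). This is only an illustration of the general technique, not code from the addon: `MIN_BUFFERED`, `push` and `pop_next` are hypothetical, and the threshold of 3 is an arbitrary assumption that would need tuning.

```gdscript
# Assumed number of inputs to accumulate before processing starts.
const MIN_BUFFERED: int = 3

var _queue: Array = []
var _started: bool = false

# Called whenever input data arrives from the client.
func push(input_data) -> void:
    _queue.append(input_data)

# Called once per physics tick. Returns the next input to process, or null
# while the buffer is still filling up (or has run dry and must refill).
func pop_next():
    if !_started:
        if _queue.size() < MIN_BUFFERED:
            return null
        _started = true
    if _queue.empty():
        # Buffer drained: wait for it to refill before consuming again.
        _started = false
        return null
    return _queue.pop_front()
```

The cost of this design is added latency: every input is processed up to MIN_BUFFERED ticks later than it otherwise would be.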
That said, I still have to find a way to replicate this problem consistently, because it happens ridiculously rarely here, which makes it difficult to even begin pinpointing the source. Now @fairhat, are you also correcting the state of the local snapshots after replaying the simulation following a correction? I added this in a recent code update and the jitter seems to have become even rarer.
As for @yuraj11's problem, I believe it is related to a bug where I was performing the wrong boolean operation to decide when to encode custom input data (issue #15), although I cannot be sure until the new code is tested.
Yes, I have observed that the issue is more visible on start and then somehow stabilizes, but I am still getting (0, 0). It's simple; on the client I have something like this (simplified):
var aim_position: Vector2

func _physics_process(delta: float) -> void:
    if corrected_state:
        aim_position = corrected_state.aim_position
        replay_input(delta) # as in tutorial calls handle_input & correct_in_snapshot
    var input: InputData = network.get_input(meta_uid)
    handle_input(input, delta)
    network.snapshot_entity(PlayerSnapshot.new(meta_uid, meta_chash).from_node(self))

func handle_input(input: InputData, delta: float) -> void:
    if (!input):
        return
    # ... movement is_pressed("left") etc.
    # shooting logic
    if input.is_pressed("shoot"):
        aim_position = get_aim_pos(input)

func get_aim_pos(input: InputData) -> Vector2:
    var aim_pos: Vector2
    # a non-zero value means the input buffer is already set (correction): take that value and apply it
    if network.is_id_local(player.meta_uid) && input.get_custom_vec2("aim_pos") == Vector2.ZERO:
        aim_pos = player.get_global_mouse_position()
        if (!network.has_authority()):
            input.set_custom_vec2("aim_pos", aim_pos)
    else:
        aim_pos = input.get_custom_vec2("aim_pos")
    return aim_pos
Even though I am still calling set_custom_vec2 on the client, the server sometimes returns (0, 0), mainly right after I start the game. It takes a while to stabilize, but it still happens afterwards.
OK, this will sound weird, but when I shake the window (drag it and move it around a bit), most of the issues go away (mainly when syncing the mouse position). This is probably related to FPS, vsync, or something like that.
The drop in FPS happens when I drag the window:
So it is definitely related to FPS somehow.
That's indeed weird. So, before the FPS drop things are not working as desired, and after it they work correctly? I honestly have no clue what is going on here!
When I spawn in the game and move the mouse around with the mouse button pressed, the correction buffer keeps incrementing until I stop moving the mouse. If I do it again right afterwards, it still happens, but if I move the window around in between, the correction buffer does not increment.
It works perfectly once I shake the window a bit; corrections are very rare after that :D
Really, really bizarre! While I'm investigating the issue related to the custom property broadcast, I'm also trying to find a way to replicate this jittering consistently. Across all my tests I only saw the problem occur once, even with a bunch of corrections happening! As I have said, this problem has been tormenting me for way too long!
> That said, I still have to find a way to replicate this problem consistently, because it happens ridiculously rarely here, which makes it difficult to even begin pinpointing the source. Now @fairhat, are you also correcting the state of the local snapshots after replaying the simulation following a correction? I added this in a recent code update and the jitter seems to have become even rarer.
I have been using the correct_in_snapshot() function for a few days now and it works amazingly! No jittering at all (except at very high latency, which is expected in that case anyway).
For my project, I can say that I had extreme jittering before using snapshot correction. After I started using it, I still had some jittering, but it turned out that my own code was causing it. It takes some time to get used to the different kind of thinking required when inputs are synced over the network.
Now I get jittering every 2 days or so, but it's almost always related to changes in my code. I will report back if I find something related to the library again.
I have pinpointed the issue: it is related to how set_custom_vec2/get_custom_vec2 work. I think something is off in the timing of when the value is set versus when it is retrieved. Normal actions work fine; only the custom ones cause issues. The get_input in playernode calls _poll_input and _dispatch_input_data, so when you modify the custom input values later, it is too late, and that causes the issue.
EDIT: This issue is definitely related to custom input. As an experiment, I hardcoded this in _poll_input:
if !network.has_authority():
    if Input.is_action_pressed("shoot") && player:
        retval.set_custom_vec2("aim_pos", player.get_global_mouse_position())
and later in code in handle_input I am calling only:
input.get_custom_vec2("aim_pos")
and it works fine now. I think there is some kind of race condition, and that could explain the weird behavior when dragging the window.
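The ordering problem described above can be illustrated in isolation with a small sketch (Godot 3.x GDScript). It does not reproduce the addon's internals: `dispatch` and `sent` are hypothetical, and a plain Dictionary stands in for the input object. The only assumption is that dispatching captures the input's values at the moment it is called.

```gdscript
extends Node

# Stand-in for data already handed over to the network layer.
var sent: Array = []

# Hypothetical dispatch: copies the input's values when it is called.
func dispatch(input: Dictionary) -> void:
    sent.append(input.duplicate())

func _ready() -> void:
    var input: Dictionary = {"shoot": true, "aim_pos": Vector2.ZERO}
    dispatch(input)
    # Too late: the copy that was dispatched still holds Vector2.ZERO.
    input["aim_pos"] = Vector2(375, 736)
    print(sent[0]["aim_pos"])
```

Setting the custom value while the input is being polled, as in the experiment above, avoids the problem because the value is already in place before anything is dispatched.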
> Now I get jittering every 2 days or so, but it's almost always related to changes in my code. I will report back if I find something related to the library again.
Nice! Thanks!
> The get_input in playernode calls _poll_input and _dispatch_input_data, so when you modify the custom input values later, it is too late, and that causes the issue.
Ahh! Excellent! Will see what I can do to make things more consistent in this regard!
OK. I just pushed a tiny change that delays input data dispatching. It should give the input data enough time to be properly set up.
Thanks, it works correctly now :) I think you can close this issue.
Excellent! Thank you guys for helping find bugs! :)
In your mega demo I noticed that jumping (and predicting it) works fine as long as no other button is pressed.
However, if you jump while moving with 2 keys (let's say W and D), the prediction seems to cause jittering. In my own game prototype I can see that replaying is triggered many times when the server has to process many inputs, while it is only called 1-2 times when I press jump and nothing else. Could this be related to move_and_slide?
You can see it in the attached video: Addons (DEBUG) 09.07.2020 14_12_59.mp4.zip