Closed ezoray closed 1 month ago
@ezoray Ok, this was a very good catch and a bit tricky to prevent. I went ahead and pulled down your repo to use your examples (nice btw! 👍 ) and came up with a solution for this that prevents local modification of a collection with still a few minor caveats.
The underlying issue is that default .NET collections have no way to be notified when the collection is modified and since we have already committed to supporting these types it is a bit of a catch 22 scenario:
NetworkVariable
property of every NetworkBehaviour
component (very bad for performance reasons) and since only the client/server with write permissions can mark something dirty that then adds the NetworkObject
to a post late update queue that is then polled for changes to NetworkVariable
properties (i.e. clients without write permissions will never have this check), I had to come up with a way to provide developers a mechanism to check for changes on non-authority instances.The solution (thus far) is to keep one more internal duplicate of the NetworkVariable
that is a duplicate of the current value and can be used to check against the current value when:
Due to the underlying issue of standard .NET collections and lack of any notification (without using a modified/alternate version...since we are supporting the standard .NET collections), the best way to handle the scenario where a client could modify a collection without invoking the NetworkVariable.CheckDirtyState was to provide the alternate option of invoking NetworkVariable.IsDirty which reverts any local changes on clients without write permissions and returns true (without actually setting the dirty state) so developers can have some way of detecting if a collection was modified on a client without write permissions. This is only in the event you might want to take some other action and/or generate some other event (if trying to detect cheating or the like), but doesn't require reverting back to the last known current state since it automatically does this for you.
I need to mull over these modifications a bit more and extend the NetworkVariable collections integration tests to validate the modifications... but should have a PR up for this by end of day. Once that is done, I will post the modified version of your project here that will also point to the PR branch so you can test it yourself (in the event you want to do a second pass to make sure it works as you would expect and to see how I was testing using your project).
Also wanted to thank you very much for your efforts in helping us catch these kinds of things and for your awesome project that hopefully others will be able to learn from... super super awesome stuff you have there! :godmode: 👍 💯
Thanks Noel for the explanation, I think I'm getting it. It does seem that having NetworkVariables handling collections is pretty awkward and outside their original remit. I must admit I'm not really liking having the receiver have to do a comparison of the collections to find out what's changed, but as it's the only option I guess I'll have to get used to it :) Please post the modified code when you're done and I'll be happy to test it.
Thanks for your kind words on the project. I hope it's useful to others, it did motivate me to bring some order to a lot of test code scattered across various projects. I've also learned and am still learning things I missed or wouldn't have come across otherwise, so win win. :)
@ezoray
Ok, I am attaching the modified project that starts off with the NetworkVariableDictionaryScene scene. I sort of moved a few things around (I prefer to make stand alone builds as a final pass and added my NetworkManagerHelper that has the connect/disconnect simple UI with the generic/default runtime console logging) and added a CheckDirty toggle that only displays on the non-owner side and when disabled prevents NetworkVariable.CheckDirtyState
from being invoked within InScene.DictionaryUpdate
and uses the NetworkVariable.IsDirty
method instead. The manifest points to the PR's branch.
Let me know if this seems like it works more like you would expect?
Cheers! 😸
I had a look and there is an issue, I think down to the first client interaction with the dictionary after switching ownership to the client. There's different ways to produce it, the dictionary just needs to have multiple entries.
One way, as server add a couple of entries to the dictionary: Dictionary - key: 0 value: 10 Dictionary - key: 1 value: 10
Switch ownership to client, then add another entry to the dictionary: Clients dictionary, count 3: Dictionary - key: 0 value: 10 Dictionary - key: 1 value: 10 Dictionary - key: 2 value: 10
Server throws exception ArgumentException: An item with the same key has already been added. Key: 1 I think the key in the exception is the last entry that the server added before switching ownership. Server dictionary count remains 2.
It may be related to the first call of CheckDirtyState on the client after ownership changes to the client. I noticed if no change has been made to the dictionary and CheckDirtyState is called it triggers OnValueChanged with the previous dictionary being empty.
I should add this looks to be a separate issue and the permission changes seem to work fine. :)
@ezoray Very good catch!
Using the same modified project provided, delete the packages-lock.json
file to get the most recent update. The replication is specific to the scenario where the client is already connected, then the entries are added, then ownership transfers to the client, and finally the new client owner adds an entry. Under this scenario, the client does not update its previous value (as you pointed out) which upon the client making a change it sends the last/previous entry added and the newly added entry causing the key already exists exception.
The most recent push to the PR should resolve this issue. It basically assure that any owner write permission NetworkVariable
has its previous and original value updated to mirror the current value.
👍
Looks fixed to me. 👍
The only other issue is one I mentioned previously. While entries are being added constantly to the dictionary by the server, when the client connects they get the contents on spawn but also a duplicate of the last entry resulting in an exception. This was happening on ownership change as well but I haven't checked this with the modified project code.
While entries are being added constantly to the dictionary by the server, when the client connects they get the contents on spawn but also a duplicate of the last entry resulting in an exception. This was happening on ownership change as well but I haven't checked this with the modified project code.
Yeah, this is indeed a different issue and I have some changes in my local branch for this PR that resolve the dictionary issue. It is looking like it could be an order of operations issue...and my current fix is to handle it deep within the dictionary collection serialization... but that sort of masks the core issue where you could receive an update for a value that is already set... (thinking of lists primarily where you could potentially get duplicate entries since that is valid).
Going to look at this more tomorrow and if it looks like I need to spend a bit more time on it then I will split that one specific issue off to a separate internal ticket and make a new PR for that fix.
@ezoray Ok, I narrowed down the root cause of the issue. When initially synchronizing a client it was sending the current value of each NetworkVariable even if there were pending changes. This would then cause the key already exists exception on the newly connected client side when the host would send the pending changes on the next tick.
The proper fix (high level overview) was to check if there were pending changes (per NetworkVariable) when writing the full synchronization information to the client and if there were then writing the previously known value as opposed to the current value would synchronize the client to the same state as the rest of the connected clients. Some short time later (default would be on the next tick) the changes are sent to the newly connected client and applied (at the same time any other connected clients would be updated to the changes).
This adjustment should resolve the 2nd issue for any collection type supported. So, I reverted out the dictionary collection serialization (which is sort of the "canary in the bird cage" since it throws the key already exists exception if there is some form of duplicated sync issue). For HashSets and Lists, they are silent since a HashSet won't add duplicate item and a List will just add a duplicated entry without any exceptions or the like.
Will write an integration test for this update to the existing PR. Your included custom package I imported into the modified version of the NGOToGo project and it works just fine. 👍
@ezoray Ok.. I actually ran into several issues along the way and ended up with a handful of fixes. Here is the updated version of the modified NGOToGo project with the 2nd issue included as well as a bootstrap menu so you can load both issues in one pass without having to change the secenes in build index order and the like: NGOTOGO_CollectionsFix_II.zip
One of the additions is a notification that will be generated if you set the NetworkManager logging mode to developer and you change ownership of the same NetworkObject under a 6x the tick rate frequency period of time. It still allows the change but I decided that a warning vs a clamp was a better approach... the general idea is that due to the nature of how NetworkVariables are updated and the time it can take to handle processing the change in ownership (client-server network topology) there is a scenario where you can send the message from the server side prior to the client having already processed an in-flight change in ownership message...and you are changing the NetworkVariable (collections) when ownership is gained.
So, the rule of thumb here is don't repeatedly change ownership of the same NetworkObject
within a really short period of time if you know there are collections that need to be synchronized properly and you are changing network variable state upon the gain of ownership (an edge case for sure, but it is something that can cause issues especially using a client-server network topology).
So...on issue-2 if you rapidly change ownership with higher latencies then you "could" have a synchronization issue.
You can start two local stand alone builds of the included updated NGOToGo project and click as fast as you can and shouldn't see any issue... but under more latent conditions you could potentially see an invalid synchronization if you are changing the ownership of the same NetworkObject
with NetworkVariables that are collections (primarily List and Dictionary).
However, setting that minor edge case aside... you should see both issues are resolved.
👍 👍
Looks good Noel, the connection issue is gone.
I changed the Update loop condition so the spam is continuous and not restricted to the server:
private void Update()
{
if (IsSpawned && IsOwner)
{
netDictionary.Value.Add(keyValue.Value, keyValue.Value);
keyValue.Value++;
netDictionary.CheckDirtyState();
}
}
It looks like changing ownership to the client is fine but switching it back to the server there's errors on both sides. Going by what you said above is this something that can be handled?
Edit: To clarify after further testing, server spamming can be turned off so only the client spams the dictionary, the following errors occur when switching ownership back to server:
Client: multiple ArgumentException: An item with the same key has already been added. Key: n
Server: [Netcode] Client wrote to NetworkVariable`1 without permission. No more variables can be read. This is critical. => NetworkObjectId: 1 - GetNetworkBehaviourOrderIndex(): 0 - VariableIndex: 0
The errors are broadly similar to when the server also spams the dictionary.
@ezoray
So with owner write permissions and how you have it setup you will need to add some form of change in ownership synchronization into the mix as you are indeed seeing the issue where you can change ownership with "in-flight" state update messages. Since there is a delay (at least 1/2 RTT) between the host/server sending the change in ownership message and the clients while the host/server applies the change immediately (locally), you will run into synchronization issues with owner write permission based NetworkVariable
collections for the 2nd issue use case scenario.
Here is a modified version of your project that uses the same fixes (in this issue's assigned PR) but also incorporates a tick delay between changes in ownership as well as it uses RPCs to synchronize any ownership changes that are client to host or to another client:
NGOTOGO_CollectionsFix_III.zip
It works with multiple clients connected and provides some additional properties on the "InScene" NetworkObject: The ownership tick delay is the amount of time the owner waits before allowing a transfer of ownership. When it is transitioning from the host to a client it handles this locally. When it is transitioning from a client to the host or another client it uses the RPCs to synchronize the transfer of ownership. The Update Rate is how often a client/host will update the dictionary in the update loop. I have tried various settings and it works (locally) with the update rate set to 1 and the tick delay set to the minimum of 3 (my recommendation) ... you could probably shift the minimum down to 1 or 2 but if you introduce normal latency then you could see some issues at that range (32-66ms). My recommendation would be to keep it at least at a 3 tick delay (~100ms) when transferring ownership to assure all state updates have been received and processed (or will be before the change in ownership occurs).
It all really depends upon the use case and what you are trying to accomplish with this kind of use case scenario...
Either case, these modifications to your project should provide you with a reasonable example. I also added some comments to make it easier to digest what all is happening.
Let me know if these adjustments make sense to you and if (with the current fixes in place) it resolves your two issue.
This works a treat. I guess the take away is to be cautious when switching ownership and be aware of potential issues with in-flight messages and code around them if necessary.
Thanks for taking the time to create these examples, they've been very useful and informative. :)
We will continue to make improvements... but for now this would be the recommended way to handle this.
Understood.
I hope you don't mind but can I bring another issue to your attention: https://github.com/Unity-Technologies/com.unity.netcode.gameobjects/issues/3059 It's been closed but it may be an issue that needs addressing.
Understood.
I hope you don't mind but can I bring another issue to your attention: #3059 It's been closed but it may be an issue that needs addressing.
Sam is looking at it and has a fix already locally. I will touch base with him to see where he is on that (you can expect that fix to be included in the same update as this PR).
As a side note:
I just pushed some more fixes that further improves the stability for updating collections and includes a modification to NetworkVariableDeltaMessage
that now allows a server to immediately forward delta state updates (to valid clients on a per NetworkVariable
field basis) without having to queue them for end of frame (i.e. since messages are processed at the beginning of a frame, this shaves off the state replication time to other clients by close to the total frame time...which depending upon your project could mean 10-16ms faster).
The fixes in #3081 should be in the up-coming (it is being processed) NGO v2.1.0 update. 👍
Awesome, thanks Noel.
Description
Reproduce Steps
The first issue can be replicated running the code here. To see the second issue run the code in the attached unitypackage.
Actual Outcome
Environment