SafeteeWoW commented 6 years ago

https://media-elerium.cursecdn.com/attachments/220/817/rclootcouncil.txt Disconnection happens on entry 309, but the lootTable is received on entry 98. About 200 addon messages are received within 1s.

I highly suspect the disconnection is caused by burst traffic due to the following reasons:

It seems only happens in large raid. More likely in 30man raid. No reports for small raid.
It only happens when loot frame pops up when there are lots of traffic happening.
Happens randomly, and does not happen for everyone. I think it is because some server is has lower data threshold than the others.

http://wowwiki.wikia.com/wiki/ChatThrottleLib

The World of Warcraft servers will disconnect you if you generate too much output. This is not limited to chat output; any type of output can cause it. Some of you may have noticed that holding down your right mouse button and wiggling it wildly for an extended time can kick you off.

I suppose receiving message also generates output because WoW client generates ACK for each message? Anyway, it's highly likely that the burst traffic when the lootTable received causes the issue.

Why the problem arises in this version but not the older version?

This version increases the traffic a bit because non-autopass people sends their gear immediately.
it's possible that Blizzard lower the disconnection threshold to handle the much more online people in the new raid.
30man raid only occurs at the start of tier. People don't 30man raid at the end tier. Disconnection shouldn't happen in small raid.

RCLootCouncil3 will reduce the traffic by:

Compress data: All information fetch-able client side in itemLink will be discarded, such as quality color and item name
Combine short commands into large one. If short commands are sent together, we should combine them into one because more commands=more overhead. For example, combines several response command into multi_response command.

I know this causes delay when all items are distributed later, but distributes lots of items takes time. I don't think 30s matters time for distributing 60 items, and there isn't much delay if the items are distributes from the 1st session to the last session

evil-morfar commented 6 years ago

I'm still not certain this is what causes it. It would make sense if it's only that, but there's a few things putting that theory off:

First off, two different people have reported it happens with smaller raids aswell. Half the raid size = half the messages.
It's really odd it only happens to 1 guy per raid. If we really overflowed the cap, then we should expect many more people to dc.
It seems to happen to the same people over and over - I wouldn't call that randomly. Further more it seemingly does not happen to MLs.

v3 doesn't help us now, and it's too far off anyway to be of any help. Besides, there's really not much data to compress in responses which are (seemingly) the current issue. Combining messages doesn't necessarily help either. Messages can't be more than 255 bytes long (including prefix), any longer than that and AceComm splits them. Currently that means lootAcks and the occasional autopass (some are too long to combine 2 of) are the only candidates for that - and lootAck is only sent once anyway.

There's stuff that's much more relevant to remove. Lines like this really breaks my heart:

"21:12:19 - Comm received:^1^Sresponse^T^N1^N2^N2^SSelkouth-Sargeras^N3^T^Silvl^N947.3125^SspecID^N265^t^t^^ (from:) (Selkouth) (distri:) (RAID)", -- [136]

It only sends ilvl and specID - it actually happens somewhat often, and could easily be included in loot ack. 21 of the ~200 messages contained only this.

evil-morfar commented 6 years ago

As for the SV from Tehrible (the only one that actually shows a DC) his ML didn't have the newest version (lootTable lacks classes and subTypeID/typeID (the release version doesn't filter out those)). Are we certain repeated calls to GetItemClassesAllowedFlag can't cause dc's?

evil-morfar commented 6 years ago

Also, each an every response received triggers an RCVotingFrame:Update() which again triggers alot of sorting. That might also have something to do with it.

SafeteeWoW commented 6 years ago

GetItemClassesAllowedFlag. This cant be the issue. ML calls it more repeated. Even /rc test 200 does not cause disconnection

evil-morfar commented 6 years ago

Except the item is uncached - the ML's item is cached when he calls it, and even when testing it won't be called before the items are cached. But it's way more likely some of the other things affects it more.

SafeteeWoW commented 6 years ago

Let me call it with uncached items.

SafeteeWoW commented 6 years ago

First, I delete the WoW cache folder.
Then I use this code to test repeated calls to GetItemClassesAllowedFlag. No caching. https://gist.github.com/SafeteeWoW/e442117c9aa9ab074165d38a4eaad29d print 0xffffffff for all uncached items
Game freezes a bit, but no disconnection.

SafeteeWoW commented 6 years ago

Larger command helps though, it reduces the fragmentation any way

evil-morfar commented 6 years ago

I presume at this point the amount of messages are the cause of it. I just really don't want to add an extra delay without knowing for sure. I'm debating releasing the current fixes without this to see if it helps, or just release this as a bandaid.

evil-morfar commented 6 years ago

Checkout the latest SV https://media-elerium.cursecdn.com/attachments/220/953/rclootcouncil.txt. This one has 3 seconds after a major burst before the DC happens (entry 664) and 4 seconds later (entry 1064). While it still might just be overflowing at that point, it does seem odd. It also shows off how horribly inefficient the reconnect data is.

SafeteeWoW commented 6 years ago

This takes forever to transmit this reconnect data. Too much data easily fetchable client side are transmitted.

SafeteeWoW commented 6 years ago

Got a report that ML can also disconnect

SafeteeWoW commented 6 years ago

I suggest to release this fix. We should first attempt to make the addon work first, then considering the performance impact of this fix.

SafeteeWoW commented 6 years ago

Also, can you have a look at #122?

evil-morfar commented 6 years ago

My only concern by releasing multiple, somewhat potential fixes is we won't know which one actually did it.

SafeteeWoW commented 6 years ago

Then you can delay #124 to the next version. Other changes done are not related.

118 is unrelated because ML can also DC.

PRs with new features are obviously unrelated.

evil-morfar commented 6 years ago

124 is really a kind of major performance enhancement, and should be included asap. I'll merge both of them, as while they might both help, this must be the main reason for the DC's, and a proper fix will most likely not be backwards compatible.

evil-morfar / RCLootCouncil2

Attempt to fix disconnection #120

118 is unrelated because ML can also DC.

124 is really a kind of major performance enhancement, and should be included asap. I'll merge both of them, as while they might both help, this must be the main reason for the DC's, and a proper fix will most likely not be backwards compatible.