maasaimosh closed this issue 5 years ago.
Does your custom networking system use multiple threads? In normal operation the return value here can't possibly be null, so I suspect the data structure is being corrupted by access from multiple threads.
If you do use multithreading, make sure you're not calling any Dissonance methods (e.g. delivering packets) off the main thread.
It uses exactly one thread. I stopped using the ForgeRemastered networking stack when I ran into the other multithreading problems a few weeks ago (#162).
However, to diagnose this, I will add the thread-tracking code I wrote to debug #162 to this code path too.
Also, as mentioned, this is running on an Android tablet, which I have less experience with.
I will post the results here.
The FNR multithreading bugs have been fixed, so if that was the only reason you moved off FNR you could migrate back to that now.
The code you put in place to monitor for multiple threads last time was good, so adding it back in is a good idea. If that code path does get hit by multiple threads, try capturing a stack trace from each unique thread to see where they're coming from.
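One way to do the monitoring suggested above is a small guard that remembers which managed thread IDs have entered a code path and captures a stack trace the first time each new thread shows up. This is a hypothetical helper: the ThreadUsageTracker name and its members are illustrative, not part of Dissonance. In Unity you would probably send the captured trace to a log call rather than only storing it.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper, not part of Dissonance: records every distinct thread
// that calls Check() and keeps one stack trace per unique thread.
public sealed class ThreadUsageTracker
{
    private readonly object _sync = new object();
    private readonly Dictionary<int, string> _tracesByThread = new Dictionary<int, string>();
    private readonly string _name;

    public ThreadUsageTracker(string name)
    {
        _name = name;
    }

    // Call at the top of the code path being monitored.
    // Returns true the first time a given thread reaches this point.
    public bool Check()
    {
        int id = Thread.CurrentThread.ManagedThreadId;
        lock (_sync)
        {
            if (_tracesByThread.ContainsKey(id))
                return false;

            // Skip this frame so the trace starts at the caller.
            _tracesByThread[id] = string.Format("{0}: new thread id {1}{2}{3}",
                _name, id, Environment.NewLine, new StackTrace(1, true));
            return true;
        }
    }

    // Snapshot of the distinct thread IDs seen so far.
    public int[] SeenThreadIds()
    {
        lock (_sync)
        {
            var ids = new int[_tracesByThread.Count];
            _tracesByThread.Keys.CopyTo(ids, 0);
            return ids;
        }
    }

    // Stack trace captured when the given thread was first seen, or null.
    public string TraceFor(int threadId)
    {
        lock (_sync)
        {
            string trace;
            return _tracesByThread.TryGetValue(threadId, out trace) ? trace : null;
        }
    }
}
```

If SeenThreadIds() ever returns more than one ID for a path that is meant to be single-threaded, TraceFor() shows where each extra thread entered.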
The fact that this is so incredibly rare (it takes several hours in a system handling 40-60 packets per second) makes me think something weirder might be going on here, although I can't think what.
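For scale, the packet rates quoted above work out per hour as:

$$
40\ \tfrac{\text{packets}}{\text{s}} \times 3600\ \tfrac{\text{s}}{\text{h}} = 144{,}000\ \tfrac{\text{packets}}{\text{h}}, \qquad 60 \times 3600 = 216{,}000\ \tfrac{\text{packets}}{\text{h}}
$$

So even if the error struck once an hour, it would affect fewer than 1 in 144,000 packets; over several hours the rate is lower still.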
Okay, the first results are in.
BasePreprocessingPipeline is created many times, creating many threads. This is on the Dissonance side.
I think it's microphone related. I get some restart messages relating to the microphone, and tracing the code leads to the suspect CapturePipelineManager.RestartTransmissionPipeline().
You can verify this by changing the recording device in the Basic Microphone Capture script.
I added some test logging:
```csharp
protected BasePreprocessingPipeline([NotNull] WaveFormat inputFormat, int intermediateFrameSize, int intermediateSampleRate, int outputFrameSize, int outputSampleRate)
{
    Log.Error("BasePreprocessingPipeline.cs constructor called. this should happen once");
    // ... rest of the existing constructor ...
}
```
and
```csharp
private void ThreadEntry()
{
    try
    {
        Log.Error("BasePreprocessingPipeline.ThreadEntry() start called. this should happen once");
        // ... rest of the existing thread loop ...
    }
    // ... existing catch/finally handling ...
}
```
Do you have multiple preprocessing pipelines active at once? That shouldn't happen. Could you send me a log? I'll look at what those messages are and find the root cause of multiple preprocessors being created.
However, I don't think that can cause this issue. Voice sending is designed to handle multithreading: voice preprocessing/encoding runs on a background thread, then the packet is put onto a queue and sent by the network system on the next frame (on the main thread). If you trace through to Assets/Plugins/Dissonance/Core/Networking/Client/VoiceSender line 300, you'll see that there's a lock protecting the critical data structures. At the very least, this means that two preprocessors existing at the same time could not access that section simultaneously.
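The hand-off described above (encode on a background thread, queue the packet, send it from the main thread on the next frame) is a classic producer/consumer queue. A minimal sketch, assuming a hypothetical PacketQueue type rather than Dissonance's actual VoiceSender code: every access to the shared queue goes through a single lock, so any number of encoder threads can enqueue safely while the main thread drains.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the pattern, not Dissonance's VoiceSender implementation.
public sealed class PacketQueue
{
    private readonly object _lock = new object();
    private readonly Queue<byte[]> _pending = new Queue<byte[]>();

    // Called from the background preprocessing/encoding thread(s).
    public void Enqueue(byte[] encodedPacket)
    {
        if (encodedPacket == null) throw new ArgumentNullException("encodedPacket");
        lock (_lock)
        {
            _pending.Enqueue(encodedPacket);
        }
    }

    // Called once per frame from the main thread. Moves everything queued so far
    // into 'output' under the lock; the caller then sends the packets after the
    // lock has been released. Returns the number of packets moved.
    public int DrainTo(List<byte[]> output)
    {
        lock (_lock)
        {
            int count = _pending.Count;
            while (_pending.Count > 0)
                output.Add(_pending.Dequeue());
            return count;
        }
    }
}
```

Keeping the lock scope to the enqueue and drain operations means the main thread never waits on encoding and the encoder never waits on the network.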
I've had a further look this evening after work. New threads are spawned all the time, and your threads are definitely clashing.
1/2) Pool.Get() needs protection
The BasePreprocessingPipeline.ThreadEntry() loop only checks _runThread at the start of the while loop, so an old thread can still be somewhere inside the body of the ThreadEntry while loop when a new thread is started and catches up.
The SendQueuePool._listPool.Get() call is not covered by a lock, so in this code:
```csharp
/// Get an item from this pool
public T Get()
{
    if (_items.Count > 0)     // Threads A and B can both read Count == 1 here...
        return _items.Pop();  // ...then both Pop(): one gets the last item, the other can get null
    return _factory();
}
```
The Count check and the Pop() should be protected together by a lock(_items) block:
```csharp
public T Get()
{
    lock (_items)
    {
        if (_items.Count > 0)
            return _items.Pop();
        return _factory();
    }
}
```
Put() should take the same lock too, for safety (less important).
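Putting both suggestions together, a lock-protected pool might look like the following. This is a standalone sketch that mirrors the shape of the snippet above (a Stack<T> of items plus a factory delegate); the LockedPool name, its constructor and the Count property are assumptions for illustration, not Dissonance's actual Pool<T>.

```csharp
using System;
using System.Collections.Generic;

// Standalone sketch; names are illustrative, not Dissonance's Pool<T>.
public sealed class LockedPool<T> where T : class
{
    private readonly Stack<T> _items = new Stack<T>();
    private readonly Func<T> _factory;

    public LockedPool(Func<T> factory)
    {
        if (factory == null) throw new ArgumentNullException("factory");
        _factory = factory;
    }

    /// Get an item from this pool, or create a new one if the pool is empty.
    public T Get()
    {
        lock (_items)
        {
            // The Count check and the Pop happen as one step under the lock.
            if (_items.Count > 0)
                return _items.Pop();
        }
        // Creating a fresh item doesn't touch _items, so it runs outside the lock.
        return _factory();
    }

    /// Return an item to this pool.
    public void Put(T item)
    {
        if (item == null) throw new ArgumentNullException("item");
        lock (_items)
        {
            _items.Push(item);
        }
    }

    public int Count
    {
        get { lock (_items) { return _items.Count; } }
    }
}
```

One small difference from the snippet above: the factory call sits outside the lock, because building a new item doesn't read or write the shared stack and holding the lock there would only add contention.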
2/2) Why so many threads? (Just an implementation question.)
I also have to ask why so many threads are being spawned.
During my testing this happens all the time. It seems a frame skip will 'force capture pipeline reset', which creates a new thread.
Here is a transcript where everything breaks down after a frame skip and a new thread:
```
AndroidPlayer(Amazon_XXXXX@192.168.xxx.xxx) [Dissonance:Recording] (20:38:19.638) CapturePipelineManager: Detected a frame skip, forcing capture pipeline reset (Delta Time:0.1673068)
AndroidPlayer(Amazon_XXXXX@192.168.xxx.xxx) [Dissonance:Recording] (20:38:19.824) BasicMicrophoneCapture: Began mic capture (SampleRate:44100Hz, FrameSize:882, Buffer Limit:2^13, Latency:20ms, Device:'')
AndroidPlayer(Amazon_XXXXX@192.168.xxx.xxx) [Dissonance:Recording] (20:38:19.848) BasePreprocessingPipeline: BasePreprocessingPipeline.cs constructor called. this should happen once
AndroidPlayer(Amazon_XXXXX@192.168.xxx.xxx) [Dissonance:Recording] (20:38:19.818) BasePreprocessingPipeline: BasePreprocessingPipeline.ThreadEntry() ended: 7
AndroidPlayer(Amazon_XXXXX@192.168.xxx.xxx) [Dissonance:Recording] (20:38:19.871) BasePreprocessingPipeline: BasePreprocessingPipeline.ThreadEntry() started: 8
[Dissonance:Playback] (20:38:20.431) EncodedAudioBuffer: Error: Encoded audio heap is getting very large (50 items)! This is probably a bug in Dissonance, we're sorry! Please report the bug on the issue tracker "https://github.com/Placeholder-Software/Dissonance/issues". You could also seek help on the community at "http://placeholder-software.co.uk/dissonance/community" to get help for a temporary workaround. Error ID: 59EE0102-FF75-467A-A50D-00BF670E9B8C
AndroidPlayer(Amazon_XXXXX@192.168.xxx.xxx) [Dissonance:Network] (20:38:20.202) SendQueue`1: ServerRelay.ProcessPacketRelay(): New thread id encountered: 8
AndroidPlayer(Amazon_XXXXX@192.168.xxx.xxx) [Dissonance:Network] (20:38:20.204) SendQueue`1: SendQueue.EnqueueP2P(): Switch thread id [ 7 ] to [ 8 ]
Stack Trace:
  at Dissonance.Networking.Client.SendQueue`1[[PeerInfo].EnqueueP2P(UInt16 localId, ICollection`1 destinations, ICollection`1 queue, ArraySegment`1 packet)
```
The synchronisation happens at a higher level than where you're looking: in the CapturePipelineManager (Assets/Plugins/Dissonance/Core/Audio/Capture/CapturePipelineManager.cs), which is responsible for managing pipelines and their lifetimes. When a full reset happens, it stops the current capture pipeline, joins on its thread, and then starts a new pipeline (and a new thread).
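The stop, join, restart sequence described above is what keeps two preprocessing threads from overlapping, even though the worker loop only checks its run flag at the top of each iteration. A minimal sketch, with a hypothetical WorkerPipeline class standing in for a capture pipeline (it is not Dissonance's CapturePipelineManager code): Stop() clears the flag and then blocks in Join() until the old thread has finished its current iteration, so the replacement thread only starts after the old one has fully exited. The static counters exist only to measure overlap.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for a capture pipeline; not Dissonance code.
public sealed class WorkerPipeline
{
    private static int _activeWorkers;         // worker loops running right now
    private static int _maxConcurrentWorkers;  // highest value _activeWorkers ever reached

    private volatile bool _runThread;
    private Thread _thread;

    public static int MaxConcurrentWorkers
    {
        get { return Volatile.Read(ref _maxConcurrentWorkers); }
    }

    public void Start()
    {
        _runThread = true;
        _thread = new Thread(ThreadEntry) { IsBackground = true };
        _thread.Start();
    }

    // Clear the flag, then wait until the thread has left its loop.
    public void Stop()
    {
        _runThread = false;
        _thread.Join();
    }

    private void ThreadEntry()
    {
        UpdateMax(Interlocked.Increment(ref _activeWorkers));
        try
        {
            // The flag is only checked here, so an iteration already in
            // progress runs to completion after Stop() is called.
            while (_runThread)
                Thread.Sleep(1); // placeholder for preprocessing/encoding one frame
        }
        finally
        {
            Interlocked.Decrement(ref _activeWorkers);
        }
    }

    // Lock-free "max" update using compare-and-swap.
    private static void UpdateMax(int value)
    {
        int current;
        while (value > (current = Volatile.Read(ref _maxConcurrentWorkers)))
            Interlocked.CompareExchange(ref _maxConcurrentWorkers, value, current);
    }
}
```

Restarting this pipeline repeatedly with Stop() followed by a fresh Start() never has more than one worker running at a time; dropping the Join() from Stop() is exactly what would let an old and a new thread race through shared state.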
That's what I meant earlier when I said there shouldn't be two threads running in this code at the same time. Have you actually caught two preprocessor instances running concurrently? If so, that's definitely a bug that needs fixing!
Edit: As a quick fix, you could try changing _listPool from a Pool<T> to a ConcurrentPool<T>.
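Dissonance's own ConcurrentPool<T> isn't reproduced in this thread. To illustrate the idea behind the suggested swap, here is a hypothetical pool (the StackBackedConcurrentPool name is made up) built on System.Collections.Concurrent.ConcurrentStack<T>: TryPop checks for an item and removes it in one atomic call, so there is no window between a Count check and a Pop for another thread to slip into.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical illustration, not Dissonance's ConcurrentPool<T>.
public sealed class StackBackedConcurrentPool<T> where T : class
{
    private readonly ConcurrentStack<T> _items = new ConcurrentStack<T>();
    private readonly Func<T> _factory;

    public StackBackedConcurrentPool(Func<T> factory)
    {
        if (factory == null) throw new ArgumentNullException("factory");
        _factory = factory;
    }

    public T Get()
    {
        T item;
        // TryPop tests for an item and removes it atomically; no separate Count check.
        return _items.TryPop(out item) ? item : _factory();
    }

    public void Put(T item)
    {
        if (item == null) throw new ArgumentNullException("item");
        _items.Push(item);
    }

    public int Count
    {
        get { return _items.Count; }
    }
}
```

Compared with the lock-based version, this avoids explicit locking entirely, at the cost of depending on a concurrent collection being available in the target runtime.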
Ideally, Dissonance spawns just one thread to run preprocessing/encoding.
However, the Unity microphone API requires that we capture mic input on the main thread, so mic capture is quite sensitive to long frames: if frames are too slow, the audio system starves for input. The frame skip detector is a last-ditch attempt to recover from extremely bad frame times (e.g. a single frame took 167ms in that log sample); it tears down the entire pipeline and recreates it in a fresh state (which spawns a new thread).
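The frame skip check itself can be as simple as comparing each frame's delta time against a threshold and requesting a full reset when it is exceeded. A sketch, assuming a hypothetical FrameSkipDetector class and an illustrative 100 ms threshold (Dissonance's real threshold and logic aren't shown in this thread); the 0.1673068 s delta from the log above would trip it.

```csharp
using System;

// Hypothetical sketch; the threshold passed in is an assumed value, not Dissonance's.
public sealed class FrameSkipDetector
{
    private readonly float _maxFrameTime;

    public FrameSkipDetector(float maxFrameTimeSeconds)
    {
        _maxFrameTime = maxFrameTimeSeconds;
    }

    // How many frame skips have been detected so far.
    public int SkipsDetected { get; private set; }

    // Call once per frame with that frame's delta time in seconds
    // (in Unity, for example, Time.unscaledDeltaTime).
    // Returns true when the capture pipeline should be torn down and recreated.
    public bool IsFrameSkip(float deltaTime)
    {
        if (deltaTime <= _maxFrameTime)
            return false;

        SkipsDetected++;
        return true;
    }
}
```

Counting detections like this is a cheap way to see how often resets (and therefore new preprocessing threads) are actually happening on a given device.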
Of course, this should happen quite rarely. Do you see a lot of these frame skip messages?
Edit: If this is the cause, it would probably explain why the bug is so rare (resets happen rarely compared to how often voice packets are sent).
As a precaution, I've rewritten SendQueue.cs (Assets/Plugins/Dissonance/Core/Networking/Client/SendQueue.cs) to take locks around all of the non-thread-safe structures. This shouldn't be needed, but you may find it useful to try. If it solves your issue, I'll at least have an idea where to look for the root problem (and you'll be able to get on with building your app!).
https://gist.github.com/martindevans/25783517dba4ab2d31283be96c0d0e6b
With Dissonance 6.4.3 I hardened the networking system against multithreading bugs; hopefully this will mitigate your issue :)
Okay, I will upgrade and evaluate.
I have now left the 6.4.3 codebase running for a long time. The problem has not recurred, so I reckon this can be closed.
Thanks for the changes!
Fantastic, thanks for confirming that :)
Context
A long-running VoIP session (multiple hours) gets a NullReferenceException (NRE) in SendQueue.cs
Expected Behavior
A long-running VoIP session (multiple hours) does not produce exceptions
Actual Behavior
NullReferenceException in SendQueue
Example Stacktrace:
Source code line:
In one long test run, this error occurred 4 times.
Note the timestamps: 22:40:46.138, 23:05:35.710, 23:34:50.739, 00:39:42.481
After the 4th error, a follow-on problem developed: the warning "EncodedAudioBuffer: Voice Error: Encoded audio heap is getting very large (80 items)! Error ID: 59EE0102-FF75-467A-A50D-00BF670E9B8C" then kept repeating in the logs.
Workaround
None found
Steps to Reproduce
Provide a detailed set of steps to reproduce the problem
Your Environment
Amazon Fire 7 tablet running Dissonance 6.4.2 with a working home-written networking stack; a PC application runs the Dissonance server, also using the home-written stack.
Dissonance version: 6.4.2
Unity version: 2017.4.28
OS: Windows 10 x64
Development build deployed to the Android tablet, with console logs captured into Unity over a USB cable.