JvanKatwijk / dab-cmdline

DAB decoding library with example of its use
GNU General Public License v2.0
57 stars 29 forks source link

dabProcessor::reset sometimes terminates the app #97

Open nlitsme opened 1 year ago

nlitsme commented 1 year ago

I noticed in sdrangel, which uses a fork of dab-cmdline. that occasionally, when switching channel, the app crashes.

looking into this, this is in the thread-assignment operator, which checks if the current thread is still joinable ( -> it is still running ) and then terminates the app.

Two threads are terminated simultaneously:

other threads running:

I don't have a debug build, so I can't tell if these are all for the same object.

Looking at dabProcessor, the stop() method seems suspect. I think the sleep(1) call looks like an attempt to work around a synchronisation problem. Which may be the root cause.

willem

JvanKatwijk commented 1 year ago

I am not familiar with SDRangel so I need some more information, In particular the control program dab-cmdline comes with a sumber of example programs that - as clearly stated - are example program and need to be correct

Op do 5 okt 2023 om 12:41 schreef willem @.***>:

I noticed in sdrangel, which uses a fork of dab-cmdline. that occasionally, when switching channel, the app crashes.

looking into this, this is in the thread-assignment operator, which checks if the current thread is still joinable ( -> it is still running ) and then terminates the app.

Two threads are terminated simultaneously:

  • dabProcessor::start() -> terminate
  • dabProcessor::run() -> mscHandler::start() -> terminate

other threads running:

  • dabProcessor::run() -> sampleReader::getSample(int)
  • mscHandler::run() -> cond_timedwait
  • mscHandler::run() -> cond_timedwait

I don't have a debug build, so I can't tell if these are all for the same object.

Looking at dabProcessor, the stop() method seems suspect. I think the sleep(1) call looks like an attempt to work around a synchronisation problem. Which may be the root cause.

willem

— Reply to this email directly, view it on GitHub https://github.com/JvanKatwijk/dab-cmdline/issues/97, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACCPHQARX5LCNNWKC34AB7LX52FFFAVCNFSM6AAAAAA5UBINFOVHI2DSMVQWIX3LMV43ASLTON2WKOZRHEZDOOJXGQ2DOMQ . You are receiving this because you are subscribed to this thread.Message ID: @.***>

-- Jan van Katwijk

srcejon commented 1 year ago

The problem will occur if dabReset() is called twice quickly. Perhaps adding this to one of the examples will show it.

You can see there is a potential problem just by looking at the code for start(), stop() and run() though:

stop() uses 'running' to determine whether to join:

if (running. load ()) {
   running. store (false);
   myReader. setRunning (false);
   sleep (1);
   threadHandle. join ();

}

However, the running variable is only set some lines of code in to run()

void dabProcessor::run (void) { std::complex FreqCorr; timeSyncer myTimeSyncer (&myReader); int32_t i; float fineOffset = 0; float coarseOffset = 0; bool correctionNeeded = true; std::vector<complex> ofdmBuffer (T_null); int dip_attempts = 0; int index_attempts = 0; int startIndex = -1;

  isSynced  = false;
  snr       = 0;
  running. store (true);  //// Set here <----

If reset() is called twice quickly, it's possible that running isn't yet set from the first invocation when stop is called for the second invocation. This means join doesn't get called - and so when the thread destructor is called, terminate() is called as the thread is still joinable.

Unfortunately, just setting running to true in start() isn't enough. That ensures join is called - but the thread sometimes doesn't end.

JvanKatwijk commented 1 year ago

That is because the DAB library was designed for just running a single service in a single channel. Anyway, I'll have a look at it

Op zo 12 nov 2023 om 23:39 schreef srcejon @.***>:

The problem will occur if dabReset() is called twice quickly. Perhaps adding this to one of the examples will show it.

You can see there is a potential problem just by looking at the code for start(), stop() and run() though:

stop() uses 'running' to determine whether to join:

if (running. load ()) { running. store (false); myReader. setRunning (false); sleep (1); threadHandle. join ();

}

However, the running variable is only set some lines of code in to run()

void dabProcessor::run (void) { std::complex FreqCorr; timeSyncer myTimeSyncer (&myReader); int32_t i; float fineOffset = 0; float coarseOffset = 0; bool correctionNeeded = true; std::vector ofdmBuffer (T_null); int dip_attempts = 0; int index_attempts = 0; int startIndex = -1;

isSynced = false; snr = 0; running. store (true); //// Set here <----

If reset() is called twice quickly, it's possible that running isn't yet set from the first invocation when stop is called for the second invocation. This means join doesn't get called - and so when the thread destructor is called, terminate() is called as the thread is still joinable.

Unfortunately, just setting running to true in start() isn't enough. That ensures join is called - but the thread sometimes doesn't end.

— Reply to this email directly, view it on GitHub https://github.com/JvanKatwijk/dab-cmdline/issues/97#issuecomment-1807265530, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACCPHQH4ICAQYGSIZXG3W73YEFFZ3AVCNFSM6AAAAAA5UBINFOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMBXGI3DKNJTGA . You are receiving this because you commented.Message ID: @.***>

-- Jan van Katwijk

srcejon commented 1 year ago

That is because the DAB library was designed for just running a single service in a single channel.

I believe that is all that we are doing. What I think happens is the following:

I guess I could add some code to prevent that - but calling dabReset() twice in succession didn't seem unreasonable.

JvanKatwijk commented 1 year ago

OK As said, the original intent was a program to run a single service in a single channel.

Anyway, the issue was apparently the interface program mscHandler that was implemented to run in a separate thread. That was done 6 years ago to have the program running on an RPI2, it helped distribuing the load caused by the nearly 1000 FFT's per second over more than a single core. I removed that and now that interface function merly collects the data and controls the "backend"

Op di 14 nov 2023 om 22:55 schreef srcejon @.***>:

That is because the DAB library was designed for just running a single service in a single channel.

I believe that is all that we are doing. What I think happens is the following:

  • When the user changes the frequency, dabReset() is called.
  • stop() in the dab library has a 1 second delay.
  • This is long enough, that the user could change the frequency again, while it is waiting, resulting in a second call to dabReset() getting queued, so it is called immediately after the first returns.

I guess I could add some code to prevent that - but calling dabReset() twice in succession didn't seem unreasonable.

— Reply to this email directly, view it on GitHub https://github.com/JvanKatwijk/dab-cmdline/issues/97#issuecomment-1811407818, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACCPHQGDTXGRBQFOIRPGKW3YEPSFXAVCNFSM6AAAAAA5UBINFOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMJRGQYDOOBRHA . You are receiving this because you commented.Message ID: @.***>

-- Jan van Katwijk