darktable-org / darktable

darktable is an open source photography workflow application and raw developer
https://www.darktable.org
GNU General Public License v3.0
9.86k stars 1.15k forks source link

crash on exit in debug mode #17876

Open zisoft opened 2 days ago

zisoft commented 2 days ago

Describe the bug

When built with Debug mode, dt crashes reproducible on exit in line 195 of src/common/dtpthread.h:

https://github.com/darktable-org/darktable/blob/e079290f3737e1e6d29b79330670ce460c4a9dbf/src/common/dtpthread.h#L193-L198

Steps to reproduce

Expected behavior

No response

Logfile | Screenshot | Screencast

Assertion failed: (!ret), functiAssertion failed: (!ret), functiAssertion failed: (!ret), function dt_pthread_mutex_unlock_with_Assertion failed: (!ret), function dt_pthread_mutex_unlock_with_caller, file dtpthread.h, line 1caller, file dtpthread.h, line 1Assertion failed: (!ret), functi95.
on dt_pthread_mutex_unlock_with_Assertion failed: (!ret), functi95.
on dt_pthread_mutex_unlock_with_caller, file dtpthread.h, line 1Assertion failed: (!ret), function dt_pthread_mutex_unlock_with_95.
caller, file dtpthread.h, line 1on dt_pthread_mutex_unlock_with_Assertion failed: (!ret), functicaller, file dtpthread.h, line 195.
95.
on dt_pthread_mutex_unlock_with_caller, file dtpthread.h, line 1caller, file dtpthread.h, line 195.
on dt_pthread_mutex_unlock_with_caller, file dtpthread.h, line 195.

Commit

No response

Where did you obtain darktable from?

self compiled

darktable version

current master

What OS are you using?

Mac

What is the version of your OS?

macOS 15.1.1

Describe your system?

No response

Are you using OpenCL GPU in darktable?

Yes

If yes, what is the GPU card and driver?

No response

Please provide additional context if applicable. You can attach files too, but might need to rename to .txt or .zip

No response

zisoft commented 2 days ago

@ralfbrown: First bad commit is

commit 898757fd0cfd71810740cb69e1fb28cf42454bf6
Author: Ralf Brown <ralfbrown@users.noreply.github.com>
Date:   Sat Nov 16 07:58:53 2024 -0500

    fix early-exit crash in dt_control_quit and dt_control_shutdown

    Add synchronization variables to avoid an early user-requested exit
    causing a crash from trying to cleanup threads etc. which haven't
    actually been initialized yet.  Also avoid calling gtk_main() if we've
    already started shutdown by the time that point is reached, as that
    would cause a hang with no window displayed.
zisoft commented 1 day ago

I tracked down and found the reason for the crash in src/control/control.c:

https://github.com/darktable-org/darktable/blob/2702ef6efa8e8037dd3eb8f786d679a119e08660/src/control/control.c#L303-L332

If I comment out 319+320 the crash is gone.

@ralfbrown: Not sure what was your intention here. It shouldn't hurt to wait for the threads to finish?

ralfbrown commented 1 day ago

But we were getting a hang because the threads had never been created if the user managed to request a program exit fast enough.....

Which of the join calls is actually causing the assertion failure? Do you still get the issue if you move line 323 above 319/320?

zisoft commented 1 day ago

Which of the join calls is actually causing the assertion failure?

The assertion failure is triggered when none of the join calls is made due to the early return above.

Do you still get the issue if you move line 323 above 319/320?

That does not help, but when I move the first loop above, the crash is gone. So the code now looks as follows:

void dt_control_shutdown(dt_control_t *s)
{
  if(!s)
    return;
  dt_pthread_mutex_lock(&s->cond_mutex);
  dt_pthread_mutex_lock(&s->run_mutex);
  const gboolean was_running = s->running;
  s->running = FALSE;
  dt_pthread_mutex_unlock(&s->run_mutex);
  dt_pthread_mutex_unlock(&s->cond_mutex);
  pthread_cond_broadcast(&s->cond);

  /* first wait for gphoto device updater */
#ifdef HAVE_GPHOTO2
  pthread_join(s->update_gphoto_thread, NULL);
#endif

  /* then wait for kick_on_workers_thread */
  pthread_join(s->kick_on_workers_thread, NULL);

  for(int k = 0; k < s->num_threads; k++)
    // pthread_kill(s->thread[k], 9);
    pthread_join(s->thread[k], NULL);

  if(!was_running)
    return;     // if not running, there are no threads to join

  for(int k = 0; k < DT_CTL_WORKER_RESERVED; k++)
    // pthread_kill(s->thread_res[k], 9);
    pthread_join(s->thread_res[k], NULL);
}
ralfbrown commented 19 hours ago

So now the question is whether it is possible with this change to trigger a crash or hang by typing Ctrl-Q or clicking on the main window's close button in the interval between the window appearing and it fully drawing. You can extend the interval during which you could click by disabling the splash screen (Ctrl-Q isn't affected, but the splash screen hides the initial appearance of the main window).

zisoft commented 8 hours ago

I am not able to trigger this hang at all. Even not if the early return is commented out.

  // if(!was_running)
  //   return;      // if not running, there are no threads to join

So this change works ok on my Mac.

Just another hint which I found during my analysis: In src/common/darktable.c in function dt_cleanup() we have this code:

https://github.com/darktable-org/darktable/blob/584d614728692ea64627000b319cea2272f36b06/src/common/darktable.c#L2039-L2059

If I put a breakpoint at line 2047 and quit darktable, the breakpoint gets hit. Clicking continue then quits darktable without the crash. Setting the breakpoint at line 2050 makes the crash back again. So we seem to have a race condition here. Delaying the execution of dt_imageio_cleanup(darktable.imageio); avoids the crash.

And indeed, putting a g_usleep(1000000); in front of it avoids the crash as well:

  if(init_gui)
  {
    g_usleep(1000000);
    dt_imageio_cleanup(darktable.imageio);
    free(darktable.imageio);
    darktable.imageio = NULL;
    dt_control_shutdown(darktable.control);
    dt_control_cleanup(darktable.control);
    free(darktable.control);
    darktable.control = NULL;
    dt_undo_cleanup(darktable.undo);
    darktable.undo = NULL;
    free(darktable.gui);
    darktable.gui = NULL;
  }

This is of course not a solution but maybe it helps to understand whats going on.