darktable-org / darktable

darktable is an open source photography workflow application and raw developer
https://www.darktable.org
GNU General Public License v3.0
9.65k stars 1.13k forks source link

Use of lua lib in darkroom causes segfault #8221

Closed Mark-64 closed 3 years ago

Mark-64 commented 3 years ago

With the recent introduction of export lib in darkroom, I thought to modify my ext_editor.lua script so to have the possibility to do round trips to external programs with .tiff files while in darkroom, by using the gui (this is already possible with the current script through shortcuts).

So I modified the script this way, by adding the darkroom line

dt.register_lib(
  MODULE_NAME,          
  _("external editors"),  
  true, -- expandable
  false,  -- resetable
  {[dt.gui.views.lighttable] = {"DT_UI_CONTAINER_PANEL_RIGHT_CENTER", 100},
   [dt.gui.views.darkroom] = {"DT_UI_CONTAINER_PANEL_LEFT_CENTER", 100}
   },  
  dt.new_widget("box") {
    orientation = "vertical",
    combobox,
    box1
    },
  nil,  -- view_enter
  nil   -- view_leave
  )

In essence, the failing branch of the script creates a physical copy of the image, calls the external program on it and import the result in the database. While this works perfectly if called from lighttable, it creates a segfault in darkroom. Here is the backtrace. darktable_bt_DESMY0.txt

With a debugger, I can see that function _display_module_trouble_message_callback() is called with an unitialized value of module, which means DT_SIGNAL_TROUBLE_MESSAGE is raised incorrectly somewhere.

Perhaps importing pics in database is not allowed while in darkroom, so what I am doing is not permitted, or ? @wpferguson can you have a look ?

wpferguson commented 3 years ago

so what I am doing is not permitted, or ?

Lua is supposed to be used in lighttable only, with a couple of exceptions. You can draw a guide in the crop and rotate module, take a snapshot, query the image currently selected, and select another image. I'm not sure how the exporter being visible in darkroom mode is going to work with Lua. I was just getting ready to check when I saw this. I have a feeling it will be OK, because it's wrapped in the exporter.

The backtrace appears to show that darkroom couldn't access the module so perhaps adding a lib to darkroom doesn't work correctly. I'm guessing it was never tested since it isn't supposed to happen.

wpferguson commented 3 years ago

I tried this in Linux and it worked. I created a jpg from a raw, edited it in gimp, overwrote the image with the result, and it worked fine. Also edited a copy and it imported fine. Tried the shortcut and the lib.

Mark-64 commented 3 years ago

Yes, sometimes it works in Windows too, but many times crashes when doing "edit a copy" from the lib gui in darkroom. Never happens in lighttable. With the debugger, it is evident that at certain point _display_module_trouble_message_callback() is called with an unitialized value of module, 100% reproducible, but it doesn't always produces a crash and as always Linux is more forgiving in those cases. I will try under Linux too. If you want to test, you can use this simplified script that creates one lib in both lighttable and darkroom with only one buttom "copy and import", which the failing action. test.lua.txt

Mark-64 commented 3 years ago

Tried in Linux and there is no issue, pretty weird.

wpferguson commented 3 years ago

Here's the policy on lua

The general limitation of the lua API is the lighttable view. In other word we aim at being able to do with lua everything that can be done in the lighttable, but anything that can only be done in darkroom mode is off-limit

Mark-64 commented 3 years ago

Yes, I remember I read this already somewhere. In reality the action: <copying an image + importing the copy> is a typical lighttable action. The question is: why this fails if it is performed while being in darkroom and why only on Windows? Also it fails in a function which has been added recently if I am not mistaken, so it could be something trivial after all. I will have to investigate more.

wpferguson commented 3 years ago

I tried it on win7 with 3.4.0 and got no crashes. Did 10 copy and imports.

wpferguson commented 3 years ago

You could try 3.4.0, or 3.4.1.1 and see if it crashes. If not, do a git bisect to figure out what commit caused the problem.

Mark-64 commented 3 years ago

No problems until 3.4.1.1. It is something done later, I will try to bisect

Mark-64 commented 3 years ago

This issue starts as a consequence of #7966 which introduced modules trouble messages based on signals. I believe it is some sort of race condition between generation of filmstrip thumbnail (triggered by lua importing an image while in darkroom) and refresh of the center image. In fact, if I introduce a delay between the two in my lua script there is no more crash, see below

  -- refresh darkroom view
  if dt.gui.current_view() == dt.gui.views.darkroom then
    dt.control.sleep(1500)
    dt.gui.views.darkroom.display_image(new_image)
    end

That would explain the weird random behaviour and why it is different under Linux.

I guess this is a case unforeseen, we are a bit border line here. @TurboGit can I ask your opinion ?

wpferguson commented 3 years ago

@Mark-64 does the problem occur if the module isn't visible in darkroom view and you only use the shortcut?

Mark-64 commented 3 years ago

does the problem occur if the module isn't visible in darkroom view and you only use the shortcut?

Current script has shortcuts for the "edit" action and that is fine. However if I modify the script to associate a shortcut to "edit a copy" action, then yes, the problem occurs even if the module is not visible in darkroom. So it has nothing to do with the visibility of the module, it is related to importing a new image while in darkroom. So I remain of my idea.

Mark-64 commented 3 years ago

I might have found the solution here, but I need the opinion of an expert dev. _display_module_trouble_message_callback(), where the segfault occurs, is triggered by a signal raised by dt_iop_set_module_trouble_message(). In turns, that can be called by _display_wb_error() in temperature.c _check_for_wb_issue_and_set_trouble_message() in channelmixerrgb.c dt_iop_have_required_input_format() in imageop.c Now, the first function has the semaphores ++darktable.gui->reset and --darktable.gui->reset at the beginnind and at the end, while the other two don't have. Putting those, there is no more fault. Maybe doing this we avoid a asynch signals causing the segfault?

@wpferguson what do you think? @TurboGit you implemented the module trouble signals, I really appreciate your opinion here

wpferguson commented 3 years ago

@Mark-64 I had another thought (actually another test). Enable contrib/gimp, open an image in darkroom mode, then use the exporter and select "edit with GIMP", then click export. This does an export of the image, opens it in GIMP, then on close imports the image and groups it with the original, but it doesn't update the display. You could add a line, after the import, to have it do that. That way we would have a separate code path that triggers the bug.

I'll look at the code you pointed out above and see what I can figure out.

wpferguson commented 3 years ago

I think I might have a scenario, though I'm not sure why it doesn't happen in linux. The image you import is always an ldr (jpg, tif, i.e. non-raw). In lighttable when you import an image it's checked when generating the thumbnail and the image flags are set, which in this case identify the image as not raw so therefore no white balance will be attempted. When you import in darkroom and immediately display, the image (sometimes) hasn't been identified as a not raw, so darktable attempts to apply a white balance and then throws an error because it can't. And maybe it doesn't occur on linux because linux processes it just a bit faster, thus avoiding the problem.

EDIT: However, segfaulting when a trouble message occurs is a problem that still needs addressed. @Mark-64 I wonder if you took your test script from above and made it loop 5 or 10 times if it would cause the error on linux?

wpferguson commented 3 years ago

Now, the first function has the semaphores ++darktable.gui->reset and --darktable.gui->reset at the beginnind and at the end, while the other two don't have.

Why don't you put a print statement in those three functions to see which one is actually raising the trouble message? Run it until it crashes and see which one printed.

wpferguson commented 3 years ago

Putting those, there is no more fault. Maybe doing this we avoid a asynch signals causing the segfault?

Did you get:

Mark-64 commented 3 years ago

no error message and crash before refresh of the central image

wpferguson commented 3 years ago

If you wrapped the trouble message with the ++|-- darktable.gui->reset what happened?

I tried the loop in linux. Still no crash though I did get

[dt_ioppr_transform_image_colorspace_cl] invalid conversion from 2 to 0
[_transform_matrix] invalid conversion from 2 to 0
[dt_ioppr_transform_image_colorspace] invalid conversion from 2 to 0

each time I duplicated an image and imported.

Mark-64 commented 3 years ago

the fault happens (when it happens) in _display_module_trouble_message_callback(), with a signal raised by dt_iop_set_module_trouble_message(), called by dt_iop_have_required_input_format(), called by module "gamma". No need to put print, I see from debugger. Also dt_iop_have_required_input_format() is often executed by another thread than the gui main thread. Now, dt_iop_have_required_input_format() has a buffer copy inside dt_iop_copy_image_roi(). I suspect that this operation is long enough that, without semaphores, there is a significant chance to get another signal raised in the meantime

Mark-64 commented 3 years ago

If you wrapped the trouble message with the ++|-- darktable.gui->reset what happened?

I get no errors in the console. I can do a PR for testing with the modification, give me a few minutes

Mark-64 commented 3 years ago

I will do the other tests you suggested later

wpferguson commented 3 years ago

Are you running darktable from the build tree, or generating an installer version and installing it? If you run it under the mingw shell, lua behaves differently, since the shell is actually a bash shell and not a windows shell.

Mark-64 commented 3 years ago

I run within my Eclipse IDE, in order to have the debugger, but I get the crash also with my regular installed version

wpferguson commented 3 years ago

Here's the issues I see...

The gamma module (display encoding) is somehow getting bad data. You can cause it (sometimes) by importing an image and displaying it while in darkroom mode on win 10. We haven't been able to recreate on linux. I'll try building and recreating on win 7.

When the gamma module uses dt_iop_set_module_trouble_message() and the signal is raised the callback, _display_module_trouble_message_callback(), segfaults at darkroom.c, line 224. Are one of the tests undefined?

Mark-64 commented 3 years ago

I had another thought (actually another test). Enable contrib/gimp, open an image in darkroom mode, then use the exporter and select "edit with GIMP", then click export. This does an export of the image, opens it in GIMP, then on close imports the image and groups it with the original, but it doesn't update the display. You could add a line, after the import, to have it do that. That way we would have a separate code path that triggers the bug.

segfault also in this scenario (without PR)

wpferguson commented 3 years ago

I just did 29 copy and imports on the latest git on win7 with no crash. It looks like this is a Win10 "feature".

Mark-64 commented 3 years ago

Yes and unfortunately, to more extensive tests, the PR does not solve the issue :-(

TurboGit commented 3 years ago

Ok, so maybe a race condition indeed. It seems that you said this issue to be related on signals and also talking about Lua. Have you tried removing luarc completely? Can you still reproduce?

Sorry if you already tried, I may have missed a message.

TurboGit commented 3 years ago

Can you test setting plugins/darkroom/export/visible in darktablerc to false?

TurboGit commented 3 years ago

Can you test #8273 ?

Mark-64 commented 3 years ago

In my case I get the crash with both #8273 and plugins/darkroom/export/visible set to false. In my scenario I am already in darkroom and importing an image, which I do with a lua shortcut (which forces a filmstrip refresh), triggers the issue. It is intermittent but it's there. I can't reproduce like @chhil in #8262 by the way, but it is obviously the same thing. Shall we continue the discussione there?

Mark-64 commented 3 years ago

closed duplicate

chhil commented 3 years ago

@Mark-64 Not related to the issue. Do you have some eclipse setup guidelines on windows for debugging that you could share?

I had setup NetBeans long time ago but got rid of it as it just took too long to start and debug. Eclipse I am familiar with as its my primary ide for day job in java, so I any guidelines setting it up would be greatly appreciated.

Mark-64 commented 3 years ago

@chhil Sorry I have completely missed your message. Some time ago I wrote a guide for darktable in Eclipse. I will try to update it and share. It will take a day or two.

Mark-64 commented 3 years ago

@chhil see https://github.com/Mark-64/darktable-eclipse plase let me know if the guide works or if there is anything missing / unclear