eliemichel / OpenMfxForBlender

A branch of Blender featuring an OpenMfx modifier
Other
176 stars 19 forks source link

Crash when connecting a geometry node to input normals #82

Closed tkarabela closed 1 year ago

tkarabela commented 1 year ago

After #81 was fixed, I've compiled a debug build from version adc0e195d31fc2b676959831c46d87ef7039ea65 and I'm able to create the OpenMfx node. Continuing with instructions from https://openmfx.org/QuickStart/using-openmfx.html#as-a-geometry-node, I select the "Explode" effect and connect input/output. However, when connecting the "normal" input:

Screenshot from 2022-11-13 20-36-14

I get SIGSEGV that happens inside the cornerEvaluator.evaluate(); call:

Screenshot from 2022-11-13 20-40-32

It's not obvious to me what the segmentation fault is about - from GDB it looks that it's dereferencing 0x10 (could be NULL + small offset?):

(gdb) p $_siginfo._sifields._sigfault.si_addr
$8 = (void *) 0x10

Parameters to the context.get_varray_for_input(field_input, mask, scope); look fine, though:

(gdb) p field_input
$5 = (const blender::bke::NormalFieldInput &) @0x7fffd7f2ad70: {<blender::bke::GeometryFieldInput> = {<blender::fn::FieldInput> = {<blender::fn::FieldNode> = {_vptr.FieldNode = 0x414ea50 <vtable for blender::bke::NormalFieldInput+16>, node_type_ = blender::fn::FieldNodeType::Input, field_inputs_ = std::shared_ptr<const blender::fn::FieldInputs> (use count 1, weak count 0) = {get() = 0x7fffe4cc9510}}, type_ = 0x474aee0 <blender::CPPType::get_impl<blender::vec_base<float, 3> >()::cpp_type>, debug_name_ = "", category_ = blender::fn::FieldInput::Category::Generated}, <No data fields>}, <No data fields>}

(gdb) p mask
$6 = {indices_ = {data_ = 0x7fffe497f008, size_ = 24}}

(gdb) p scope
$7 = (blender::ResourceScope &) @0x7fffffffab98: {<blender::NonCopyable> = {<No data fields>}, <blender::NonMovable> = {<No data fields>}, allocator_ = {<blender::NonCopyable> = {<No data fields>}, <blender::NonMovable> = {<No data fields>}, allocator_ = {<No data fields>}, owned_buffers_ = {begin_ = 0x7fffffffabc0, end_ = 0x7fffffffabc0, capacity_end_ = 0x7fffffffabe0, allocator_ = {<No data fields>}, inline_buffer_ = {...}, debug_size_ = 0}, unused_borrowed_buffers_ = {begin_ = 0x7fffffffac00, end_ = 0x7fffffffac00, capacity_end_ = 0x7fffffffac40, allocator_ = {<No data fields>}, inline_buffer_ = ..., debug_size_ = 0}}

Edit: It looks that the context is wrong and the segmentation fault is from vtable jump (?)

(gdb) p context
$9 = (const blender::fn::FieldContext &) @0x7fffffffa530: {_vptr.FieldContext = 0x0}
tkarabela commented 1 year ago

It looks like a lifetime issue - storing GeometryComponentFieldContext in variables in the node_geo_exec() function seems to fix the crash. I was also getting AddressSanitizer warnings about use-after-out-of-scope, that might have been the same issue.

Screenshot from 2022-11-13 21-00-44

eliemichel commented 1 year ago

Hard to debug since on Windows it runs fine. :/ The field manipulation is still not 100% clear to me, there seemd to be multiple ways of doing teh same thing at the time of writing this, and since then I merged 2 versions updates of Blender's master so some things might have changed without me keeping track of it.

Another example is the creation of nodes with a variable number of parameters: I had to introduce this new feature in the node system beyong using it for OpenMfx nodes, though it was on the course of being implemented independently on the master branch... Might trigger some surprises eventually ^^

PS: Did not receive your email, can you tell me the first 3 letters of the domain name as a hint?

tkarabela commented 1 year ago

I've sent you an e-mail to your domain exppad.com - if there's any trouble, just write me and I'll reply to you - my e-mail is in my profile.

Regarding this bug:

I'll try it with a newer GCC on Ubuntu 22.04. I thought I would try with GCC on Windows, too, but Blender only has precompiled dependencies for MSVC, so I won't bother for now ^^;

tkarabela commented 1 year ago

I've looked into it a bit more and what happens in the Mfx code is this:

  1. GeometryComponentFieldContext grabs a const reference to the GeometryComponent (the GeometryComponent is an l-value here, so this is fine)
  2. FieldEvaluator grabs a const reference to the GeometryComponentFieldContext (which is currently r-value, not given its own variable)
  3. Later when calling .evaluate() on the FieldEvaluator it wants to access the GeometryComponent reference through the GeometryComponentFieldContext reference, but the GeometryComponentFieldContext is out of scope at this point since it only exists as an r-value used when constructing the FieldEvaluator.

(If it was using pointers instead of references, I could see why dereferencing a pointer from a temporary struct would be problematic. With const references, I'm not really sure what the rules are...)

https://github.com/eliemichel/OpenMfxForBlender/blob/adc0e195d31fc2b676959831c46d87ef7039ea65/source/blender/nodes/geometry/nodes/node_geo_open_mfx.cc#L825-L829


I've grepped through the current source code and it looks that everywhere the "context" is given its own variable before passing it to the "evaluator", eg:

https://github.com/eliemichel/OpenMfxForBlender/blob/adc0e195d31fc2b676959831c46d87ef7039ea65/source/blender/modifiers/intern/MOD_nodes.cc#L1013-L1014

https://github.com/eliemichel/OpenMfxForBlender/blob/adc0e195d31fc2b676959831c46d87ef7039ea65/source/blender/nodes/geometry/nodes/node_geo_curve_subdivide.cc#L41-L44

...so I would suggest doing the same - it shouldn't hurt anything :)