GafferHQ / gaffer

Gaffer is a node-based application for lookdev, lighting and automation
http://www.gafferhq.org
BSD 3-Clause "New" or "Revised" License
960 stars 206 forks source link

Sporadic Hang With TaskCollaboration Plugs With Many Threads #5403

Closed danieldresser-ie closed 1 year ago

danieldresser-ie commented 1 year ago

Summarizing this investigation which turns out to not be very high priority because it only occurs if the "-threads" argument is used to run Gaffer with more threads than cores.

The theory is that TaskMutex uses mechanisms like executionState->arena.execute( [&executionState]{ executionState->taskGroup.wait(); } ); for threads to join an arena - we expect this to join the arena, and immediately execute the taskGroup.wait. However, if the arena is already considered "full", then this instead enqueues the "taskGroup.wait", which means that threads already in the arena could pick up this wait, causing a deadlock.

I will include below a simple repro.

A guard that I tried out which did not prevent the deadlock, but did strongly indicate that something weird is happening with high thread counts, is putting this in TaskMutex::acquireOr:

                    tbb::enumerable_thread_specific<int> badThreadFlag;

                    badThreadFlag.local() = 37;
                    executionState->arena.execute(
                        [&executionState, &badThreadFlag]{
                            executionState->taskGroup.wait();
                            if( badThreadFlag.local() != 37 )
                            {
                                throw IECore::Exception( "Run from wrong thread?" );
                            }
                            badThreadFlag.local() = 0;
                        }
                    );

This is triggered when running Gaffer with -threads set higher than the core count, indicating that there is the possibility of this wait running at a time we don't expect ( though this guard by itself does not prevent hangs, indicating that something else weird may be going on at high thread counts ).

John spotted this weird work-around indicates TBB folks being aware of the issue internally: https://github.com/oneapi-src/oneTBB/blob/v2020.3/src/tbb/arena.cpp#L1031-L1048 ( If they added the try_execute method mentioned in that comment, that would probably solve the issue for us ).

We haven't figured out why that workaround appears to no longer be needed in bleeding edge TBB, nor can we find the forum post that John mentions in TaskMutex.h as being the original inspiration for this approach.

As far as we can tell, this issue currently only occurs when using -threads to run with more threads than cores ( if we don't over-subscribe at all, then we never run out of max_concurrency in the arena, and never trigger this behaviour from tbb ), so this may not be too high priority.

Maybe tbb will eventually add try_execute, or maybe on new tbb we'll be able to just replace TaskMutex with collaborative_call_once?

danieldresser-ie commented 1 year ago

Here is the contents of my test file, deleteFacesHang3.gfr:

import Gaffer
import GafferImage
import GafferOSL
import GafferScene
import IECore
import imath

Gaffer.Metadata.registerValue( parent, "serialiser:milestoneVersion", 1, persistent=False )
Gaffer.Metadata.registerValue( parent, "serialiser:majorVersion", 2, persistent=False )
Gaffer.Metadata.registerValue( parent, "serialiser:minorVersion", 10, persistent=False )
Gaffer.Metadata.registerValue( parent, "serialiser:patchVersion", 0, persistent=False )

__children = {}

parent["variables"].addChild( Gaffer.NameValuePlug( "project:name", Gaffer.StringPlug( "value", defaultValue = 'default', flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), "projectName", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
parent["variables"].addChild( Gaffer.NameValuePlug( "project:rootDirectory", Gaffer.StringPlug( "value", defaultValue = '$HOME/gaffer/projects/${project:name}', flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), "projectRootDirectory", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
parent["variables"].addChild( Gaffer.NameValuePlug( "image:viewName", Gaffer.StringPlug( "value", defaultValue = 'default', flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), "defaultViewName", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
parent["variables"].addChild( Gaffer.NameValuePlug( "workingPath", Gaffer.StringPlug( "value", defaultValue = '', flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), "workingPath", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
parent["variables"].addChild( Gaffer.NameValuePlug( "job", Gaffer.StringPlug( "value", defaultValue = '', flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), "job", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
parent["variables"].addChild( Gaffer.NameValuePlug( "sequence", Gaffer.StringPlug( "value", defaultValue = '', flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), "sequence", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
parent["variables"].addChild( Gaffer.NameValuePlug( "shot", Gaffer.StringPlug( "value", defaultValue = '', flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), "shot", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
parent["variables"].addChild( Gaffer.NameValuePlug( "scene", Gaffer.StringPlug( "value", defaultValue = '', flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), "scene", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
parent["variables"].addChild( Gaffer.NameValuePlug( "defaultResolution", Gaffer.V2iPlug( "value", defaultValue = imath.V2i( 1920, 1080 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), "defaultResolution", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
parent["variables"].addChild( Gaffer.NameValuePlug( "defaultPixelAspectRatio", Gaffer.FloatPlug( "value", defaultValue = 1.0, flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), "defaultPixelAspectRatio", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
parent["variables"].addChild( Gaffer.NameValuePlug( "defaultOverscan", Gaffer.V2fPlug( "value", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), "defaultOverscan", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
parent["variables"].addChild( Gaffer.NameValuePlug( "image:catalogue:port", Gaffer.IntPlug( "value", defaultValue = 0, flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), "imageCataloguePort", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
parent["variables"].addChild( Gaffer.NameValuePlug( "renderPassVariation", Gaffer.StringPlug( "value", defaultValue = '', flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), "renderPassVariation", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
parent["variables"].addChild( Gaffer.NameValuePlug( "renderPassName", Gaffer.StringPlug( "value", defaultValue = '', flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), "renderPassName", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
__children["defaultFormat"] = GafferImage.FormatPlug( "defaultFormat", defaultValue = GafferImage.Format( 1920, 1080, 1.000 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, )
parent.addChild( __children["defaultFormat"] )
__children["DeleteFaces1"] = GafferScene.DeleteFaces( "DeleteFaces1" )
parent.addChild( __children["DeleteFaces1"] )
__children["DeleteFaces1"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["DeleteFaces2"] = GafferScene.DeleteFaces( "DeleteFaces2" )
parent.addChild( __children["DeleteFaces2"] )
__children["DeleteFaces2"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["DeleteFaces5"] = GafferScene.DeleteFaces( "DeleteFaces5" )
parent.addChild( __children["DeleteFaces5"] )
__children["DeleteFaces5"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["DeleteFaces6"] = GafferScene.DeleteFaces( "DeleteFaces6" )
parent.addChild( __children["DeleteFaces6"] )
__children["DeleteFaces6"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Filter"] = GafferScene.PathFilter( "Filter" )
parent.addChild( __children["Filter"] )
__children["Filter"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["DeleteFaces7"] = GafferScene.DeleteFaces( "DeleteFaces7" )
parent.addChild( __children["DeleteFaces7"] )
__children["DeleteFaces7"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["DeleteFaces8"] = GafferScene.DeleteFaces( "DeleteFaces8" )
parent.addChild( __children["DeleteFaces8"] )
__children["DeleteFaces8"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["DeleteFaces9"] = GafferScene.DeleteFaces( "DeleteFaces9" )
parent.addChild( __children["DeleteFaces9"] )
__children["DeleteFaces9"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["DeleteFaces10"] = GafferScene.DeleteFaces( "DeleteFaces10" )
parent.addChild( __children["DeleteFaces10"] )
__children["DeleteFaces10"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["OSLObject"] = GafferOSL.OSLObject( "OSLObject" )
parent.addChild( __children["OSLObject"] )
__children["OSLObject"]["primitiveVariables"].addChild( Gaffer.NameValuePlug( "customFloat", Gaffer.FloatPlug( "value", defaultValue = 0.0, flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), True, "primitiveVariable5", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
__children["OSLObject"]["primitiveVariables"].addChild( Gaffer.NameValuePlug( "customFloat", Gaffer.FloatPlug( "value", defaultValue = 0.0, ), True, "primitiveVariable6", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
__children["OSLObject"]["primitiveVariables"].addChild( Gaffer.NameValuePlug( "customFloat", Gaffer.FloatPlug( "value", defaultValue = 0.0, ), True, "primitiveVariable7", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
__children["OSLObject"]["primitiveVariables"].addChild( Gaffer.NameValuePlug( "customFloat", Gaffer.FloatPlug( "value", defaultValue = 0.0, ), True, "primitiveVariable8", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
__children["OSLObject"]["primitiveVariables"].addChild( Gaffer.NameValuePlug( "customFloat", Gaffer.FloatPlug( "value", defaultValue = 0.0, flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), True, "primitiveVariable9", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
__children["OSLObject"]["primitiveVariables"].addChild( Gaffer.NameValuePlug( "customFloat2", Gaffer.FloatPlug( "value", defaultValue = 0.0, flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), True, "primitiveVariable10", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
__children["OSLObject"]["primitiveVariables"].addChild( Gaffer.NameValuePlug( "customFloat3", Gaffer.FloatPlug( "value", defaultValue = 0.0, flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), True, "primitiveVariable11", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
__children["OSLObject"]["primitiveVariables"].addChild( Gaffer.NameValuePlug( "customFloat4", Gaffer.FloatPlug( "value", defaultValue = 0.0, flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), True, "primitiveVariable12", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
__children["OSLObject"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["OSLCode"] = GafferOSL.OSLCode( "OSLCode" )
parent.addChild( __children["OSLCode"] )
__children["OSLCode"]["out"].addChild( Gaffer.FloatPlug( "output1", direction = Gaffer.Plug.Direction.Out, defaultValue = 0.0, flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["OSLCode"]["out"].addChild( Gaffer.FloatPlug( "output2", direction = Gaffer.Plug.Direction.Out, defaultValue = 0.0, flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["OSLCode"]["out"].addChild( Gaffer.FloatPlug( "output3", direction = Gaffer.Plug.Direction.Out, defaultValue = 0.0, flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["OSLCode"]["out"].addChild( Gaffer.FloatPlug( "output4", direction = Gaffer.Plug.Direction.Out, defaultValue = 0.0, flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["OSLCode"]["out"].addChild( Gaffer.FloatPlug( "output5", direction = Gaffer.Plug.Direction.Out, defaultValue = 0.0, flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["OSLCode"]["out"].addChild( Gaffer.FloatPlug( "output6", direction = Gaffer.Plug.Direction.Out, defaultValue = 0.0, flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["OSLCode"]["out"].addChild( Gaffer.FloatPlug( "output7", direction = Gaffer.Plug.Direction.Out, defaultValue = 0.0, flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["OSLCode"]["out"].addChild( Gaffer.FloatPlug( "output8", direction = Gaffer.Plug.Direction.Out, defaultValue = 0.0, flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["OSLCode"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Sphere"] = GafferScene.Sphere( "Sphere" )
parent.addChild( __children["Sphere"] )
__children["Sphere"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["CollectScenes"] = GafferScene.CollectScenes( "CollectScenes" )
parent.addChild( __children["CollectScenes"] )
__children["CollectScenes"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Expression"] = Gaffer.Expression( "Expression" )
parent.addChild( __children["Expression"] )
__children["Expression"]["__out"].addChild( Gaffer.StringVectorDataPlug( "p0", direction = Gaffer.Plug.Direction.Out, defaultValue = IECore.StringVectorData( [  ] ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Expression"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Expression1"] = Gaffer.Expression( "Expression1" )
parent.addChild( __children["Expression1"] )
__children["Expression1"]["__out"].addChild( Gaffer.V2iPlug( "p0", direction = Gaffer.Plug.Direction.Out, defaultValue = imath.V2i( 20, 40 ), minValue = imath.V2i( 3, 6 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Expression1"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Group"] = GafferScene.Group( "Group" )
parent.addChild( __children["Group"] )
__children["Group"]["in"].addChild( GafferScene.ScenePlug( "in1", flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Group"]["in"].addChild( GafferScene.ScenePlug( "in2", flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Group"]["in"].addChild( GafferScene.ScenePlug( "in3", flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Group"]["in"].addChild( GafferScene.ScenePlug( "in4", flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Group"]["in"].addChild( GafferScene.ScenePlug( "in5", flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Group"]["in"].addChild( GafferScene.ScenePlug( "in6", flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Group"]["in"].addChild( GafferScene.ScenePlug( "in7", flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Group"]["in"].addChild( GafferScene.ScenePlug( "in8", flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Group"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
parent["frameRange"]["start"].setValue( 1014 )
parent["frameRange"]["end"].setValue( 1081 )
parent["frame"].setValue( 1081.0 )
parent["variables"]["shot"]["value"].setValue( 'generic' )
parent["variables"]["scene"]["value"].setValue( '${script:name}' )
parent["variables"]["defaultResolution"]["value"].setValue( imath.V2i( 4096, 2534 ) )
parent["variables"]["defaultOverscan"]["value"].setValue( imath.V2f( 0, 0.000789265963 ) )
Gaffer.Metadata.registerValue( parent["variables"]["projectName"]["name"], 'readOnly', True )
Gaffer.Metadata.registerValue( parent["variables"]["projectRootDirectory"]["name"], 'readOnly', True )
Gaffer.Metadata.registerValue( parent["variables"]["defaultViewName"], 'layout:visibilityActivator', False )
Gaffer.Metadata.registerValue( parent["variables"]["defaultViewName"]["name"], 'readOnly', True )
Gaffer.Metadata.registerValue( parent["variables"]["defaultViewName"]["value"], 'readOnly', True )
Gaffer.Metadata.registerValue( parent["variables"]["imageCataloguePort"], 'readOnly', True )
__children["defaultFormat"].setValue( GafferImage.Format( 4096, 2534, 1.000 ) )
__children["DeleteFaces1"]["in"].setInput( __children["OSLObject"]["out"] )
__children["DeleteFaces1"]["filter"].setInput( __children["Filter"]["out"] )
__children["DeleteFaces1"]["adjustBounds"].setValue( False )
__children["DeleteFaces1"]["faces"].setValue( 'varA' )
__children["DeleteFaces1"]["invert"].setValue( True )
__children["DeleteFaces1"]["__uiPosition"].setValue( imath.V2f( 131.852158, -25.2778969 ) )
__children["DeleteFaces2"]["in"].setInput( __children["OSLObject"]["out"] )
__children["DeleteFaces2"]["filter"].setInput( __children["Filter"]["out"] )
__children["DeleteFaces2"]["adjustBounds"].setValue( False )
__children["DeleteFaces2"]["faces"].setValue( 'varB' )
__children["DeleteFaces2"]["invert"].setValue( True )
__children["DeleteFaces2"]["__uiPosition"].setValue( imath.V2f( 144.727859, -25.2778969 ) )
__children["DeleteFaces5"]["in"].setInput( __children["OSLObject"]["out"] )
__children["DeleteFaces5"]["filter"].setInput( __children["Filter"]["out"] )
__children["DeleteFaces5"]["adjustBounds"].setValue( False )
__children["DeleteFaces5"]["faces"].setValue( 'varC' )
__children["DeleteFaces5"]["invert"].setValue( True )
__children["DeleteFaces5"]["__uiPosition"].setValue( imath.V2f( 155.879608, -25.2779045 ) )
__children["DeleteFaces6"]["in"].setInput( __children["OSLObject"]["out"] )
__children["DeleteFaces6"]["filter"].setInput( __children["Filter"]["out"] )
__children["DeleteFaces6"]["adjustBounds"].setValue( False )
__children["DeleteFaces6"]["faces"].setValue( 'varD' )
__children["DeleteFaces6"]["invert"].setValue( True )
__children["DeleteFaces6"]["__uiPosition"].setValue( imath.V2f( 166.630936, -25.2779045 ) )
__children["Filter"]["paths"].setValue( IECore.StringVectorData( [ '/*' ] ) )
__children["Filter"]["__uiPosition"].setValue( imath.V2f( 197.610046, 5.44718122 ) )
__children["DeleteFaces7"]["in"].setInput( __children["OSLObject"]["out"] )
__children["DeleteFaces7"]["filter"].setInput( __children["Filter"]["out"] )
__children["DeleteFaces7"]["adjustBounds"].setValue( False )
__children["DeleteFaces7"]["faces"].setValue( 'varE' )
__children["DeleteFaces7"]["invert"].setValue( True )
__children["DeleteFaces7"]["__uiPosition"].setValue( imath.V2f( 176.607651, -25.2779045 ) )
__children["DeleteFaces8"]["in"].setInput( __children["OSLObject"]["out"] )
__children["DeleteFaces8"]["filter"].setInput( __children["Filter"]["out"] )
__children["DeleteFaces8"]["adjustBounds"].setValue( False )
__children["DeleteFaces8"]["faces"].setValue( 'varF' )
__children["DeleteFaces8"]["invert"].setValue( True )
__children["DeleteFaces8"]["__uiPosition"].setValue( imath.V2f( 187.350082, -25.2779045 ) )
__children["DeleteFaces9"]["in"].setInput( __children["OSLObject"]["out"] )
__children["DeleteFaces9"]["filter"].setInput( __children["Filter"]["out"] )
__children["DeleteFaces9"]["adjustBounds"].setValue( False )
__children["DeleteFaces9"]["faces"].setValue( 'varG' )
__children["DeleteFaces9"]["invert"].setValue( True )
__children["DeleteFaces9"]["__uiPosition"].setValue( imath.V2f( 197.515213, -25.2779045 ) )
__children["DeleteFaces10"]["in"].setInput( __children["OSLObject"]["out"] )
__children["DeleteFaces10"]["filter"].setInput( __children["Filter"]["out"] )
__children["DeleteFaces10"]["adjustBounds"].setValue( False )
__children["DeleteFaces10"]["faces"].setValue( 'varH' )
__children["DeleteFaces10"]["invert"].setValue( True )
__children["DeleteFaces10"]["__uiPosition"].setValue( imath.V2f( 210.566727, -25.2779121 ) )
__children["OSLObject"]["in"].setInput( __children["CollectScenes"]["out"] )
__children["OSLObject"]["filter"].setInput( __children["Filter"]["out"] )
__children["OSLObject"]["interpolation"].setValue( 2 )
__children["OSLObject"]["primitiveVariables"]["primitiveVariable5"]["name"].setValue( 'varA' )
__children["OSLObject"]["primitiveVariables"]["primitiveVariable5"]["value"].setInput( __children["OSLCode"]["out"]["output1"] )
__children["OSLObject"]["primitiveVariables"]["primitiveVariable6"]["name"].setValue( 'varB' )
__children["OSLObject"]["primitiveVariables"]["primitiveVariable6"]["value"].setInput( __children["OSLCode"]["out"]["output2"] )
__children["OSLObject"]["primitiveVariables"]["primitiveVariable7"]["name"].setValue( 'varC' )
__children["OSLObject"]["primitiveVariables"]["primitiveVariable7"]["value"].setInput( __children["OSLCode"]["out"]["output3"] )
__children["OSLObject"]["primitiveVariables"]["primitiveVariable8"]["name"].setValue( 'varD' )
__children["OSLObject"]["primitiveVariables"]["primitiveVariable8"]["value"].setInput( __children["OSLCode"]["out"]["output4"] )
__children["OSLObject"]["primitiveVariables"]["primitiveVariable9"]["name"].setValue( 'varE' )
__children["OSLObject"]["primitiveVariables"]["primitiveVariable9"]["value"].setInput( __children["OSLCode"]["out"]["output5"] )
__children["OSLObject"]["primitiveVariables"]["primitiveVariable10"]["name"].setValue( 'varF' )
__children["OSLObject"]["primitiveVariables"]["primitiveVariable10"]["value"].setInput( __children["OSLCode"]["out"]["output6"] )
__children["OSLObject"]["primitiveVariables"]["primitiveVariable11"]["name"].setValue( 'varG' )
__children["OSLObject"]["primitiveVariables"]["primitiveVariable11"]["value"].setInput( __children["OSLCode"]["out"]["output7"] )
__children["OSLObject"]["primitiveVariables"]["primitiveVariable12"]["name"].setValue( 'varH' )
__children["OSLObject"]["primitiveVariables"]["primitiveVariable12"]["value"].setInput( __children["OSLCode"]["out"]["output8"] )
__children["OSLObject"]["__uiPosition"].setValue( imath.V2f( 155.879608, -2.04510856 ) )
__children["OSLCode"]["code"].setValue( 'int q = int( u * 10000 ) % 512;\n\noutput1 = 0;\noutput2 = 0;\noutput3 = 0;\noutput4 = 0;\noutput5 = 0;\noutput6 = 0;\noutput7 = 0;\noutput8 = 0;\n\nif( q < 1 ) output1 = 1;\nelse if( q < 2 ) output2 = 1;\nelse if( q < 4 ) output3 = 1;\nelse if( q < 8 ) output4 = 1;\nelse if( q < 16 ) output5 = 1;\nelse if( q < 32 ) output6 = 1;\nelse if( q < 64 ) output7 = 1;\nelse if( q < 128 ) output8 = 1;\n' )
__children["OSLCode"]["__uiPosition"].setValue( imath.V2f( 139.589676, -1.44510877 ) )
__children["Sphere"]["divisions"].setInput( __children["Expression1"]["__out"]["p0"] )
__children["Sphere"]["__uiPosition"].setValue( imath.V2f( 155.879761, 18.2009869 ) )
__children["CollectScenes"]["in"].setInput( __children["Sphere"]["out"] )
__children["CollectScenes"]["rootNames"].setInput( __children["Expression"]["__out"]["p0"] )
__children["CollectScenes"]["rootNameVariable"].setValue( 'collect:test' )
__children["CollectScenes"]["sourceRoot"].setValue( '/sphere' )
__children["CollectScenes"]["__uiPosition"].setValue( imath.V2f( 155.879608, 10.0369234 ) )
__children["Expression"]["__uiPosition"].setValue( imath.V2f( 145.378922, 10.0377245 ) )
__children["Expression1"]["__uiPosition"].setValue( imath.V2f( 145.379059, 18.2018032 ) )
__children["Group"]["in"][0].setInput( __children["DeleteFaces1"]["out"] )
__children["Group"]["in"][1].setInput( __children["DeleteFaces2"]["out"] )
__children["Group"]["in"][2].setInput( __children["DeleteFaces5"]["out"] )
__children["Group"]["in"][3].setInput( __children["DeleteFaces6"]["out"] )
__children["Group"]["in"][4].setInput( __children["DeleteFaces7"]["out"] )
__children["Group"]["in"][5].setInput( __children["DeleteFaces8"]["out"] )
__children["Group"]["in"][6].setInput( __children["DeleteFaces9"]["out"] )
__children["Group"]["in"][7].setInput( __children["DeleteFaces10"]["out"] )
__children["Group"]["__uiPosition"].setValue( imath.V2f( 166.63092, -42.2246513 ) )
__children["Expression"]["__engine"].setValue( 'python' )
__children["Expression"]["__expression"].setValue( 'parent["__out"]["p0"] = IECore.StringVectorData( [ "test%i_C_REN" % i for i in range( 100 ) ] )' )
__children["Expression1"]["__engine"].setValue( 'python' )
__children["Expression1"]["__expression"].setValue( 'import imath\nn = context.get( "collect:test" )\n\ni = int( n[4:-6] )\n\n\nparent["__out"]["p0"] = imath.V2i( 3, 6 ) * i' )

del __children

I've been running it like this:

repeat 40 timeout -s KILL 30s gaffer stats deleteFacesHang3.gfr -nodeSummary 0 -threads 50 -scene Group.out

This should produce a usual runtime of about 1 second scene generation ( plus several seconds loading ), but in the run of 40, you should see some take over 30 seconds and get killed by the timeout command.

The stack trace shows that one thread gets into the parallel for within ShadingEngine::shade, which should only be running OSL stuff, but then somehow that hits a mutex and deadlocks ... our understanding is that this is because another thread has tried to help with the parallel_for, but is over max_concurrency, so has instead enqueued a task which gets picked up while running the parallel_for.

johnhaddon commented 1 year ago

Good hunting Daniel! I think there a couple of relatively easy ways to resolve this :

  1. Put a limit on the -threads argument so you can't oversubscribe.
  2. Construct the task arena with an explicit argument of max_concurrency = global_control::active_value( max_allowed_parallelism ) to match the true thread count.

I'm fairly inclined to go with option 1 for its simplicity, and because I can't think of a good reason to oversubscribe.