GafferHQ / gaffer

Gaffer is a node-based application for lookdev, lighting and automation
http://www.gafferhq.org
BSD 3-Clause "New" or "Revised" License

UI Freezing Randomly - RHEL 9.4 #5877

A6i8 closed 5 months ago

A6i8 commented 6 months ago

Version: Gaffer 1.4.3.0-linux-gcc9
Third-party tools: Arnold
Third-party modules: None

Linux version: 5.14.0-427.16.1.el9_4.x86_64 (mockbuild@iad1-prod-build001.bld.equ.rockylinux.org) (gcc (GCC)
ldd (GNU libc): 2.34

Description

The UI freezes randomly, e.g. when selecting a node, changing the layout, or selecting a Catalogue.

Nothing else is running in background.

I also enabled IECORE_LOG_LEVEL: "DEBUG", but the output shows nothing related to the UI and no backtrace.

Can you please help me get logs and debug the problem?

Thanks.

johnhaddon commented 6 months ago

Can you please help me get logs and debug the problem?

First, determine the process ID of the Gaffer process, by typing ps -ef | grep gaffer in a terminal. Then run eu-stack -p <PID>, where PID is the process ID from the first command. This will print out a stack trace from every Gaffer thread, which is typically very useful for diagnosing hangs. If you could attach that output to this issue that would be very helpful.

Here's an example running those commands on my system :

image
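For anyone who would rather script the two commands above, here is a minimal Python sketch. It only wraps `ps -ef` and `eu-stack -p <PID>` as described in the comment. The function names (`find_pids`, `dump_stacks`, `main`) and the "gaffer" search string are illustrative assumptions, and it assumes `eu-stack` (from elfutils) is installed and that you have permission to attach to the process.

```python
import os
import subprocess


def find_pids(ps_output, needle="gaffer"):
    """Return PIDs from `ps -ef` output whose command line contains `needle`."""
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        # `ps -ef` columns: UID PID PPID C STIME TTY TIME CMD
        fields = line.split(None, 7)
        if len(fields) == 8 and needle in fields[7]:
            pids.append(int(fields[1]))
    return pids


def dump_stacks(pid):
    """Run `eu-stack -p <pid>` and return whatever it printed."""
    result = subprocess.run(
        ["eu-stack", "-p", str(pid)], capture_output=True, text=True
    )
    return result.stdout + result.stderr


def main():
    """Print a stack dump for every process that looks like Gaffer."""
    ps_output = subprocess.run(
        ["ps", "-ef"], capture_output=True, text=True, check=True
    ).stdout
    for pid in find_pids(ps_output):
        if pid == os.getpid():
            continue  # never try to trace this script itself
        print(f"==== eu-stack -p {pid} ====")
        print(dump_stacks(pid))
```

Call `main()` while the UI is frozen and redirect the output to a file that can be attached to the issue.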

A6i8 commented 6 months ago

Hi johnhaddon,

Thanks for your quick response.

I also checked with Gaffer 1.4.5, and the same thing happens.

Please find the error logs attached: Gaffer_1.4.3.error.log, Gaffer_1.4.5.error.log

Thanks

johnhaddon commented 6 months ago

Oof, this one is nasty. Thanks for the logs - they make things pretty clear.

What's not clear is why this is happening for you repeatedly but not for anyone else yet. In theory it could definitely happen to anyone, but it seems to require that a Python-derived Node be destroyed on a background thread due to garbage collection, and at a very inconvenient time. Even when deleted, most nodes are still owned by the UI thread's undo queue so are unlikely to be disposed of in this way. I wonder if you have any custom code at all, and if any of that might make this more likely?
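As a rough illustration of the general hazard described above, here is a plain CPython sketch showing that a finaliser runs on whichever thread happens to perform the collection, not on the thread that created the object. This is an analogy only and not Gaffer's actual code path: the workaround discussed in this thread suggests Gaffer's collection is governed by IECore's own threshold. `Tracked` and `demo` are hypothetical names.

```python
import gc
import threading

finalised_on = []  # names of the threads that ran a finaliser


class Tracked:
    """Object kept alive only by a reference cycle to itself."""

    def __init__(self):
        self.cycle = self  # only the cyclic collector can free this

    def __del__(self):
        finalised_on.append(threading.current_thread().name)


def demo():
    gc.disable()  # stop automatic collection from interfering
    try:
        gc.collect()  # clear any garbage left over from earlier code
        finalised_on.clear()
        Tracked()  # created on the calling thread, unreachable at once
        worker = threading.Thread(target=gc.collect, name="background-worker")
        worker.start()
        worker.join()
    finally:
        gc.enable()
    return list(finalised_on)
```

On CPython, `demo()` should report that the finaliser ran on the worker thread. If that finaliser needed something held by the UI thread, the result could be the kind of hang reported here.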

A6i8 commented 6 months ago

Hello John,

Thank you for your quick response. Yes, we have added a few Python expressions for automation. It's difficult to figure out which node might cause the UI freeze, and we are currently looking into it. I have also attached the file for your reference. Please add a Geo, a Shader and an HDR in the lights for the file to work. If you find anything, I would love to hear your thoughts.

Thank you!

Bear_SHD_v001_t01.zip

johnhaddon commented 6 months ago

Thanks for the file - we'll see if we can reproduce the problem here. Quick note though : I'm about to go on holiday for a few days, so won't get a chance until at least next Tuesday.

As a short term workaround, I'd be curious to know if running this helps reduce the frequency of the problem :

IECore.RefCounted.garbageCollectionThreshold = 10000

You could either do that in the PythonEditor or in a ~/gaffer/startup/gui/foo.py file.
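A minimal sketch of what such a startup file might contain is shown below. The file name foo.py comes from the comment above. The `try`/`except` guard and the `apply_workaround` helper are assumptions added so the file can also be loaded outside Gaffer, and the only Gaffer-side call is the assignment quoted above.

```python
# ~/gaffer/startup/gui/foo.py
try:
    import IECore  # available inside Gaffer's Python environment
except ImportError:
    IECore = None  # lets this sketch load outside Gaffer without failing

WORKAROUND_THRESHOLD = 10000  # value suggested in this thread


def apply_workaround(threshold=WORKAROUND_THRESHOLD):
    """Raise the garbage collection threshold; return True if it was applied."""
    if IECore is None:
        return False
    IECore.RefCounted.garbageCollectionThreshold = threshold
    return True


apply_workaround()
```

In the PythonEditor, the single assignment line on its own should be enough once `IECore` is imported.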

A6i8 commented 6 months ago

Setting IECore.RefCounted.garbageCollectionThreshold = 10000 helps.

johnhaddon commented 5 months ago

The UI freezes randomly, e.g. when selecting a node, changing the layout, or selecting a Catalogue.

Question : has this ever happened without changing the layout at some point beforehand (even if the freeze occurs when doing something else later)? I'm trying to figure out what might account for the stacktrace, and my main suspects at the moment are some internal nodes in some of the UI. But unless you've either changed the layout or removed something from it, I think I might be looking in the wrong place.

johnhaddon commented 5 months ago

I've managed to reproduce this quite simply now :

  1. Get a Catalogue with a bunch of images in it.
  2. Remove the ImageInspector from the layout.
  3. Select and deselect the Catalogue a few times.

johnhaddon commented 5 months ago

I believe this is fixed by #5893. Test builds for that should be available here shortly : https://github.com/GafferHQ/gaffer/actions/runs/9416506352. It would be great to know if they work for you @A6i8 (without the garbageCollectionThreshold = 10000 workaround in place).