PixarAnimationStudios / OpenUSD

Universal Scene Description
http://www.openusd.org

TF_FATAL can cause a process to exit cleanly instead of terminating with an error code #3317

Open andrewkaufman opened 2 months ago

andrewkaufman commented 2 months ago

Description of Issue

Issuing a TF_FATAL should trigger an abort/terminate with a non-zero exit code and emit a crash report to stderr.

However, on POSIX systems, the mechanism uses Arch_DebuggerIsAttachedPosix to conditionally exit with code 0 and without the crash report, presumably to avoid interfering with the debugger. The current implementation is subject to ptrace permission settings, and on Linux distros that default to "restricted ptrace", it can return false positives.
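To check whether a given machine or container is in the affected configuration, the sketch below reads the kernel's Yama `ptrace_scope` setting. It assumes that "restricted ptrace" above refers to Yama mode 1 and that the setting is exposed at the usual `/proc/sys/kernel/yama/ptrace_scope` path. The helper names `read_ptrace_scope` and `describe_ptrace_scope` are hypothetical and not part of USD.

```python
from pathlib import Path
from typing import Optional

# Meanings of the Yama ptrace_scope values, as listed in the Linux kernel docs.
PTRACE_SCOPE_MEANING = {
    0: "classic ptrace permissions",
    1: "restricted ptrace (only descendants may be traced by default)",
    2: "admin-only attach",
    3: "no attach",
}


def read_ptrace_scope(
    path: str = "/proc/sys/kernel/yama/ptrace_scope",
) -> Optional[int]:
    """Return the Yama ptrace_scope value, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # Yama may not be enabled, or the file may be missing on this system.
        return None


def describe_ptrace_scope(value: Optional[int]) -> str:
    """Translate a ptrace_scope value into a human-readable description."""
    if value is None:
        return "Yama ptrace_scope not available"
    return PTRACE_SCOPE_MEANING.get(value, f"unknown value {value}")


if __name__ == "__main__":
    print(describe_ptrace_scope(read_ptrace_scope()))
```

A result of "restricted ptrace" would be consistent with the conditions described here, but it does not by itself prove that the debugger check misfires.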

The result is that TF_FATAL terminates the process early with no indication to the user of what happened or why. Worse, exit code 0 indicates successful completion, so when used in e.g. a render farm or other cloud compute scenario, the job will appear as a successfully completed task.
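To illustrate why the zero exit code is a problem for schedulers, here is a minimal sketch of how a job wrapper might classify a task purely by its return code. The functions `classify_returncode` and `run_task` are hypothetical stand-ins for whatever a render farm actually does; the child commands are plain Python one-liners, not USD calls.

```python
import subprocess
import sys
from typing import List, Tuple


def classify_returncode(returncode: int) -> str:
    """Classify a subprocess return code the way a simple scheduler might."""
    if returncode == 0:
        return "succeeded"
    if returncode < 0:
        # On POSIX, subprocess reports death by signal N as -N.
        return f"killed by signal {-returncode}"
    return f"failed with exit code {returncode}"


def run_task(cmd: List[str]) -> Tuple[int, str]:
    """Run a command and return its return code with a classification."""
    completed = subprocess.run(cmd)
    return completed.returncode, classify_returncode(completed.returncode)


if __name__ == "__main__":
    # A task that exits 0 is reported as a success, whatever happened inside.
    print(run_task([sys.executable, "-c", "import sys; sys.exit(0)"]))
    # A non-zero exit is the only signal this wrapper has that a task failed.
    print(run_task([sys.executable, "-c", "import sys; sys.exit(1)"]))
```

A wrapper like this has no way to distinguish a fatal error that exits 0 from a task that genuinely finished.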

Steps to Reproduce

  1. Create a Dockerfile:

    FROM ubuntu:22.04

    RUN apt update && apt install -y python3.10 libpython3.10 pip

    RUN pip install usd-core

  2. Build a container:

    docker build -t fatal-no-op /path/to/dockerfile

  3. Run python and emit a Tf.Fatal:

    > docker run -it fatal-no-op python3.10 -c 'import pxr.Tf; pxr.Tf.Fatal("abort")'; echo "Python Exit code: $?"
    Python Exit code: 0

Steps to Fix

This is fixed by #3014. If you rebuild the container using the updated USD builds from that MR, the final step produces the expected result:

> docker run -it fatal-fixed python3.10 -c 'import pxr.Tf; pxr.Tf.Fatal("abort")'; echo "Python Exit code: $?"

---------------------------- python3.10 terminated -----------------------------
python3.10 crashed. FATAL ERROR: Python Fatal Error: abort
in __main__.<module> at line 1 of <string>
writing crash report to [ 0bc0d5b385fa:/var/tmp/st_python3.10.1 ] ... done.
--------------------------------------------------------------------------------
Python Exit code: 139
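For readers unfamiliar with exit statuses above 128, the sketch below decodes the 139 shown above, assuming it follows the common shell convention of 128 plus the signal number. The helper `decode_shell_status` is hypothetical, and the signal name depends on the platform's signal numbering (Linux is assumed here).

```python
import signal


def decode_shell_status(status: int) -> str:
    """Interpret a shell-style exit status (the value of $?)."""
    if status == 0:
        return "exited successfully"
    if status > 128:
        # Convention: 128 + N means the process was terminated by signal N.
        signum = status - 128
        try:
            return f"terminated by {signal.Signals(signum).name}"
        except ValueError:
            return f"terminated by unknown signal {signum}"
    return f"exited with code {status}"


if __name__ == "__main__":
    # 139 - 128 = 11, which is SIGSEGV on Linux.
    print(decode_shell_status(139))
```

Under that convention, 139 corresponds to signal 11, so a task runner sees an abnormal termination instead of a clean exit.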

System Information (OS, Hardware)

Ubuntu 22.04 (or any Linux distro with restricted ptrace permissions)

jesschimein commented 2 months ago

Filed as internal issue #USD-10191