PyTorch Python fork fix

Fixes an issue where forking a process in PyTorch causes `omnitrace/__main__.py` to fail due to a missing script argument.
closes #284

Test Cases

Follow the basic setup steps in #284.

Note: on the system used for testing (Lockhart), `LD_PRELOAD=/usr/lib64/libstdc++.so.6` was required because the `libstdc++.so.6` from the conda env was too old for the ROCm libraries linked by omnitrace (omnitrace was built with -static-libstdcxx).
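A minimal sketch of applying this workaround only on systems where the library is present. The helper name `preload_value` is hypothetical and not part of omnitrace; it only builds an `LD_PRELOAD` value with the given library first and keeps anything already preloaded.

```shell
#!/bin/bash
# Hypothetical helper (not part of omnitrace): print an LD_PRELOAD value with
# the given library first, keeping anything that was already preloaded.
preload_value() {
    local lib="$1"
    if [ -f "$lib" ]; then
        # ${LD_PRELOAD:+...} appends the old value only when it is non-empty.
        printf '%s%s\n' "$lib" "${LD_PRELOAD:+:${LD_PRELOAD}}"
    else
        # Library not found: leave the existing value unchanged.
        printf '%s\n' "${LD_PRELOAD:-}"
    fi
}

# Path taken from the note above; adjust for other systems.
echo "LD_PRELOAD would be: $(preload_value /usr/lib64/libstdc++.so.6)"
```

The result can then prefix a test command, for example `LD_PRELOAD="$(preload_value /usr/lib64/libstdc++.so.6)" srun -G 2 python -m omnitrace -- ./stemdl_classification.py --config ./stemdlConfig.yaml`.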

- Configure `stemdlConfig.yaml` with 2 GPUs and execute `srun -G 2 python -m omnitrace -- ./stemdl_classification.py --config ./stemdlConfig.yaml`
- Configure `stemdlConfig.yaml` with 4 GPUs and execute `srun -G 4 python -m omnitrace -- ./stemdl_classification.py --config ./stemdlConfig.yaml`
- Write `run.sh` (contents below) and execute `srun -G 2 ./run.sh`

`run.sh` Contents

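#!/bin/bash

# Stop any leftover perfetto daemons; ignore errors if none are running
set +e
pkill traced
pkill perfetto

# From here on, abort on the first failing command
set -e
# Start the perfetto tracing service, then a session that writes stemdl.proto
traced --background
perfetto --out stemdl.proto --txt -c ./omni-perfetto.cfg --background

# Have omnitrace send its trace data to the system-wide perfetto service
export OMNITRACE_PERFETTO_BACKEND=system
python -m omnitrace -- ./stemdl_classification.py --config ./stemdlConfig.yaml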

`omni-perfetto.cfg` Contents

Used by the `perfetto` command in `run.sh`.

duration_ms: 3000
write_into_file: true
file_write_period_ms: 3000
flush_period_ms: 3000

buffers {
  size_kb: 102400000
  fill_policy: RING_BUFFER
}

data_sources {
  config {
    name: "track_event"
  }
}

Additional Notes

Omnitrace had to be built from scratch with `OMNITRACE_MAX_THREADS=4096` to complete at least one of the PyTorch runs. That run created more than 2048 threads (the default maximum in an installer release), which caused omnitrace to abort. This absolute restriction on the total number of threads created by a process will eventually be removed (hopefully soon).
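To get a rough idea of whether a run is approaching that limit, the thread count of the running process can be inspected. The sketch below is only an illustration under stated assumptions: `thread_count` and `check_thread_limit` are hypothetical helper names (not part of omnitrace), it relies on Linux `/proc`, and 2048 is the default quoted above. It reports the number of live threads, which is only a lower bound on the total number of threads created, the quantity the limit actually applies to.

```shell
#!/bin/bash
# Hypothetical helpers (not part of omnitrace); Linux-only, reads /proc.

# Print the current number of live threads of the process with the given PID.
thread_count() {
    awk '/^Threads:/ { print $2 }' "/proc/$1/status"
}

# Succeed if the process is at or below the limit (default 2048, the value
# quoted above for installer releases); return 1 if it is above the limit
# and 2 if the count could not be read.
check_thread_limit() {
    local pid="$1" limit="${2:-2048}" n
    n="$(thread_count "$pid")" || return 2
    [ -n "$n" ] || return 2
    if [ "$n" -gt "$limit" ]; then
        echo "PID $pid has $n threads (limit $limit)" >&2
        return 1
    fi
    echo "PID $pid has $n threads (limit $limit)"
}

# Example: inspect the current shell.
check_thread_limit "$$"
```

In practice the PID of the `python -m omnitrace` process would be passed instead of `$$`, and the check repeated while the run is in progress.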

ROCm / omnitrace

PyTorch Python fork fix #291
