ROCm / omnitrace

Omnitrace: Application Profiling, Tracing, and Analysis
https://rocm.docs.amd.com/projects/omnitrace/en/latest/
MIT License

PyTorch Python fork fix #291

Closed jrmadsen closed 1 year ago

jrmadsen commented 1 year ago

Test Cases

Follow basic setup steps in #284.

Note: on the system used for testing (Lockhart), LD_PRELOAD=/usr/lib64/libstdc++.so.6 was required because the libstdc++.so.6 from the conda env was too old for the ROCm libraries linked by omnitrace (omnitrace was built with -static-libstdcxx).
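A minimal sketch of how one might confirm the libstdc++ mismatch described in the note before resorting to LD_PRELOAD. It compares the newest GLIBCXX version tag embedded in two copies of the library. The helper name newest_glibcxx and the conda library path under CONDA_PREFIX are assumptions for illustration; only the /usr/lib64 path comes from the note above. It relies on GNU grep and GNU sort.

```shell
#!/bin/bash

# Print the newest GLIBCXX_x.y.z version tag found in a shared library.
# grep -a treats the binary as text; sort -V orders version strings numerically.
# (newest_glibcxx is a hypothetical helper, not part of omnitrace.)
newest_glibcxx() {
    grep -a -o 'GLIBCXX_[0-9][0-9.]*' "$1" | sort -u -V | tail -n 1
}

# System path is taken from the note; the conda path is an assumption.
sys_lib=/usr/lib64/libstdc++.so.6
env_lib="${CONDA_PREFIX:-}/lib/libstdc++.so.6"

# Only report when both files are readable on this machine.
if [ -r "$sys_lib" ] && [ -r "$env_lib" ]; then
    echo "system: $(newest_glibcxx "$sys_lib")"
    echo "conda:  $(newest_glibcxx "$env_lib")"
fi
```

If the conda copy reports an older tag than the system copy, preloading the system library as in the note is one plausible workaround.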

  1. Configure stemdlConfig.yaml with 2 GPUs and execute srun -G 2 python -m omnitrace -- ./stemdl_classification.py --config ./stemdlConfig.yaml
  2. Configure stemdlConfig.yaml with 4 GPUs and execute srun -G 4 python -m omnitrace -- ./stemdl_classification.py --config ./stemdlConfig.yaml
  3. Write run.sh (contents below) and execute srun -G 2 ./run.sh

run.sh Contents

#!/bin/bash

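# Stop any leftover perfetto processes; ignore errors if none are running
set +e
pkill traced
pkill perfetto

# From here on, abort on the first failing command
set -e
# Start the perfetto tracing service, then a session configured by omni-perfetto.cfg
traced --background
perfetto --out stemdl.proto --txt -c ./omni-perfetto.cfg --background

# Have omnitrace send trace data to the system perfetto service started above
export OMNITRACE_PERFETTO_BACKEND=system
python -m omnitrace -- ./stemdl_classification.py --config ./stemdlConfig.yaml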

omni-perfetto.cfg Contents

Used by the perfetto command in run.sh.

duration_ms: 3000
write_into_file: true
file_write_period_ms: 3000
flush_period_ms: 3000

buffers {
  size_kb: 102400000
  fill_policy: RING_BUFFER
}

data_sources {
  config {
    name: "track_event"
  }
}
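As a quick sanity check on the buffer size in the config above, the size_kb value can be converted to GiB. This is a sketch that assumes size_kb counts units of 1024 bytes; kb_to_gib is a hypothetical helper name.

```shell
#!/bin/bash

# Convert a size in KiB to GiB, printed with two decimals.
# Assumption: perfetto's size_kb is in units of 1024 bytes.
kb_to_gib() {
    awk -v kb="$1" 'BEGIN { printf "%.2f\n", kb / (1024 * 1024) }'
}

kb_to_gib 102400000   # prints 97.66
```

Under that assumption, the config as written requests a ring buffer of roughly 97.66 GiB.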

Additional Notes

Omnitrace had to be built from source with OMNITRACE_MAX_THREADS=4096 to complete at least one of the PyTorch runs: that run created more than 2048 threads (the default maximum in an installer release), which caused omnitrace to abort. This hard limit on the total number of threads a process can create will eventually be removed (hopefully soon).
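A minimal sketch for checking whether a running process is close to the thread limit described above. It assumes Linux, where /proc/<pid>/status reports the current thread count; the helper names thread_count and within_thread_limit are hypothetical. Note that it reports the number of live threads at one moment, whereas the limit above applies to the total number of threads created, so this gives only a lower bound.

```shell
#!/bin/bash

# Print the current number of threads of a process (Linux only),
# read from the "Threads:" field of /proc/<pid>/status.
thread_count() {
    awk '/^Threads:/ { print $2 }' "/proc/$1/status"
}

# Succeed if a thread count ($1) fits within a limit ($2, default 2048,
# the installer-release default mentioned above).
within_thread_limit() {
    [ "$1" -le "${2:-2048}" ]
}
```

For example, `within_thread_limit "$(thread_count <pid>)" 4096` would check a process against the limit used for the rebuilt omnitrace, with <pid> standing in for the PID of the Python process.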