Fixes an issue where process forking in PyTorch causes omnitrace/__main__.py to fail due to a missing script argument.
Closes #284
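For context, the failure mode can be illustrated with a minimal sketch of a launcher that tolerates being entered without a script argument. This is not omnitrace's actual code: the names split_command_line and main are hypothetical, and the convention of splitting at the first -- is assumed from the command lines used in the test cases.

```python
import sys


def split_command_line(argv):
    """Split argv (program name excluded) at the first '--'.

    Returns (tool_args, script_argv). script_argv is empty when no
    script was given after the separator.
    """
    if "--" in argv:
        idx = argv.index("--")
        return argv[:idx], argv[idx + 1:]
    # Assumption for this sketch: without a separator, everything
    # is treated as the script command.
    return [], list(argv)


def main(argv=None):
    """Return the script command to run, or None if there is none."""
    argv = sys.argv[1:] if argv is None else argv
    _tool_args, script_argv = split_command_line(argv)
    if not script_argv:
        # Hypothetical guard: a child process may re-enter the module
        # without a script; return quietly instead of raising.
        return None
    return script_argv
```

With this guard, an entry without a script argument yields None rather than an error, while the normal invocation still returns the script and its arguments.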
Test Cases
Follow basic setup steps in #284.
Note: on the system used for testing (Lockhart), LD_PRELOAD=/usr/lib64/libstdc++.so.6 was required because the libstdc++.so.6 from the conda env was too old for the ROCm libraries linked by omnitrace (omnitrace was built with -static-libstdcxx).
Configure stemdlConfig.yaml with 2 GPUs and execute srun -G 2 python -m omnitrace -- ./stemdl_classification.py --config ./stemdlConfig.yaml
Configure stemdlConfig.yaml with 4 GPUs and execute srun -G 4 python -m omnitrace -- ./stemdl_classification.py --config ./stemdlConfig.yaml
With run.sh and omni-perfetto.cfg in place, execute srun -G 2 ./run.sh
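The 2-GPU and 4-GPU runs, together with the LD_PRELOAD workaround noted for Lockhart, can be assembled programmatically. The following is a minimal sketch: build_launch is a hypothetical helper, it only constructs the command and environment without executing anything, and it assumes srun forwards the caller's environment to the job.

```python
import os
import shlex


def build_launch(num_gpus, preload=None, base_env=None):
    """Return (argv, env) for one test run; nothing is executed here."""
    argv = [
        "srun", "-G", str(num_gpus),
        "python", "-m", "omnitrace", "--",
        "./stemdl_classification.py", "--config", "./stemdlConfig.yaml",
    ]
    env = dict(os.environ if base_env is None else base_env)
    if preload:
        # Prepend so the requested library precedes any existing preloads.
        existing = env.get("LD_PRELOAD")
        env["LD_PRELOAD"] = preload if not existing else f"{preload}:{existing}"
    return argv, env


if __name__ == "__main__":
    for gpus in (2, 4):
        argv, env = build_launch(gpus, preload="/usr/lib64/libstdc++.so.6")
        print(f"LD_PRELOAD={env['LD_PRELOAD']} {shlex.join(argv)}")
```

The returned pair can be passed to subprocess.run(argv, env=env) on a system where srun is available.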
Additional Notes
Omnitrace had to be built from scratch with OMNITRACE_MAX_THREADS=4096 to complete at least one of the PyTorch runs, because that run created more than 2048 threads (the default maximum in an installer release) and caused omnitrace to abort. However, this absolute restriction on the total number of threads created by a process will eventually be removed (hopefully soon).