Validate and improve how profiling is done. The internal IPEX profiling example works in isolation, but with Megatron-DeepSpeed the XPU outputs are missing.
[ ] Timeline output with XPU, including kernels and communication
[ ] Compare support and output of the PyTorch legacy profiler
[ ] Check whether IPEX_ZE_TRACING=1 is needed and, if so, add it to the relevant scripts
[ ] Generate smaller profiler output files by using torch.profiler.schedule and profiler.step(). For example, wait 2 steps, warm up for 2 steps, then profile 2 steps and write the output.
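
The wait/warmup/profile pattern in the last item could look roughly like the sketch below. It is only an illustration, not the Megatron-DeepSpeed integration: `make_schedule`, `profile_steps`, `train_step` and `trace_dir` are hypothetical names, and the XPU activity is added only if the installed PyTorch build exposes `ProfilerActivity.XPU` and reports an available XPU device (an assumption that may not hold for every PyTorch/IPEX combination).

```python
import torch
from torch.profiler import (
    ProfilerActivity,
    profile,
    schedule,
    tensorboard_trace_handler,
)


def make_schedule():
    # Skip 2 steps, warm up for 2 steps, record 2 steps, then stop.
    # repeat=1 limits profiling to a single wait/warmup/active cycle.
    return schedule(wait=2, warmup=2, active=2, repeat=1)


def profile_steps(train_step, num_steps, trace_dir):
    """Hypothetical helper: call train_step() num_steps times under the profiler."""
    activities = [ProfilerActivity.CPU]
    # Assumption: ProfilerActivity.XPU is only present in some PyTorch builds,
    # so it is looked up defensively instead of being referenced directly.
    xpu_activity = getattr(ProfilerActivity, "XPU", None)
    if xpu_activity is not None and hasattr(torch, "xpu") and torch.xpu.is_available():
        activities.append(xpu_activity)

    with profile(
        activities=activities,
        schedule=make_schedule(),
        # Writes one trace file into trace_dir when the active window ends.
        on_trace_ready=tensorboard_trace_handler(trace_dir),
    ) as prof:
        for _ in range(num_steps):
            train_step()
            # Tell the profiler that one training step has finished.
            prof.step()
```

With this schedule only steps 4 and 5 (0-indexed) are recorded, so the trace stays small regardless of how long the run continues.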