Open zhdllwyc opened 1 month ago
We are looking at this issue. We will update soon. Thanks.
Can I get the URL for the profile result? If you can attach the NEFF as well, that would be very helpful.
The profile result is hosted on my instances, but here is my NEFF file (I had to zip it because the .neff extension is not supported here).
MODULE_SyncTensorsGraph.40_10114637376880686083.zip
Here is the script I use to profile ($1 is the Python script to execute, $2 is the number of workers to profile):
#!/bin/bash
# Check if file is provided as an argument
if [ -z "$1" ]; then
    echo "Please provide a file."
    exit 1
fi
# Check if the provided argument is a file
if [ ! -f "$1" ]; then
    echo "The provided argument is not a file."
    exit 1
fi
current_datetime=$(TZ="America/Los_Angeles" date +"%Y-%m-%d-%H:%M:%S")
filename="${1%.py}"
DIR="${filename}_${current_datetime}"
rm -rf /tmp/ubuntu/neuroncc_compile_workdir/*
rm -rf /var/tmp/neuron-compile-cache/neuronxcc-*/*
rm -rf "$DIR"
mkdir "$DIR"
python "$1"
mv MODULE_* "$DIR"
cd "$DIR"
# Find the first file with the .neff extension in the current directory
file=$(find . -maxdepth 1 -type f -name "*.neff" | head -n 1)
neuron-profile capture -n "$file" -s profile.ntff --collectives-workers-per-node "$2" --profile-nth-exec=2
mkdir profile_result
mv profile_*exec* profile_result/
mkdir profile_result_json
for ntff_file in profile_result/*; do
    echo "$ntff_file"
    rank_integer=$(echo "$ntff_file" | grep -oP '(?<=_rank_)[0-9]+')
    echo "$rank_integer"
    neuron-profile view --output-format json --output-file "./profile_result_json/profile_${rank_integer}_${current_datetime}.json" -n "$file" -s "${ntff_file}"
done
neuron-profile view -n "$file" -d profile_result --db-bucket="${current_datetime}"
cd ..
Here is my NTFF file: profile_result.zip
I am launching nccl.collective_permute on a trn1.32xlarge. Within the workload, each NeuronCore sends data to a neighboring worker following a pre-specified topology. However, some workers experience an extremely long duration (0.2 ms), whereas most workers have a duration of 0.014 ms.
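The exact topology is not shown in this post. As a minimal sketch, assuming a simple one-directional ring in which worker i sends to worker (i + 1) mod n and one worker per NeuronCore, the (source, target) pairs for a collective permute could be built and sanity-checked like this. `ring_pairs` and `check_permutation` are hypothetical helpers written for illustration and are not part of any Neuron or PyTorch/XLA API.

```python
def ring_pairs(num_workers):
    """Return (source, target) pairs for a one-directional ring (assumed topology)."""
    if num_workers < 1:
        raise ValueError("num_workers must be positive")
    return [(src, (src + 1) % num_workers) for src in range(num_workers)]


def check_permutation(pairs, num_workers):
    """Return True if every rank is in range and sends/receives at most once."""
    sources = [src for src, _ in pairs]
    targets = [dst for _, dst in pairs]
    in_range = all(0 <= rank < num_workers for rank in sources + targets)
    unique_sources = len(set(sources)) == len(sources)
    unique_targets = len(set(targets)) == len(targets)
    return in_range and unique_sources and unique_targets


# Assumption: 32 workers, one per NeuronCore on a trn1.32xlarge.
pairs = ring_pairs(32)
print(check_permutation(pairs, 32))
```

A check like this only rules out a malformed pair list (for example, two workers sending to the same target); it says nothing about why one rank is slower than the others.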
Below is a screenshot of the profiling result of worker 1 (0.014 ms duration).
Below is a screenshot of the profiling result of worker 0 (abnormal 0.2 ms duration):
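To compare all workers rather than two screenshots, the per-rank durations can be checked against the median. This is a minimal sketch, assuming the durations have already been read out of the profiles by hand into a dict. `slow_ranks` is a hypothetical helper; the values for ranks 0 and 1 are the two figures quoted in this post, and the entries for ranks 2 and 3 are placeholders that assume those workers behave like worker 1.

```python
from statistics import median


def slow_ranks(durations_ms, factor=5.0):
    """Return the ranks whose duration exceeds `factor` times the median."""
    typical = median(durations_ms.values())
    return sorted(rank for rank, d in durations_ms.items() if d > factor * typical)


# Ranks 0 and 1 use the durations quoted in this post; ranks 2 and 3 are placeholders.
durations_ms = {0: 0.2, 1: 0.014, 2: 0.014, 3: 0.014}
print(slow_ranks(durations_ms))
```

The factor of 5 is an arbitrary threshold chosen for illustration; the median is used instead of the mean so that a few slow ranks do not shift the baseline.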
The source code is:
My pip freeze is:
My neuron-profile version is:
When profiling, I output the profile result of the second iteration: