bytedance / byteps

A high performance and generic framework for distributed DNN training

Bad Profile data #167

Open xiongji opened 4 years ago

xiongji commented 4 years ago

Describe the bug

I use BYTEPS_SERVER_KEY_TO_PROFILE to output a server_profile.json file, but when I load it in Chrome, an error like this pops up: [image]

It's hard to locate where the data is wrong.

To Reproduce

  1. export BYTEPS_SERVER_KEY_TO_PROFILE=1

  2. See error

Expected behavior

Is the profile tool available? I really cannot figure out what I did wrong. Thanks.

ymjiang commented 4 years ago

Could you check whether there is a tensor with key=1? You can open the JSON file, find one of the keys (pid), and set BYTEPS_SERVER_KEY_TO_PROFILE to that value.
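To find candidate keys without reading the file by hand, a rough sketch like the following may help. It assumes each event stores the key in a "pid" field holding a decimal number (quoted or unquoted), and it uses a regular expression rather than a JSON parser so that it still works if the file is incomplete. The function name `count_pids` is made up for illustration and is not part of BytePS.

```python
import re
from collections import Counter

# Assumption: keys appear as "pid": 131072 or "pid": "131072" in the profile.
PID_PATTERN = re.compile(r'"pid"\s*:\s*"?(\d+)"?')


def count_pids(text):
    """Count how many times each numeric pid value appears in the text."""
    return Counter(int(value) for value in PID_PATTERN.findall(text))
```

Reading the profile into a string and calling `count_pids(text).most_common()` lists the pid values seen in the file, any of which could be tried as BYTEPS_SERVER_KEY_TO_PROFILE.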

xiongji commented 4 years ago

> Could you check whether there is a tensor with key=1? You can open the JSON file, find one of the keys (pid), and set BYTEPS_SERVER_KEY_TO_PROFILE to that value.

Sorry, I pasted the wrong thing. These are my actual settings:

# profile
export BYTEPS_SERVER_ENABLE_PROFILE=1
export BYTEPS_SERVER_PROFILE_OUTPUT_PATH=/home/deploy/workbench/server/server_profile.json
export BYTEPS_SERVER_KEY_TO_PROFILE=131072

ymjiang commented 4 years ago

It might be related to a timestamp mismatch. Could you upload the file here so we can check? If you sample a single key, the JSON file should not be large.

eric-haibin-lin commented 4 years ago

I have run into this several times, and each time it was caused by a truncated JSON file: the file was not properly flushed at the end of the program. You can manually remove the last incomplete entry in the JSON file and visualize the rest.
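Removing the incomplete entry can also be scripted. The sketch below is one possible approach, not part of BytePS. It assumes the profile is either a plain JSON array of event objects or an object whose first "[" opens the event array (as in the Chrome trace event format). It keeps every event that parses cleanly and stops at the first one that does not. The names `salvage_trace_events` and `repair_trace_file` are hypothetical.

```python
import json


def salvage_trace_events(text):
    """Return every complete event object from a possibly truncated trace."""
    decoder = json.JSONDecoder()
    start = text.find("[")  # assumed to be the start of the event array
    if start == -1:
        return []
    events = []
    idx = start + 1
    end = len(text)
    while idx < end:
        # Skip whitespace and commas between events.
        while idx < end and text[idx] in " \t\r\n,":
            idx += 1
        if idx >= end or text[idx] != "{":
            break  # end of the array, or unexpected content
        try:
            event, idx = decoder.raw_decode(text, idx)
        except json.JSONDecodeError:
            break  # the last entry is incomplete; drop it
        events.append(event)
    return events


def repair_trace_file(src_path, dst_path):
    """Write the salvaged events to dst_path and return how many were kept."""
    with open(src_path, encoding="utf-8") as src:
        events = salvage_trace_events(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        json.dump({"traceEvents": events}, dst)
    return len(events)
```

For example, `repair_trace_file("server_profile.json", "server_profile.fixed.json")` writes a new file that should load in the Chrome trace viewer, leaving the original untouched. If the error persists after this, truncation is probably not the cause.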