alibaba / clusterdata

cluster data collected from production clusters in Alibaba for cluster management research

Memory bandwidth usage value (mem_gps) #50

Open tomxice opened 5 years ago

tomxice commented 5 years ago

Glad to see the new trace includes memory bandwidth usage information. I've checked several machine_usage entries and found non-empty values. I'm somewhat confused by its description, "Normalized to maximum memory bandwidth usage of all machines". What exactly does this value mean? For example, if mem_gps is 5, does it mean 5% of this machine's memory bandwidth is used? Or does it just mean the bandwidth is 5 GB/s?

BTW, two minor concerns.

  1. This value is a float but is said to be an integer in 'trace_2018.md';
  2. What does the 'gps' in the name 'mem_gps' (memory bandwidth) stand for?

HaiyangDING commented 5 years ago

Hi, thanks for pointing out the errors in the doc; I will fix them.

Now to your questions.

  1. About the normalization of mem_gps: all values in mem_gps (i.e., memory bandwidth) are normalized to the maximum mem_gps value across all machines over all time. They therefore do not reflect any absolute value. We chose not to include absolute values due to some concerns, so please understand. The current figures can still show the trend, or the relative share of memory bandwidth used by different subjects. So a value of 5 means 5% of the maximum memory bandwidth across all machines over all time.

  2. mem_gps originally meant "memory gigabytes per second"; however, since it is normalized, the unit is no longer gigabytes. I agree that this is bad naming, and I am sorry for the inconvenience.
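
For readers who want the normalization spelled out, here is a minimal sketch of the scaling described in this reply. The raw GB/s readings are made up for illustration (the real absolute values are not published), and `normalize_mem_gps` is a hypothetical helper, not part of any trace tooling.

```python
def normalize_mem_gps(raw_values):
    """Scale raw bandwidth readings to a percentage of the global maximum.

    `raw_values` stands in for every reading from every machine over all
    time; the largest one becomes 100.
    """
    global_max = max(raw_values)
    return [100.0 * v / global_max for v in raw_values]


# Made-up raw readings in GB/s (illustrative only, not from the trace).
raw = [2.0, 10.0, 40.0]
normalized = normalize_mem_gps(raw)
print(normalized)
```

Under this reading, a mem_gps of 5 says only that the reading was 5% of the largest reading in the whole trace, not 5% of that particular machine's capacity.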

tomxice commented 5 years ago

Thank you for the explanation. One more question: is the 'maximum value of the mem_gps of all machines of all time' a constant value across all trace data, or is it itself a variable? In the former case, everything is fine and I totally understand your concerns. In the latter case, could I get the total memory bandwidth from somewhere?

HaiyangDING commented 5 years ago

Hi, the maximum value of mem_gps is "a constant value across all trace data", so everything should be fine. Let me know if you have any further concerns or questions :)
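
One practical consequence of a constant normalizer is that ratios between any two mem_gps values, on different machines or at different timestamps, equal the ratios of the underlying absolute bandwidths. A small sketch of that check follows; the readings and the candidate maxima are invented for illustration, and `relative_usage` is a hypothetical helper.

```python
def relative_usage(a, b):
    """Return how many times larger normalized reading `a` is than `b`."""
    if b == 0:
        raise ValueError("reference reading is zero")
    return a / b


# Two made-up normalized mem_gps readings (percent of the global maximum).
reading_a, reading_b = 10.0, 5.0
ratio = relative_usage(reading_a, reading_b)

# Whatever the undisclosed constant is, the absolute ratio is the same.
for assumed_max_gbps in (50.0, 100.0, 200.0):  # hypothetical constants
    abs_a = reading_a / 100.0 * assumed_max_gbps
    abs_b = reading_b / 100.0 * assumed_max_gbps
    assert abs(abs_a / abs_b - ratio) < 1e-9
```

This is why trends and relative comparisons remain valid even though the absolute GB/s figures cannot be recovered from the trace.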

tomxice commented 5 years ago

Thank you so much, for both the trace and the rapid reply. 👍

Violet-Guo commented 5 years ago

Closing the issue. If I have missed something, please feel free to reopen this issue.