bytedance / byteps

A high performance and generic framework for distributed DNN training

Mistakes of Workload calculation #441

Open fly-dragon211 opened 1 year ago

fly-dragon211 commented 1 year ago

The calculation of $M_{ss_{GPU}}$ and $M_{ss_{CPU}}$ may have some mistakes. The following is my calculation procedure:

image

while in the paper: image

shadow150519 commented 1 year ago

I think they did make a mistake :(

shadow150519 commented 1 year ago

> The calculation of $M_{ss_{GPU}}$ and $M_{ss_{CPU}}$ may have some mistakes. The following is my calculation procedure:
>
> image
>
> while in the paper: image

Can you tell me whether they built the system using the wrong result, or is it just a slip of the pen?

ymjiang commented 1 year ago

@shadow150519 Yes, this is a typo in the paper. Despite the typo, the final conclusion in equation (6) is right, and BytePS correctly implements the traffic allocation strategy. Thank you for pointing this out.

shadow150519 commented 1 year ago

@ymjiang Based on $t_c = t_g$ and Eqs. (3), (4), and (5), I get the following result:

$$ M_{ss_{GPU}} = \frac{k - 1}{-kn + 2k - n} M $$

$$ M_{ss_{CPU}} = \frac{2(1 - n)}{-kn + 2k - n} M $$

and $t_{opt}$ should be:

$$ t_{opt} = \frac{2(1-n)}{(-kn+2k-n)B}M $$

Eq. (7) is wrong and should be the following:

$$ \gamma_{\alpha} = \frac{\frac{2(n-1)M}{nB}}{t_{opt}} = \frac{kn - 2k + n}{n} $$

$$ \gamma_{p} = \frac{\frac{nM}{kB}}{t_{opt}} = \frac{n(-kn + 2k - n)}{2k(1 - n)} $$

I think these should be the correct results.
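For anyone who wants to double-check the algebra in this comment, here is a minimal sketch that evaluates the expressions above with exact rational arithmetic. It only tests internal consistency: that the two $\gamma$ ratios simplify to the closed forms given, and that the two partition sizes add up to $M$. The function name `evaluate_formulas` and the defaults $M = B = 1$ are hypothetical choices made for illustration. The relation $k M_{ss_{CPU}} + n M_{ss_{GPU}} = M$ is an assumption about the paper's setup that is not stated in this thread. The script does not decide which derivation, this one or the paper's, is the correct one.

```python
from fractions import Fraction


def evaluate_formulas(n, k, M=Fraction(1), B=Fraction(1)):
    """Evaluate the expressions from this comment exactly (n GPU machines, k CPU machines)."""
    d = -k * n + 2 * k - n                      # shared denominator: -kn + 2k - n
    m_gpu = Fraction(k - 1, d) * M              # M_ss_GPU as written in the comment
    m_cpu = Fraction(2 * (1 - n), d) * M        # M_ss_CPU as written in the comment
    t_opt = Fraction(2 * (1 - n), d) * M / B    # t_opt as written in the comment
    gamma_a = (Fraction(2 * (n - 1), n) * M / B) / t_opt
    gamma_p = (Fraction(n, k) * M / B) / t_opt
    return m_gpu, m_cpu, t_opt, gamma_a, gamma_p


for n in range(2, 8):          # n >= 2 keeps every denominator nonzero
    for k in range(1, 8):
        m_gpu, m_cpu, t_opt, gamma_a, gamma_p = evaluate_formulas(n, k)
        # The two ratios should reduce to the closed forms stated in the comment.
        assert gamma_a == Fraction(k * n - 2 * k + n, n)
        assert gamma_p == Fraction(n * (-k * n + 2 * k - n), 2 * k * (1 - n))
        # Assumed constraint (not stated in this thread): the partitions sum to M = 1.
        assert k * m_cpu + n * m_gpu == 1

print("all identities hold")
```

Using `Fraction` instead of floats keeps the comparisons exact, so a failed assertion would point to an algebra slip rather than rounding error.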

shadow150519 commented 1 year ago

I think you did the calculation in Section 4.1 totally wrong.