PaddlePaddle / benchmark


Which is used for BERT training benchmark #84

Open LeoZhao-Intel opened 5 years ago

LeoZhao-Intel commented 5 years ago

Which script is used for the BERT training benchmark? I see there are two kinds of scripts: one for pre-training, e.g. train.py, and one for fine-tuning, e.g. run_classify.py. Which one is used for the benchmark?

LeoZhao-Intel commented 5 years ago

@luotao1

luotao1 commented 5 years ago

We use run_classify.py.

LeoZhao-Intel commented 5 years ago

@luotao1 For ParallelExecutor, how does your QA team calculate the benchmark result? Is it "speed * CPU_NUM" or just "speed"?

LeoZhao-Intel commented 5 years ago

> @luotao1 For ParallelExecutor, how does your QA team calculate the benchmark result? Is it "speed * CPU_NUM" or just "speed"?

@luotao1 Any feedback on this question?

luotao1 commented 5 years ago

We don't use speed * CPU_NUM; that is throughput.

LeoZhao-Intel commented 5 years ago

Then how do we judge whether the speed is comparable with V100? e.g. V100: BS=1, speed 3.4 steps/s; Xeon: BS=1, CPU_NUM=8, speed 0.43 steps/s.

Are they directly comparable?

luotao1 commented 5 years ago

It is not identical. Does BS=1 CPU_NUM=8 with speed 0.43 steps/s mean that BS=1 CPU_NUM=1 gives 0.43/8 steps/s? The speed may not scale linearly as CPU_NUM increases. You can give the result for BS=1 with CPU_NUM=ALL.

LeoZhao-Intel commented 5 years ago

Yes, speed does not scale linearly with CPU_NUM. But I checked the code and found that this speed reflects iteration execution time, not the number of samples actually processed. For each iteration, the number of processed samples is actually batchsize * CPU_NUM. I can confirm this.

So my question is: for CPU vs. GPU, we may not be able to compare the speed numbers from the logs directly, given that CPU_NUM is a virtual concept used to exploit CPU multi-core data parallelism, while a GPU needs additional discrete cards to scale out. This speed is more like latency.

We can report different speeds for different CPU_NUM values, but how to compare them with GPU fairly, that is what I want to ask.
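To make the point above concrete, here is a minimal sketch (function and variable names are illustrative, not from the benchmark code) of why the logged steps/s understates the work done when CPU_NUM > 1:

```python
# Illustrative sketch: with ParallelExecutor, each logged "step" runs one
# mini-batch per parallel replica, so the samples processed per iteration
# are batch_size * CPU_NUM, not just batch_size.

def samples_per_second(steps_per_sec, batch_size, cpu_num):
    """Convert the logged steps/s into throughput (samples/s)."""
    return steps_per_sec * batch_size * cpu_num

# Numbers from the discussion above: Xeon, BS=1, CPU_NUM=8, 0.43 steps/s
print(samples_per_second(0.43, batch_size=1, cpu_num=8))  # 3.44 samples/s
```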

luotao1 commented 5 years ago

> but how to compare them with GPU fairly, that is what I want to ask.

How about computing samples/s to compare CPU and GPU?

LeoZhao-Intel commented 5 years ago

I see this calculation logic in the benchmark run.sh: it uses samples/s and accounts for both CPU_NUM and BS. I think that makes more sense.
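Assuming that logic, a hedged sketch of the CPU-vs-GPU comparison on a samples/s basis (the specific figures are the ones quoted earlier in this thread; the single V100 counts as one replica):

```python
# Illustrative comparison, not the actual run.sh logic:
# samples/s = steps/s * batch_size * number of parallel replicas.

def throughput(steps_per_sec, batch_size, num_replicas):
    """Throughput in samples/s for a data-parallel run."""
    return steps_per_sec * batch_size * num_replicas

gpu = throughput(3.4, batch_size=1, num_replicas=1)   # V100: 3.4 samples/s
cpu = throughput(0.43, batch_size=1, num_replicas=8)  # Xeon, CPU_NUM=8: 3.44 samples/s

# On a samples/s basis the two runs are roughly comparable,
# even though the raw steps/s figures differ by about 8x.
print(gpu, cpu)
```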