About the DLRM's performance, a conflict!

NVIDIA / DeepLearningExamples

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.


About the DLRM's performance, a conflict! #600

Closed idealboy closed 4 years ago

idealboy commented 4 years ago

I use DLRM in PyTorch on a single V100 GPU (SXM2, 32 GB), but I found that the actual performance (FP16 mixed precision) is only about 800,000, which is much lower than the performance results displayed on the page.

How can I achieve the same result? Is the throughput measured in items/images per second? What is the unit of the performance numbers?

thank you very much!

tgrel commented 4 years ago

Hi @idealboy, the throughput is measured in items processed per second. One thing that could negatively affect your performance is I/O speed. Are you using fast SSD storage? If not, you might be I/O limited.
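One rough way to check for an I/O bottleneck is to time the data pipeline by itself and compare it with the full training loop. The sketch below is only an illustration, not code from the DLRM repository: `items_per_second`, `fake_loader`, and `fake_step` are hypothetical names, and the sleeps stand in for real disk reads and GPU work.

```python
import time
from typing import Callable, Iterable, Optional


def items_per_second(batches: Iterable, batch_size: int,
                     step: Optional[Callable] = None) -> float:
    """Return throughput in items/second over all batches.

    With step=None only the data pipeline is timed; pass a callable
    to time loading plus the per-batch work.
    """
    start = time.perf_counter()
    n_batches = 0
    for batch in batches:
        if step is not None:
            step(batch)
        n_batches += 1
    # Guard against a zero interval on an empty or trivial iterable.
    elapsed = max(time.perf_counter() - start, 1e-12)
    return n_batches * batch_size / elapsed


def fake_loader(n_batches: int, delay_s: float):
    """Hypothetical stand-in for a data loader: sleeps to mimic disk reads."""
    for i in range(n_batches):
        time.sleep(delay_s)
        yield i


def fake_step(batch) -> None:
    """Hypothetical stand-in for one training step."""
    time.sleep(0.02)


if __name__ == "__main__":
    load_only = items_per_second(fake_loader(5, 0.01), batch_size=100)
    end_to_end = items_per_second(fake_loader(5, 0.01), batch_size=100,
                                  step=fake_step)
    print(f"data loading only: {load_only:.0f} items/s")
    print(f"loading + step:    {end_to_end:.0f} items/s")
```

If the loading-only number is close to the end-to-end number, the run is probably limited by I/O rather than by the GPU. When timing a real PyTorch step on a GPU, call `torch.cuda.synchronize()` before reading the clock, because CUDA kernels are launched asynchronously.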

What about FP32 performance? Can you match what is stated in the README?

Also, can you confirm you're using our Docker image for the training? This will ensure that you're using the software with the same versions.

Sorry for the late response.

tgrel commented 4 years ago

Hi @idealboy, any news about this issue?

tgrel commented 4 years ago

Closing due to lack of activity. Please feel free to reopen if you still think something's wrong.