ROCm / apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch
BSD 3-Clause "New" or "Revised" License

Revert "pass all TensorListMetadata as pointer to pinned host memory (#13)" #47

Closed: jeffdaily closed this 3 years ago

jeffdaily commented 3 years ago

This reverts commit bdd481d15da054bceecd1ea61fe9c45e148f71b6.

pruthvistony commented 3 years ago

Ran NV-BERT on 1 GPU and recorded the performance numbers.

training_sequences_per_second values:

Without the PR: run 1 - 21.918, run 2 - 21.681

With the PR: run 1 - 21.671, run 2 - 21.553

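The averages differ by less than 1% (about 21.80 without the PR vs. about 21.61 with it), which is comparable to the run-to-run variation, so this PR does not appear to cause a performance regression.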
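
For anyone who wants to redo the arithmetic behind that conclusion, here is a minimal sketch. It only uses the four training_sequences_per_second values quoted above; the helper names (`relative_change`, `spread`) are made up for illustration and are not part of apex or the NV-BERT scripts. With only two runs per configuration this is a rough sanity check, not a statistical test.

```python
from statistics import mean

# training_sequences_per_second values copied from the comment above
without_pr = [21.918, 21.681]
with_pr = [21.671, 21.553]


def relative_change(baseline, candidate):
    """Relative change of the candidate mean versus the baseline mean."""
    base = mean(baseline)
    return (mean(candidate) - base) / base


def spread(runs):
    """Gap between the fastest and slowest run (a crude noise estimate)."""
    return max(runs) - min(runs)


change = relative_change(without_pr, with_pr)
mean_gap = abs(mean(with_pr) - mean(without_pr))

print(f"relative change of the mean: {change:+.2%}")
print(f"gap between means:           {mean_gap:.4f}")
print(f"run-to-run spread (without): {spread(without_pr):.4f}")
print(f"run-to-run spread (with):    {spread(with_pr):.4f}")
```

The gap between the two means is smaller than the spread between the two runs without the PR, which is consistent with reading the difference as noise. More runs per configuration would be needed to say anything stronger.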