ROCm / apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
BSD 3-Clause "New" or "Revised" License

Luise/gbn optimization #105

Closed luise1030 closed 1 year ago

luise1030 commented 1 year ago

This change raises the default C_ELEMENTS_PER_CTA from 64 to 128 for optimization purposes. The following additional tests were run to evaluate accuracy.

[image: accuracy test results]
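
As a rough illustration of what this constant might control, the sketch below assumes the channel dimension is split into blocks of C_ELEMENTS_PER_CTA channels, with one block handled per CTA along that axis. The helper name `channel_blocks` and the ceiling-division mapping are assumptions made for illustration and are not taken from the kernel source. Under that assumption, doubling the constant roughly halves the number of channel blocks.

```python
def channel_blocks(num_channels: int, c_elements_per_cta: int) -> int:
    """Hypothetical helper: number of channel blocks, by ceiling division.

    Assumes each CTA covers `c_elements_per_cta` channels; the real kernel
    may partition work differently.
    """
    if num_channels <= 0 or c_elements_per_cta <= 0:
        raise ValueError("both arguments must be positive")
    return (num_channels + c_elements_per_cta - 1) // c_elements_per_cta


if __name__ == "__main__":
    # Compare the old default (64) with the new default (128).
    for c in (64, 256, 1000):
        print(c, channel_blocks(c, 64), channel_blocks(c, 128))
```

Fewer, larger blocks mean each CTA does more work per launch. Whether that is faster depends on the hardware and tensor shapes, which is presumably why the change was validated with the accuracy tests above.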