NVIDIA-Merlin / models

Merlin Models is a collection of deep learning recommender system model reference implementations
https://nvidia-merlin.github.io/models/main/index.html
Apache License 2.0

Benchmark, improve and document mixed precision (AMP) support in Models #1033

Open gabrielspmoreira opened 1 year ago

gabrielspmoreira commented 1 year ago

Notes from @vysarge's preliminary experiments, reported in this spreadsheet (NVIDIA internal only):

With AMP on (via Keras mixed precision), the MM iteration is actually slower by 35.6ms. The majority of this time comes from one block of ops, including calls to SetToValue, IsFinite, DeviceReduceKernel, and UnsortedSegmentCustomKernel, which contribute a combined 37.9ms of GPU time. The cause is unclear; these kernels appear to have one call per categorical feature and to take longer for features with high cardinality. Without these calls I would expect AMP to save ~7ms.

She then wrote:

AMP issues appear to be related to loss scaling. The ~35ms slowdown from the previous email is present when not scaling losses or when using keras.mixed_precision.LossScaleOptimizer with the default dynamic=True. Using keras.mixed_precision.LossScaleOptimizer with dynamic=False instead, AMP does indeed save ~7ms off the training iteration time. (See also nvbugs/3980579 tracking a similar issue.)
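To make the difference between the two loss-scaling modes concrete, here is a minimal pure-Python sketch. It is a conceptual illustration only, not the Keras implementation: the class names `StaticLossScaler` and `DynamicLossScaler`, the helper `all_finite`, and the default values are hypothetical and chosen for illustration. It shows that a fixed scale only divides the gradients, while a dynamic scale also has to check every gradient for inf/NaN on each step, which is the kind of per-variable work that the IsFinite calls mentioned above would correspond to.

```python
import math


def all_finite(grads):
    """Return True if every gradient value is finite (no inf or NaN)."""
    return all(math.isfinite(g) for g in grads)


class StaticLossScaler:
    """Fixed loss scale: unscale the gradients and always apply the update."""

    def __init__(self, scale):
        self.scale = scale

    def unscale(self, scaled_grads):
        # No finiteness check: the scale never changes.
        return [g / self.scale for g in scaled_grads], True


class DynamicLossScaler:
    """Dynamic loss scale: halve on overflow, double after a run of good steps."""

    def __init__(self, initial_scale=2.0 ** 15, growth_steps=2000):
        self.scale = initial_scale
        self.growth_steps = growth_steps
        self.good_steps = 0

    def unscale(self, scaled_grads):
        grads = [g / self.scale for g in scaled_grads]
        # Extra work on every step: inspect all gradients for inf/NaN.
        if not all_finite(grads):
            self.scale = max(self.scale / 2.0, 1.0)
            self.good_steps = 0
            return grads, False  # caller should skip this update
        self.good_steps += 1
        if self.good_steps >= self.growth_steps:
            self.scale *= 2.0
            self.good_steps = 0
        return grads, True


static = StaticLossScaler(scale=1024.0)
dynamic = DynamicLossScaler(initial_scale=1024.0, growth_steps=2)
static_grads, static_apply = static.unscale([2048.0, 512.0])
dynamic_grads, dynamic_apply = dynamic.unscale([2048.0, float("inf")])
```

In the usage lines, the static scaler returns unscaled gradients without inspecting them, while the dynamic scaler detects the overflow, halves its scale, and signals that the update should be skipped.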

gabrielspmoreira commented 1 year ago

Comment by @vysarge

A fix for part of the AMP slowdown described in nvbugs/3980579 was recently accepted into Keras (PR link).

CarloNicolini commented 4 months ago

Is this issue still open? Is there an example of how to use mixed_precision for training, or should we simply follow the standard TensorFlow/Keras approach?