Closed donglinz closed 2 years ago
Update: this model is not supported by NVIDIA TF1 either, due to the lack of an Einsum op FP16 implementation:
tensorflow.python.framework.errors_impl.NotFoundError: No registered 'Einsum' OpKernel for 'GPU' devices compatible with node node StatefulPartitionedCall/model/bert_encoder/transformer/layer_0/self_attention/key/einsum/Einsum (defined at /usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py:1748)
(OpKernel was found, but attributes didn't match) Requested Attributes: N=2, T=DT_HALF, equation="abc,cde->abde", _device="/job:localhost/replica:0/task:0/device:GPU:0"
. Registered: device='GPU'; T in [DT_COMPLEX128]
device='GPU'; T in [DT_COMPLEX64]
device='GPU'; T in [DT_DOUBLE]
device='GPU'; T in [DT_FLOAT]
device='CPU'; T in [DT_COMPLEX128]
device='CPU'; T in [DT_COMPLEX64]
device='CPU'; T in [DT_DOUBLE]
device='CPU'; T in [DT_FLOAT]
[[StatefulPartitionedCall/model/bert_encoder/transformer/layer_0/self_attention/key/einsum/Einsum]]
Related issue: https://github.com/NVIDIA/tensorflow/issues/40
Update: in TF2, the Automatic Mixed Precision Grappler pass can be enabled with config.graph_options.rewrite_options.auto_mixed_precision = 1.
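For reference, a minimal configuration sketch of the two ways to flip that Grappler option in TF2, assuming TensorFlow 2.x is installed (e.g. inside the NGC container); the ConfigProto route matches the attribute path quoted above:

```python
import tensorflow as tf

# Idiomatic TF2 route: enable the auto mixed precision Grappler pass globally,
# so eagerly-traced tf.function graphs are rewritten to FP16 where safe.
tf.config.optimizer.set_experimental_options({"auto_mixed_precision": True})

# Equivalent ConfigProto route (useful with tf.compat.v1 sessions):
config = tf.compat.v1.ConfigProto()
config.graph_options.rewrite_options.auto_mixed_precision = 1
```

Either setting asks Grappler to rewrite eligible FP32 ops to FP16 on GPUs with Tensor Cores; ops without an FP16 kernel (such as Einsum in the error above) are left in FP32.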
Closing this issue.
I am using TensorFlow 2.6 inside the NGC docker image nvcr.io/nvidia/tensorflow:21.10-tf2-py3, running inference with the pre-trained BERT model from TensorFlow Hub: https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-2_H-128_A-2/2
After setting TF_ENABLE_AUTO_MIXED_PRECISION=1, nothing seems to happen apart from the warning log below:
I also checked the dumped TensorFlow HLO; no operations appear to have been converted to FP16. This feature works well in TensorFlow 1.x. Is it supported in NVIDIA TensorFlow 2 as well?