NVIDIA-Merlin / models

Merlin Models is a collection of deep learning recommender system model reference implementations
https://nvidia-merlin.github.io/models/main/index.html
Apache License 2.0

Investigate if embeddings lookup with `tf.RaggedTensor` is slower than with `tf.Tensor` with later versions of TF #1038

Open · gabrielspmoreira opened 1 year ago

gabrielspmoreira commented 1 year ago

In previous experiments by @vysarge (June 2022), it was found that the tf.RaggedTensor representation is slower than a fixed-length dense tf.Tensor for embedding lookup, as shown in this spreadsheet.

This task is about benchmarking the difference in embedding lookup performance between dense and ragged multi-hot columns, as Merlin Models makes extensive use of tf.RaggedTensor for multi-hot features and for sequential / session-based recommendation. A sketch of such a micro-benchmark is shown below.
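A minimal sketch of the kind of micro-benchmark this task calls for, assuming a `tf.keras.layers.Embedding` lookup followed by mean-pooling; all sizes and step counts here are illustrative and are not taken from the original spreadsheet:

```python
import time

import tensorflow as tf

# Illustrative benchmark parameters (assumptions, not from the spreadsheet).
VOCAB_SIZE = 100_000
EMBEDDING_DIM = 64
BATCH_SIZE = 1024
MAX_LEN = 20
STEPS = 100

embedding = tf.keras.layers.Embedding(VOCAB_SIZE, EMBEDDING_DIM)

# Build multi-hot id lists with varying lengths; id 0 is reserved as padding.
ids = tf.random.uniform(
    (BATCH_SIZE, MAX_LEN), minval=1, maxval=VOCAB_SIZE, dtype=tf.int32
)
lengths = tf.random.uniform(
    (BATCH_SIZE,), minval=1, maxval=MAX_LEN + 1, dtype=tf.int32
)
mask = tf.sequence_mask(lengths, MAX_LEN)

# Fixed-length dense representation (zero-padded to MAX_LEN).
dense_ids = tf.where(mask, ids, 0)
# Ragged representation of the same lists, built the way the comment below
# describes: RaggedTensor.from_tensor with the padding argument set.
ragged_ids = tf.RaggedTensor.from_tensor(dense_ids, padding=0)


@tf.function
def lookup_dense(x):
    # Lookup + mean-pooling over the (padded) list dimension.
    return tf.reduce_mean(embedding(x), axis=1)


@tf.function
def lookup_ragged(x):
    # Same lookup + mean-pooling, but over the ragged list dimension.
    return tf.reduce_mean(embedding(x), axis=1)


def bench(fn, x, steps=STEPS):
    fn(x)  # warm up / trace
    start = time.perf_counter()
    for _ in range(steps):
        out = fn(x)
    _ = out.numpy()  # force execution to complete before stopping the clock
    return (time.perf_counter() - start) / steps


print("dense :", bench(lookup_dense, dense_ids))
print("ragged:", bench(lookup_ragged, ragged_ids))
```

Note that the two paths are not numerically identical (the dense mean includes the padding positions), but the comparison mirrors the dense-vs-ragged lookup setup described above.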

Notes

gabrielspmoreira commented 1 year ago

Comment by @vysarge

A recent test on my side suggests that using tf.keras.layers.Embedding + reduce_mean with RaggedTensor input remains slower than with dense Tensor input as of TF 2.12.0 (tensorflow/tensorflow:2.12.0-gpu container).

RaggedTensors in these tests were constructed using RaggedTensor.from_tensor with the padding argument set. This means these tensors were ultimately constructed with from_row_splits, and the value_rowids and row_lengths indexing schemes are not cached for them (so some ops can be slower than for a RaggedTensor constructed with from_value_rowids, for example).

A confounding factor is that many of these tests are currently slower overall due to the switch from the legacy to the experimental optimizer code. For example, one SGD test case measured here as 10 ms/step with a dense Tensor and 47 ms/step with a RaggedTensor now appears to take 35 ms/step with a dense Tensor and 80 ms/step with a RaggedTensor.

See also: nvbugs/3908345

I did a very small test only, so the results may not hold for embeddings with other dimensions, etc.
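For reference, a small sketch of the construction difference described above (the values and shapes are arbitrary): a RaggedTensor built from a padded dense tensor is backed by row splits only, while one built with from_value_rowids has that encoding precomputed.

```python
import tensorflow as tf

# Built from a padded dense tensor: backed by row_splits only, so
# value_rowids() / row_lengths() must be derived when an op needs them.
rt_padding = tf.RaggedTensor.from_tensor(
    tf.constant([[1, 2, 0], [3, 4, 5]]), padding=0
)
print(rt_padding)             # <tf.RaggedTensor [[1, 2], [3, 4, 5]]>
print(rt_padding.row_splits)  # stored directly: [0 2 5]

# Built from value_rowids: this encoding is precomputed and cached, which
# can help ops (e.g. segment-based reductions) that consume it directly.
rt_rowids = tf.RaggedTensor.from_value_rowids(
    values=tf.constant([1, 2, 3, 4, 5]),
    value_rowids=tf.constant([0, 0, 1, 1, 1]),
)
print(rt_rowids.value_rowids())  # cached: [0 0 1 1 1]
```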