NVIDIA-Merlin / models

Merlin Models is a collection of deep learning recommender system model reference implementations
https://nvidia-merlin.github.io/models/main/index.html
Apache License 2.0

Investigate if embeddings lookup with `tf.RaggedTensor` is slower than with `tf.Tensor` with later versions of TF #1038

Open · gabrielspmoreira opened 1 year ago

gabrielspmoreira commented 1 year ago

In previous experiments by @vysarge (June 2022), it was found that the tf.RaggedTensor representation is slower than a fixed-length dense tf.Tensor for embedding lookup, as shown in this spreadsheet.

This task is about benchmarking the difference in embedding lookup performance between dense and ragged multi-hot columns, as Merlin Models makes extensive use of tf.RaggedTensor for multi-hot features and for sequential / session-based recommendation. A sketch of such a micro-benchmark is shown below.
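A minimal sketch of the kind of micro-benchmark this task calls for, assuming a `tf.keras.layers.Embedding` lookup followed by mean-pooling; all sizes and step counts here are illustrative and are not taken from the original spreadsheet:

```python
import time

import tensorflow as tf

# Illustrative benchmark parameters (assumptions, not from the spreadsheet).
VOCAB_SIZE = 100_000
EMBEDDING_DIM = 64
BATCH_SIZE = 1024
MAX_LEN = 20
STEPS = 100

embedding = tf.keras.layers.Embedding(VOCAB_SIZE, EMBEDDING_DIM)

# Build multi-hot id lists with varying lengths; id 0 is reserved as padding.
ids = tf.random.uniform(
    (BATCH_SIZE, MAX_LEN), minval=1, maxval=VOCAB_SIZE, dtype=tf.int32
)
lengths = tf.random.uniform(
    (BATCH_SIZE,), minval=1, maxval=MAX_LEN + 1, dtype=tf.int32
)
mask = tf.sequence_mask(lengths, MAX_LEN)

# Fixed-length dense representation (zero-padded to MAX_LEN).
dense_ids = tf.where(mask, ids, 0)
# Ragged representation of the same lists, built the way the comment below
# describes: RaggedTensor.from_tensor with the padding argument set.
ragged_ids = tf.RaggedTensor.from_tensor(dense_ids, padding=0)


@tf.function
def lookup_dense(x):
    # Lookup + mean-pooling over the (padded) list dimension.
    return tf.reduce_mean(embedding(x), axis=1)


@tf.function
def lookup_ragged(x):
    # Same lookup + mean-pooling, but over the ragged list dimension.
    return tf.reduce_mean(embedding(x), axis=1)


def bench(fn, x, steps=STEPS):
    fn(x)  # warm up / trace
    start = time.perf_counter()
    for _ in range(steps):
        out = fn(x)
    _ = out.numpy()  # force execution to complete before stopping the clock
    return (time.perf_counter() - start) / steps


print("dense :", bench(lookup_dense, dense_ids))
print("ragged:", bench(lookup_ragged, ragged_ids))
```

Note that the two paths are not numerically identical (the dense mean includes the padding positions), but the comparison mirrors the dense-vs-ragged lookup setup described above.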

Notes

gabrielspmoreira commented 1 year ago

Comment by @vysarge

A recent test on my side suggests that using tf.keras.layers.Embedding + reduce_mean with RaggedTensor input remains slower than with dense Tensor input as of TF 2.12.0 (tensorflow/tensorflow:2.12.0-gpu container).

RaggedTensors in these tests were constructed using RaggedTensor.from_tensor with the padding argument set. This means these tensors were ultimately constructed with from_row_splits, and the value_rowids and row_lengths indexing schemes are not cached for them (so some ops can be slower than for a RaggedTensor constructed with from_value_rowids, for example).

A confounding factor is that many of these tests are currently slower overall due to the switch from the legacy to the experimental optimizer code. For example, one SGD test case measured here as 10 ms/step with a dense Tensor and 47 ms/step with a RaggedTensor now appears to take 35 ms/step with a dense Tensor and 80 ms/step with a RaggedTensor.

See also: nvbugs/3908345

I did a very small test only, so the results may not hold for embeddings with other dimensions, etc.
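For reference, a small sketch of the construction difference described above (the values and shapes are arbitrary): a RaggedTensor built from a padded dense tensor is backed by row splits only, while one built with from_value_rowids has that encoding precomputed.

```python
import tensorflow as tf

# Built from a padded dense tensor: backed by row_splits only, so
# value_rowids() / row_lengths() must be derived when an op needs them.
rt_padding = tf.RaggedTensor.from_tensor(
    tf.constant([[1, 2, 0], [3, 4, 5]]), padding=0
)
print(rt_padding)             # <tf.RaggedTensor [[1, 2], [3, 4, 5]]>
print(rt_padding.row_splits)  # stored directly: [0 2 5]

# Built from value_rowids: this encoding is precomputed and cached, which
# can help ops (e.g. segment-based reductions) that consume it directly.
rt_rowids = tf.RaggedTensor.from_value_rowids(
    values=tf.constant([1, 2, 3, 4, 5]),
    value_rowids=tf.constant([0, 0, 1, 1, 1]),
)
print(rt_rowids.value_rowids())  # cached: [0 0 1 1 1]
```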