Open · opened by @gabrielspmoreira 1 year ago
Comment by @vysarge
A recent test on my side suggests that using `tf.keras.layers.Embedding` + `reduce_mean` with `RaggedTensor` input remains slower than with dense `Tensor` input as of TF 2.12.0 (the `tensorflow/tensorflow:2.12.0-gpu` container).

The `RaggedTensor`s in these tests were constructed using `RaggedTensor.from_tensor` with the `padding` argument set. This means they were ultimately constructed with `from_row_splits`, so the `value_rowids` and `row_lengths` indexing schemes are not cached for these tensors (and some ops can be slower than for a `RaggedTensor` constructed with `from_value_rowids`, for example).

A confounding factor is that many of these tests are currently slower overall due to the switch from the legacy to the experimental optimizer code. For example, one SGD test case measured here as 10ms/step with a dense `Tensor` and 47ms/step with a `RaggedTensor` now appears to take 35ms/step with a dense `Tensor` and 80ms/step with a `RaggedTensor`. See also: nvbugs/3908345.

I ran only a very small test, so results may differ for embeddings with other dimensions etc.
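To make the construction detail above concrete, here is a minimal sketch of the two construction paths. The tensors are illustrative only, not the ones used in the tests:

```python
import tensorflow as tf

# Dense 0-padded batch: rows hold 3, 1, and 2 valid ids respectively.
dense = tf.constant([[1, 2, 3],
                     [4, 0, 0],
                     [5, 6, 0]])

# Path used in the tests above: from_tensor(padding=...) strips the
# trailing padding and encodes the result via row_splits, so
# value_rowids and row_lengths are derived on demand rather than cached.
rt_splits = tf.RaggedTensor.from_tensor(dense, padding=0)
print(rt_splits)             # <tf.RaggedTensor [[1, 2, 3], [4], [5, 6]]>
print(rt_splits.row_splits)  # tf.Tensor([0 3 4 6], shape=(4,), dtype=int64)

# Alternative construction that caches the value_rowids partitioning,
# which can make rowid-based ops (e.g. segment reductions) cheaper.
rt_rowids = tf.RaggedTensor.from_value_rowids(
    values=[1, 2, 3, 4, 5, 6],
    value_rowids=[0, 0, 0, 1, 2, 2])
print(rt_rowids.value_rowids())  # tf.Tensor([0 0 0 1 2 2], ...)
```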
In previous experiments from @vysarge (June 2022), it was found that the `tf.RaggedTensor` representation is slower than a fixed-length dense `tf.Tensor` for embedding lookup, as shown in this spreadsheet.

This task is about benchmarking the difference in embedding lookup between dense and ragged multi-hot columns, as MM makes extensive use of `tf.RaggedTensor` for multi-hot features and for sequential / session-based recommendation. A minimal benchmark sketch follows below.
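As a starting point, one possible shape for such a benchmark is sketched below. The vocabulary size, embedding dimension, batch shape, and timing loop are assumptions for illustration, not values taken from the spreadsheet:

```python
import time
import tensorflow as tf

# Hypothetical sizes for illustration only.
VOCAB, DIM, BATCH, MAX_LEN, STEPS = 10_000, 64, 1024, 20, 100

emb = tf.keras.layers.Embedding(VOCAB, DIM)

# Random multi-hot ids with variable row lengths, 0-padded to MAX_LEN.
ids = tf.random.uniform((BATCH, MAX_LEN), 1, VOCAB, dtype=tf.int32)
lengths = tf.random.uniform((BATCH,), 1, MAX_LEN + 1, dtype=tf.int32)
mask = tf.sequence_mask(lengths, MAX_LEN)
dense_ids = tf.where(mask, ids, tf.zeros_like(ids))
ragged_ids = tf.RaggedTensor.from_tensor(dense_ids, padding=0)

@tf.function
def pool(x):
    # Embedding lookup followed by mean pooling over the id axis.
    # Note the semantics differ: the dense path also averages the
    # embeddings of the padding id 0, while the ragged path averages
    # only the valid ids in each row.
    return tf.reduce_mean(emb(x), axis=1)

for name, x in [("dense", dense_ids), ("ragged", ragged_ids)]:
    pool(x)  # warm up / trace
    start = time.perf_counter()
    for _ in range(STEPS):
        out = pool(x)
    out.numpy()  # block until queued work completes before reading the clock
    print(f"{name}: {(time.perf_counter() - start) / STEPS * 1e3:.2f} ms/step")
```

Given the caching behavior noted in the comment above, a fuller benchmark would also compare a `RaggedTensor` built with `from_value_rowids` rather than only one derived from a padded dense tensor.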
Notes