Closed avnish-wynk closed 1 year ago
All sparse features are handled in the same way and embedded into the same dimension by default.
This does not have to be the case, though. This and related questions are discussed in the following references:
- A. Ginart et al., "Mixed Dimension Embeddings with Application to Memory-Efficient Recommendation Systems"
- M. Naumov, "On the Dimensionality of Embeddings for Sparse Features and Data"
Will it not be redundant to encode boolean features into higher dimensions? Do we do it just to be able to calculate the pairwise feature interactions?
Can we even calculate pairwise feature interactions (dot products) on mixed-dimension embeddings?
There are multiple techniques and reasons for encoding boolean features. First, you can combine several of them and transform them into n-grams. Also, by encoding them you give them a meaning in the abstract embedding space and can later interact them with other features.
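As a rough illustration of the combining idea above (the feature names and sizes are made up for this sketch): several boolean features can be crossed into a single categorical feature whose vocabulary is the set of all their combinations, which then gets one embedding row instead of several tiny embeddings.

```python
import numpy as np

# Three hypothetical boolean features, e.g. is_mobile, is_weekend, is_returning.
bools = np.array([1, 0, 1])

# Cross them into one categorical index in [0, 2**3): treat the booleans
# as bits of a single integer.
index = int(sum(int(b) << i for i, b in enumerate(bools)))

# One embedding table for the combined feature (randomly initialized here;
# it would be learned in practice).
vocab_size, dim = 2 ** len(bools), 16
rng = np.random.default_rng(0)
table = rng.normal(size=(vocab_size, dim))

embedding = table[index]  # a single 16-dim vector for the crossed feature
```

The crossed feature can capture interactions among the booleans that separate per-boolean embeddings would have to learn through the interaction layer.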
Notice that you can always pass the mixed-dimension embeddings through an appropriately sized matrix multiplication and then interact the results using a dot product.
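A minimal sketch of that projection step (dimensions and names are illustrative, and the projection matrix would be learned rather than random): a low-dimensional embedding is mapped into the common dimension by a matrix multiply, after which pairwise dot products are well defined.

```python
import numpy as np

rng = np.random.default_rng(0)
d_common = 16

# A low-cardinality feature embedded in 4 dims, a high-cardinality one in 16.
e_small = rng.normal(size=4)
e_large = rng.normal(size=d_common)

# Per-feature projection into the common space (learned in practice).
W_small = rng.normal(size=(4, d_common))

projected = e_small @ W_small            # now a 16-dim vector
interaction = float(projected @ e_large)  # scalar pairwise interaction
```

This is how mixed-dimension schemes keep the dot-product interaction layer intact while letting each feature use an embedding size matched to its cardinality.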
Thanks for the information @mnaumovfb. Closing the issue.
How does DLRM handle boolean features or low-cardinality features?
Do we embed them to the same dimensionality as all other sparse features? But won't that be redundant, since we would be embedding features with low cardinality (e.g. 5) into a much larger space, say 16 dimensions?
I am assuming we need to embed all features to the same dimensionality to calculate the pairwise interactions. How should features with cardinality varying from 2 to 1M be handled?