[Question] How are categorical features in test data converted to embedding indices?

Hi,

I'm trying to understand how DLRM processes the Criteo Kaggle dataset. I understand the training process and found that each categorical feature value in the training data is converted to a unique index. (For example, "0x68fd1e64" is converted to the index "0x0" in lS_i.)
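To make my understanding of the training-side conversion concrete, here is a minimal sketch. It is my own illustration and not code from the DLRM repository: the function name `build_vocab`, the sample values, and the first-seen ordering of indices are all assumptions.

```python
# Hypothetical illustration, not taken from the DLRM code base.
def build_vocab(values):
    """Give each distinct categorical value the next unused index (first-seen order)."""
    vocab = {}
    for v in values:
        if v not in vocab:
            vocab[v] = len(vocab)
    return vocab


# One categorical column of made-up training values.
train_column = ["68fd1e64", "05db9164", "68fd1e64", "8cf07265"]

vocab = build_vocab(train_column)
# The index list is what I understand ends up in lS_i for this feature.
train_indices = [vocab[v] for v in train_column]
```

With these sample values, the repeated value "68fd1e64" maps to the same index both times it appears.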

Here is my question. During inference, how are the categorical features converted to the indices that correspond to the embedding table? Since the embedding table and the corresponding indices are determined at the training stage, I would expect that no information is available at the inference stage to convert a feature value to an index. However, the indices (lS_i) and offsets (lS_o) are already determined and passed to the "apply_emb" function at the inference stage. I don't understand how the indices of categorical features in the test data are already determined before the embedding look-up at the inference stage.

For example, suppose there is an embedding table for a movie list, and "spider man" is assigned index "3" at the training stage. When a new user's movie list (categorical features) arrives and it includes "spider man", how does the inference model know that the index of "spider man" is "3" before the embedding look-up stage (apply_emb)? Thank you.
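The movie example can be written as a small sketch of the step I cannot locate. It assumes the value-to-index mapping from training is kept around as a dictionary at inference time, which is exactly the part I am unsure about. The names `movie_to_index`, `embedding_table`, and `lookup` are hypothetical, and the table is a plain list rather than a real embedding layer.

```python
# Hypothetical illustration of the scenario in the question, not DLRM code.
# Assumption: a mapping built at training time is still available at inference.
movie_to_index = {"iron man": 0, "batman": 1, "superman": 2, "spider man": 3}

# Stand-in for a trained embedding table: one row (vector) per index.
embedding_table = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]


def lookup(movies, mapping, table):
    """Convert raw values to indices, then fetch the matching rows."""
    # Raises KeyError for a value the mapping has never seen.
    indices = [mapping[m] for m in movies]
    return indices, [table[i] for i in indices]


# A new user's movie list at inference time.
indices, vectors = lookup(["spider man", "batman"], movie_to_index, embedding_table)
```

Without something like `movie_to_index`, I don't see how "spider man" could be turned into index 3 before the look-up.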

Best regards, SJ Kim

facebookresearch/dlrm, issue #341