Goals :soccer:

- Supports pre-trained embeddings via the `EmbeddingOperator` transform from the Merlin dataloader: https://github.com/NVIDIA-Merlin/Merlin/issues/211 (see the first sketch after the implementation list below)

Implementation Details :construction:

- Creates the `PretrainedEmbeddings` block, which takes pre-trained embedding features, optionally projects them to a dim with a linear layer, applies a sequence aggregator, and normalizes them (e.g. with l2-norm). All of these options are configurable (see the second sketch below).
- `InputBlockV2` was changed to accept an optional `pretrained_embeddings` argument, which by default selects features tagged with the `EMBEDDING` tag.
- `PrepareListFeatures` was changed to define the shape of the last dim of the pre-trained embeddings provided by the Loader, as that dim is `None` in graph mode.
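For illustration, a minimal sketch of the dataloader side this builds on. `generate_data`, `EmbeddingOperator`, and `Loader` are existing Merlin APIs; the `"e-commerce"` schema, the `item_id`/`item_embedding` column names, and the random embedding table are assumptions for the example.

```python
import numpy as np

from merlin.datasets.synthetic import generate_data
from merlin.dataloader.ops.embeddings import EmbeddingOperator
from merlin.dataloader.tensorflow import Loader

# Synthetic data with a tagged schema; "e-commerce" is one of the built-in
# synthetic schemas (column names below are assumptions for this sketch).
train, valid = generate_data("e-commerce", num_rows=10_000, set_sizes=(0.8, 0.2))

# Hypothetical pre-trained table: one 64-dim embedding per item id, sized
# to the item id cardinality of the synthetic schema.
num_items = train.schema["item_id"].int_domain.max + 1
item_embeddings = np.random.rand(num_items, 64).astype(np.float32)

# The operator looks up each batch's "item_id" values in the table and adds
# the result to the batch as a new "item_embedding" feature.
loader = Loader(
    train,
    batch_size=1024,
    transforms=[
        EmbeddingOperator(
            item_embeddings,
            lookup_key="item_id",
            embedding_name="item_embedding",
        )
    ],
)
```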
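And a sketch of the model side, continuing from the loader above. `PretrainedEmbeddings` and the `pretrained_embeddings` argument are the additions described here, but the keyword names `output_dims`, `sequence_combiner`, and `normalizer` are assumptions based on the description, not confirmed API.

```python
import merlin.models.tf as mm
from merlin.schema import Tags

# Schema coming out of the loader sketched above; the new "item_embedding"
# feature is assumed to carry the EMBEDDING tag.
schema = loader.output_schema

# The new block: optional linear projection, sequence aggregation, and
# normalization. The keyword names here are assumed, not confirmed API.
pretrained = mm.PretrainedEmbeddings(
    schema.select_by_tag(Tags.EMBEDDING),
    output_dims=32,            # optional projection with a linear layer
    sequence_combiner="mean",  # sequence aggregator
    normalizer="l2-norm",      # normalization
)

# InputBlockV2 accepts the block through the new argument; by default it
# selects EMBEDDING-tagged features itself, so passing it explicitly is only
# needed to customize the options above.
input_block = mm.InputBlockV2(schema, pretrained_embeddings=pretrained)
```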
Testing Details :mag:
- Created many tests demonstrating how pre-trained embeddings can be used with models like DLRM and DCN, and with sequential Transformer models using `BroadcastFeatures` and the causal and masked language modeling `SequenceMasking` classes (a test-like sketch follows this list).
- I also took this opportunity to speed up many tests by drastically reducing the cardinality of categorical features in some dataset schemas used for synthetic data generation. Many tests had to be updated to match the new cardinalities.
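To give a flavor of the kind of test described above, a sketch that wires the pieces from the earlier snippets into a DCN model. `DCNModel`, `MLPBlock`, and `BinaryClassificationTask` are existing merlin-models classes; the `"click"` target and the exact setup are illustrative, not the actual test code.

```python
import merlin.models.tf as mm

# Continues the sketches above: fit a DCN model on the synthetic data, with
# the pre-trained embeddings flowing in through the input block. In a real
# test the schema columns would carry the proper tags (e.g. Tags.TARGET).
model = mm.DCNModel(
    schema,
    depth=2,
    deep_block=mm.MLPBlock([64, 32]),
    input_block=input_block,
    prediction_tasks=mm.BinaryClassificationTask("click"),
)
model.compile(optimizer="adam")
model.fit(loader, epochs=1)
```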
Fixes #1070, Fixes #1071, Fixes #1068, Fixes #1072, Fixes #1073