Should we include some common normalization options, e.g. L2 normalization and standardization?
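For illustration, a minimal sketch of what those two options could look like as plain Keras layers (the layer names here are ours, not an existing MM API):

```python
import tensorflow as tf

# L2 normalization: scale each embedding vector to unit L2 norm.
l2_normalize = tf.keras.layers.Lambda(
    lambda x: tf.math.l2_normalize(x, axis=-1)
)

# Standardization: zero mean / unit variance along the embedding axis.
standardize = tf.keras.layers.Lambda(
    lambda x: (x - tf.reduce_mean(x, axis=-1, keepdims=True))
    / (tf.math.reduce_std(x, axis=-1, keepdims=True) + 1e-6)
)
```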
If `PretrainedEmbeddings.projection` is (dispatch sketched after this list):
- an `int` or `Dict[str, int]`: we create an `MLPBlock` for each branch
- a `Dict[str, Layer]` or `ParallelBlock`: the keys should match the feature names, and we just connect the pre-trained embeddings to the projection layers
- a single `Layer` (not a `ParallelBlock`): we need an updated version of the `MapValues` layer that takes a `shared` arg and, in `build()`, clones it for each branch
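A rough sketch of that dispatch, assuming `MLPBlock`, `MapValues`, and `ParallelBlock` are importable from `merlin.models.tf` and that `MapValues` gains the proposed `shared` arg (the helper name is hypothetical):

```python
from typing import Dict, List, Union

from tensorflow.keras.layers import Layer

from merlin.models.tf import MLPBlock, MapValues, ParallelBlock


def parse_projection(
    projection: Union[int, Dict[str, int], Dict[str, Layer], ParallelBlock, Layer],
    feature_names: List[str],
):
    """Hypothetical helper: normalize `projection` into one layer per feature."""
    if isinstance(projection, int):
        # Same output dim for every branch: one MLPBlock per feature.
        return {name: MLPBlock([projection]) for name in feature_names}
    if isinstance(projection, dict) and all(
        isinstance(v, int) for v in projection.values()
    ):
        # Dict[str, int]: per-feature output dims.
        return {name: MLPBlock([dim]) for name, dim in projection.items()}
    if isinstance(projection, ParallelBlock):
        # Assumed to already map feature names to layers; pass through as-is.
        return projection
    if isinstance(projection, dict):
        # Dict[str, Layer]: keys must match the feature names.
        assert set(projection) == set(feature_names)
        return dict(projection)
    # A single Layer: wrap it in MapValues with the proposed `shared` arg,
    # which clones the layer for each branch in build().
    return MapValues(projection, shared=True)
```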
Starting point
- The `Embeddings` function in the MM TF API
- The PyTorch implementation, where `Embeddings` is a class extending `ParallelBlock`
Testing
Create tests for how to combine/aggregate pre-trained embeddings with other features, and also for how to aggregate sequential features (a test sketch follows the list):
- Aggregating a 3D sequence feature into 2D: `SequenceAggregator` with `"max"`, `"min"`, `"sum"`, `"mean"`
- Combining with other features: `ConcatFeatures`, `ElementwiseSum`, `ElementWiseMultiply`
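For example, a minimal test of the aggregation semantics, written against plain TensorFlow rather than the MM test fixtures (the `seq_aggregate` helper below stands in for `SequenceAggregator` and is hypothetical):

```python
import numpy as np
import tensorflow as tf


def seq_aggregate(x: tf.Tensor, combiner: str) -> tf.Tensor:
    # Collapse the sequence axis of a (batch, seq_len, dim) tensor into (batch, dim).
    ops = {
        "max": tf.reduce_max,
        "min": tf.reduce_min,
        "sum": tf.reduce_sum,
        "mean": tf.reduce_mean,
    }
    return ops[combiner](x, axis=1)


def test_sequence_aggregation():
    x = tf.random.uniform((8, 4, 16))  # (batch, seq_len, embedding_dim)
    for combiner in ["max", "min", "sum", "mean"]:
        out = seq_aggregate(x, combiner)
        assert out.shape == (8, 16)  # 3D collapsed to 2D
    # Sanity check: mean == sum / seq_len
    np.testing.assert_allclose(
        seq_aggregate(x, "mean").numpy(),
        seq_aggregate(x, "sum").numpy() / 4,
        rtol=1e-5,
    )
```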
Prototype API
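All names and arguments below are a sketch of what the user-facing API could look like under the projection options above, not a final design; `PretrainedEmbeddings` is the proposed class and `schema` is assumed to select the pre-trained embedding features:

```python
import tensorflow as tf
import merlin.models.tf as mm

# A shared projection: one Dense(64), cloned per pre-trained feature via MapValues.
embeddings = PretrainedEmbeddings(  # proposed class, not yet in MM
    schema,  # assumed: a Merlin schema with the pre-trained embedding features
    projection=tf.keras.layers.Dense(64),
)

# Per-feature output dims: an MLPBlock is created for each branch.
embeddings = PretrainedEmbeddings(
    schema,
    projection={"item_embedding": 32, "user_embedding": 64},
)

# Combining with other features, e.g. concatenation with trainable embeddings.
block = mm.ParallelBlock(
    {"pretrained": embeddings, "trainable": mm.Embeddings(schema)},
    aggregation="concat",
)
```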