Goals :soccer:
Add support for padding, transforming, and masking sequential input data in the MM PyTorch backend.
The implemented transform classes should:
- Support multiple targets
- Be used for training, evaluation, and inference
Implementation Details :construction:
- [x] Implement `TabularBatchPadding` to pad a group of sequential inputs
- [x] Implement `TabularPredictNext` for generating targets for causal next-item prediction (a standalone sketch of this logic follows the list)
- [ ] Implement `TabularPredictLast` for generating targets for last-item prediction
- [ ] Implement `TabularPredictRandom` for generating targets by selecting one random item and truncating the sequence so that the selected item is at the last position
- [x] Implement `TabularMaskRandom` for the masked language modeling (MLM) training strategy
- [x] Implement `TabularMaskLast` for masking the last item in the sequence, generally used to evaluate models trained with MLM
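As a rough, standalone illustration of the padding and causal next-item target logic: the helpers `pad_batch` and `predict_next_targets` below are hypothetical and do not reflect the actual class APIs added in this PR.

```python
import torch
from torch.nn.utils.rnn import pad_sequence


def pad_batch(sequences, max_length, pad_value=0):
    """Pad a list of 1-D item-id tensors to a common length, truncating to max_length."""
    padded = pad_sequence(sequences, batch_first=True, padding_value=pad_value)
    return padded[:, :max_length]


def predict_next_targets(padded_ids, pad_value=0):
    """Causal next-item prediction: position i predicts the item at position i + 1."""
    inputs = padded_ids[:, :-1]
    targets = padded_ids[:, 1:]
    # Ignore positions whose target is padding when computing the loss.
    target_mask = targets != pad_value
    return inputs, targets, target_mask


batch = [torch.tensor([5, 3, 8, 2]), torch.tensor([7, 1])]
padded = pad_batch(batch, max_length=4)              # [[5, 3, 8, 2], [7, 1, 0, 0]]
inputs, targets, mask = predict_next_targets(padded)
# inputs  -> [[5, 3, 8], [7, 1, 0]]
# targets -> [[3, 8, 2], [1, 0, 0]]
# mask    -> [[True, True, True], [True, False, False]]
```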
Testing Details :mag:
- Defined tests for padding and the different sequence transformations
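For illustration only, a minimal pytest-style check of last-item masking behaviour; `mask_last_position` is a hypothetical stand-in for the logic a `TabularMaskLast`-style transform applies, not the class itself.

```python
import torch


def mask_last_position(padded_ids, pad_value=0):
    """Boolean mask selecting the last non-padded item of each right-padded sequence."""
    lengths = (padded_ids != pad_value).sum(dim=1)   # per-row sequence length
    mask = torch.zeros_like(padded_ids, dtype=torch.bool)
    mask[torch.arange(padded_ids.size(0)), lengths - 1] = True
    return mask


def test_mask_last_selects_final_non_padded_item():
    padded = torch.tensor([[5, 3, 8, 2],
                           [7, 1, 0, 0]])
    mask = mask_last_position(padded)
    assert mask.tolist() == [[False, False, False, True],
                             [False, True, False, False]]
    # Exactly one masked position per sequence.
    assert mask.sum(dim=1).tolist() == [1, 1]
```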
Fixes # (issue)