caikit / caikit-nlp

Apache License 2.0

Enable using kwargs for selecting pad-to-max-length strategy for tokenizer in embeddings #393

Closed kcirred closed 1 month ago

kcirred commented 1 month ago

Allows users to pass tokenizer keyword arguments to select the desired tokenizer settings — in this case, enabling pad_to_max_length.
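A minimal sketch of the kwargs-forwarding idea, using a toy whitespace tokenizer rather than the real caikit-nlp or Hugging Face API (the function names `encode` and `toy_tokenizer` are hypothetical, for illustration only):

```python
def toy_tokenizer(texts, truncation=True, padding=False, max_length=8):
    # Toy stand-in for a real tokenizer: whitespace split, pad ids with 0s.
    ids = [[hash(w) % 100 + 1 for w in t.split()][:max_length] for t in texts]
    if padding == "max_length":
        ids = [row + [0] * (max_length - len(row)) for row in ids]
    return {
        "input_ids": ids,
        "attention_mask": [[1 if i else 0 for i in row] for row in ids],
    }

def encode(texts, tokenizer, **tokenizer_kwargs):
    # Forward caller-selected kwargs (e.g. padding="max_length") straight
    # through to the tokenizer instead of hard-coding a padding strategy.
    return tokenizer(texts, **tokenizer_kwargs)

out = encode(["hello world"], toy_tokenizer, padding="max_length")
print(len(out["input_ids"][0]))       # padded to max_length = 8
print(sum(out["attention_mask"][0]))  # only 2 real tokens counted
```

With `padding="max_length"` the shapes of `input_ids` and `attention_mask` grow to the maximum length, but the mask still marks only the real tokens.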

kcirred commented 1 month ago

@gkumbhat I also wrote a unit test so let me push that as well.

I suspect sum_token_count in embeddings.py is doing what is intended: by computing sum(encoding.attention_mask), the padding 0s drop out of the count.
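A quick sketch of why the count is padding-invariant (the one-line `sum_token_count` here is an assumption about the helper's behavior, not the actual embeddings.py source):

```python
def sum_token_count(attention_mask):
    # Assumed behavior: sum the attention mask, so only positions marked 1
    # (real tokens) contribute; padded positions contribute 0.
    return sum(attention_mask)

unpadded = [1, 1, 1, 1]               # 4 real tokens, no padding
padded = [1, 1, 1, 1, 0, 0, 0, 0]     # same 4 tokens, padded to length 8

print(sum_token_count(unpadded))  # 4
print(sum_token_count(padded))    # 4 -- padding does not change the count
```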

In my test case, I originally wanted to show that the result stays the same even though the shapes of the tokenizer's input_ids and attention_mask change. I will not assert on input_token_count, which comes from sum_token_count, because it comes out the same regardless of the change in attention_mask. For now I only test that changing the tokenizer option in our use case does not change the final result.