Under the hood, this takes the fine-tuned model, converts it to the FasterTransformer format, and deploys it with Triton Inference Server instead of DeepSpeed.
In benchmarks, FasterTransformer runs roughly 9x faster than DeepSpeed and the Hugging Face Transformers library.
Limitations:
Currently supports only T5-based models.
Models using the T5-v1_1 architecture, including Flan-T5, produce incorrect outputs during inference.
Adds a flag for using FasterTransformer.
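As a minimal sketch of the flag's dispatch behavior, including the T5-only limitation noted above: when the flag is set, a supported T5 model goes down the FasterTransformer/Triton path, and anything else falls back to DeepSpeed. All names here (`select_backend`, the backend strings, the flag parameter) are hypothetical illustrations, not the PR's actual API.

```python
# Hypothetical sketch of the backend-selection logic described in this PR.
# Architectures the FasterTransformer path is known to handle correctly;
# T5-v1_1 (including Flan-T5) is excluded because it produces incorrect outputs.
SUPPORTED_FT_ARCHES = {"t5"}


def select_backend(model_arch: str, use_faster_transformer: bool) -> str:
    """Pick the inference stack based on the (hypothetical) flag.

    With the flag off, everything continues to use DeepSpeed as before.
    With the flag on, only supported T5 architectures are converted and
    served via Triton Inference Server with FasterTransformer.
    """
    if not use_faster_transformer:
        return "deepspeed"
    if model_arch not in SUPPORTED_FT_ARCHES:
        raise ValueError(
            f"FasterTransformer path does not support architecture {model_arch!r}"
        )
    return "triton+fastertransformer"
```

For example, `select_backend("t5", use_faster_transformer=True)` would route to the Triton/FasterTransformer path, while passing `"t5_v1_1"` would raise rather than silently serve incorrect outputs.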