GoogleCloudPlatform / llm-pipeline-examples


Adding experimental 'Use-Faster-Transformer' option for T5-based models #11

Closed · Chris113113 closed this 1 year ago

Chris113113 commented 1 year ago

Adds a flag for using FasterTransformer.

Under the hood, this takes the fine-tuned model, converts it to the FasterTransformer format, and then deploys it with Triton as the inference server instead of DeepSpeed.
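Since the description doesn't include a client snippet, here is a minimal sketch of what querying the resulting Triton endpoint could look like. The model name `fastertransformer`, the endpoint `localhost:8000`, and the tensor names (`input_ids`, `sequence_length`, `max_output_len`, `output_ids`) are assumptions based on NVIDIA's FasterTransformer Triton backend T5 examples, not code from this repo:

```python
# Illustrative sketch only: assumes a FasterTransformer T5 model deployed on
# Triton under the name "fastertransformer" at localhost:8000, with the tensor
# names used in NVIDIA's FasterTransformer backend T5 examples.
import numpy as np
import tritonclient.http as httpclient
from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")  # assumed base checkpoint
client = httpclient.InferenceServerClient(url="localhost:8000")

text = "translate English to German: The house is wonderful."
input_ids = tokenizer(text, return_tensors="np")["input_ids"].astype(np.uint32)
seq_len = np.array([[input_ids.shape[1]]], dtype=np.uint32)
max_output_len = np.array([[64]], dtype=np.uint32)

# Wrap each numpy array in a Triton InferInput.
inputs = []
for name, data in [("input_ids", input_ids),
                   ("sequence_length", seq_len),
                   ("max_output_len", max_output_len)]:
    tensor = httpclient.InferInput(name, data.shape, "UINT32")
    tensor.set_data_from_numpy(data)
    inputs.append(tensor)

result = client.infer(model_name="fastertransformer", inputs=inputs)
output_ids = result.as_numpy("output_ids")  # assumed shape: [batch, beam, seq]
print(tokenizer.decode(output_ids[0][0], skip_special_tokens=True))
```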

In benchmarks, FasterTransformer runs roughly 9x faster than both DeepSpeed and the HuggingFace Transformers library.

Limitations: