Description
In GluonNLP, we introduced the benchmarking script in https://github.com/dmlc/gluon-nlp/tree/master/scripts/benchmarks.
The goal is to track the training + inference latency of common NLP backbones so that we can choose the appropriate ones for our task. This will help users train + deploy models with AWS.
So far, we have covered:

- HuggingFace/Transformers-based backbones with FP32 + FP16 training / inference. For FP16 training, we are not profiling against the AMP-based solution, so the comparison gives PyTorch an edge; we need to fix this.
- MXNet 2.0 nightly builds (community use only) + GluonNLP 1.0 with FP32 + FP16 (AMP) training / inference.
- TVM FP32 inference. Due to a recent upgrade of the code base, this benchmark is currently broken.
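To keep the latency numbers comparable across backbones, the measurement needs warmup iterations and explicit device synchronization before reading the clock. Below is a minimal sketch of how inference latency can be measured for a HuggingFace backbone; the model name, batch size, and iteration counts are illustrative assumptions, not what the benchmark script actually uses.

```python
import time

import torch
from transformers import AutoModel, AutoTokenizer

# Hypothetical backbone and batch, for illustration only.
model = AutoModel.from_pretrained("bert-base-uncased").cuda().eval()
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(["hello world"] * 8, padding=True, return_tensors="pt").to("cuda")

with torch.no_grad():
    # Warmup: the first iterations pay one-time costs (cuDNN autotuning,
    # memory-pool growth) and would otherwise skew the mean latency.
    for _ in range(10):
        model(**batch)
    torch.cuda.synchronize()

    n_runs = 100
    start = time.perf_counter()
    for _ in range(n_runs):
        model(**batch)
    # CUDA kernel launches are asynchronous; synchronize before stopping the clock.
    torch.cuda.synchronize()
    print(f"Mean inference latency: {(time.perf_counter() - start) / n_runs * 1e3:.2f} ms")
```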
I will share the following action items that I feel are worth doing:
Short-term Bug-fix + Improvement

- [ ] Fix the FP16 training benchmark in HuggingFace/Transformers to use AMP in PyTorch (a sketch of the AMP training step follows below)
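For reference, here is a minimal sketch of an AMP training step in PyTorch using torch.cuda.amp (available since PyTorch 1.6). The model, optimizer, and dummy batches below are hypothetical stand-ins, not the benchmark's actual setup.

```python
import torch
from torch import nn
from torch.cuda.amp import GradScaler, autocast

# Hypothetical stand-ins for the benchmarked backbone and data.
model = nn.Linear(128, 2).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
scaler = GradScaler()

for _ in range(10):  # dummy training batches
    inputs = torch.randn(8, 128, device="cuda")
    labels = torch.randint(0, 2, (8,), device="cuda")
    optimizer.zero_grad()
    # autocast runs the forward pass in mixed precision: FP16 kernels
    # where they are numerically safe, FP32 elsewhere.
    with autocast():
        loss = criterion(model(inputs), labels)
    # GradScaler scales the loss to avoid FP16 gradient underflow and
    # unscales the gradients before the optimizer step.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```

Benchmarking this path, rather than a naive FP16 loop, would put the PyTorch FP16 training numbers on the same footing as the GluonNLP AMP results.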
Automation + Visualization
Longer-term Backbone Benchmarking Effort
Other longer-term efforts
@dmlc/gluon-nlp-committers