facebookresearch / FAMBench

Benchmarks to capture important workloads.
Apache License 2.0

Port Initial RNNT OOTB Training to FB5 #28

Closed · aaronenyeshi closed this 2 years ago

aaronenyeshi commented 2 years ago

Add FB5Logger support to RNN-T OOTB training and reduce the output produced by the MLPerf reference code. Add the run_rnnt_ootb_train.sh script to launch training against the LibriSpeech dataset. Because LibriSpeech is very large, a full training run would take too long, so training is capped at 120 seconds and the number of samples trained in that window is recorded (a sketch of the time-capped loop is shown below). Also fix linting issues.
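For illustration, here is a minimal sketch of the time-capped loop, based only on the log format shown in the results below. The `step_fn` callable and the `log_event`/`train_with_time_cap` helpers are hypothetical placeholders, not the actual FB5Logger or RNN-T training code.

```python
# Minimal sketch of a 120-second time-capped training session that records
# run_start/run_stop records in the JSON-lines format shown in the log below.
# log_event, train_with_time_cap, and step_fn are illustrative, not FAMBench APIs.
import json
import time

TRAIN_SECONDS = 120  # cap the session instead of training on all of LibriSpeech


def log_event(log_path, record):
    # One JSON record per line, timestamped in milliseconds.
    record["time_ms"] = int(time.time() * 1000)
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")


def train_with_time_cap(data_loader, step_fn, batch_size, log_path):
    log_event(log_path, {"key": "run_start"})
    start = time.time()
    num_batches = 0
    for batch in data_loader:
        step_fn(batch)  # one training step on this batch
        num_batches += 1
        if time.time() - start >= TRAIN_SECONDS:
            break  # stop early; the score is throughput, not accuracy
    # Save how much work was done so throughput can be computed later.
    log_event(log_path, {"key": "run_stop",
                         "num_batches": num_batches,
                         "batch_size": batch_size})
```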

Here are the results on 1 device, A100:

(py383) aaronshi@ip-10-200-69-98:~/cluster/work/proxyworkloads/benchmarks$ cat results/rnnt_ootb_train_tiny.log 
{"benchmark": "RNN-T", "implementation": "OOTB", "mode": "train", "config": "tiny", "score_metric": "exps", "key": "header"}
{"time_ms": 1634580865841, "key": "run_start"}
{"time_ms": 1634580995954, "num_batches": 704, "batch_size": 1024, "key": "run_stop"}

$ python ../fb5logging/result_summarizer.py -f results/
Summarizing files: ['results/rnnt_ootb_train_tiny.log']

$ cat results/summary.txt 
benchmark implementation mode config score units
RNN-T OOTB train tiny 5540.53784018507 ex/s
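The summarized score follows directly from the run_start/run_stop records above, assuming the summarizer computes throughput as samples trained divided by elapsed wall-clock time:

```python
# Reproducing the summary score from the logged records above
# (assumption: score = num_batches * batch_size / elapsed seconds).
elapsed_s = (1634580995954 - 1634580865841) / 1000.0  # run_stop - run_start, in seconds
samples = 704 * 1024                                  # num_batches * batch_size
print(samples / elapsed_s)                            # ~5540.54 ex/s, matching summary.txt
```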