The input to instruction @15 quantizelinear is a broadcasted literal. The broadcast instruction should have been swapped by the find_inner_broadcasts matcher in the simplify_algebra compiler pass, which would then allow the propagate_constant pass to turn it into a constant.
Seen with the --fp8 flag and probably also --int8:
bin/driver perf /codes/distilgpt2_1_fp16_gpu.onnx --fp8 --fill1 input_ids --input-dim @input_ids 64 384 --batch 64
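The expected rewrite can be illustrated with a toy model. The sketch below is not MIGraphX code: the `Literal`, `Broadcast`, and `Pointwise` classes, the helper functions, and the quantization scale of 0.5 are all hypothetical, and quantizelinear is simplified to a single-input elementwise function. It assumes a constant folder that does not evaluate through a broadcast, so the fold only succeeds once the broadcast has been moved from the input of the pointwise op to its output.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class Literal:
    values: List[float]            # compile-time constant data


@dataclass
class Broadcast:
    arg: "Node"
    repeat: int                    # toy broadcast: repeat the input `repeat` times


@dataclass
class Pointwise:
    fn: Callable[[float], float]   # elementwise function (stand-in for quantizelinear)
    arg: "Node"


Node = Union[Literal, Broadcast, Pointwise]


def evaluate(node):
    """Reference interpreter for the toy graph."""
    if isinstance(node, Literal):
        return list(node.values)
    if isinstance(node, Broadcast):
        return evaluate(node.arg) * node.repeat
    return [node.fn(v) for v in evaluate(node.arg)]


def swap_inner_broadcast(node):
    """Rewrite pointwise(broadcast(x)) into broadcast(pointwise(x))."""
    if isinstance(node, Literal):
        return node
    inner = swap_inner_broadcast(node.arg)
    if isinstance(node, Pointwise) and isinstance(inner, Broadcast):
        return Broadcast(Pointwise(node.fn, inner.arg), inner.repeat)
    if isinstance(node, Pointwise):
        return Pointwise(node.fn, inner)
    return Broadcast(inner, node.repeat)


def fold_constants(node):
    """Fold pointwise(literal) into a literal; never folds through a broadcast
    (an assumption of this toy model)."""
    if isinstance(node, Literal):
        return node
    inner = fold_constants(node.arg)
    if isinstance(node, Pointwise):
        if isinstance(inner, Literal):
            return Literal([node.fn(v) for v in inner.values])
        return Pointwise(node.fn, inner)
    return Broadcast(inner, node.repeat)


def quantize(x):
    # Hypothetical int8-style quantization with scale 0.5 and zero point 0.
    return max(-128, min(127, round(x / 0.5)))


graph = Pointwise(quantize, Broadcast(Literal([0.4, 1.0, 100.0]), 2))

unfolded = fold_constants(graph)                      # broadcast blocks the fold
folded = fold_constants(swap_inner_broadcast(graph))  # swap first, then fold

assert isinstance(unfolded, Pointwise)
assert isinstance(folded, Broadcast) and isinstance(folded.arg, Literal)
assert evaluate(folded) == evaluate(graph)
```

In this model, folding without the swap leaves the pointwise op in the graph, which mirrors the symptom reported here: the quantizelinear input stays a broadcasted literal instead of becoming a constant.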