[micro_perf] add ops, add int8, fix dist bug.

bytedance / ByteMLPerf

AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and versatility of software and hardware.

https://bytemlperf.ai/

Apache License 2.0

197 stars 60 forks

[micro_perf] add ops, add int8, fix dist bug. #70

Closed: suisiyuan closed this 3 months ago

suisiyuan commented 5 months ago

Add batch_gemm and group_gemm ops; add the int8 dtype to the gemm ops; fix the case where world_size exceeds the number of available devices.
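
For illustration only, the world_size fix could look something like the sketch below. It is not the code from this PR: the function names `effective_world_size` and `filter_group_sizes` are hypothetical, and the sketch assumes the benchmark knows the device count up front and that clamping (rather than erroring out) is the desired behavior.

```python
def effective_world_size(requested: int, available: int) -> int:
    """Clamp a requested world_size to the number of available devices.

    Hypothetical helper, not taken from ByteMLPerf.
    """
    if requested < 1:
        raise ValueError("world_size must be at least 1")
    if available < 1:
        raise RuntimeError("no devices available")
    return min(requested, available)


def filter_group_sizes(candidates, available):
    """Keep only the candidate group sizes that fit on this machine."""
    return [n for n in candidates if 1 <= n <= available]


if __name__ == "__main__":
    # A workload asking for 8 ranks on a 4-device host runs with 4.
    print(effective_world_size(8, 4))
    # Group sizes larger than the device count are skipped.
    print(filter_group_sizes([1, 2, 4, 8], 4))
```

Whether to clamp a single request or to skip oversized group sizes entirely depends on how the workload lists its distributed configurations; both variants are shown because the description does not say which applies.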

YJessicaGao commented 3 months ago

Support more ops: cast, silu, swiglu, div, mul, sub, gemv, reducemax, reducemin, reducesum, p2p. Modify workloads to support input shape groups.
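
As a reference for two of the ops named above, here is a minimal pure-Python sketch of silu and swiglu using their standard definitions. It is not the benchmark's implementation. In particular, the input layout of swiglu in ByteMLPerf is not stated here, so the two-operand form (`gate`, `up`) below is an assumption.

```python
import math


def silu(x: float) -> float:
    """SiLU (swish): x * sigmoid(x), written to avoid overflow in exp."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)  # safe: x < 0, so exp(x) is in (0, 1)
    return x * e / (1.0 + e)


def swiglu(gate, up):
    """Elementwise silu(gate) * up over two equal-length sequences.

    Assumed two-operand form; a fused op may instead split one tensor.
    """
    if len(gate) != len(up):
        raise ValueError("gate and up must have the same length")
    return [silu(g) * u for g, u in zip(gate, up)]


if __name__ == "__main__":
    print(silu(1.0))
    print(swiglu([0.0, 1.0], [2.0, 2.0]))
```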