Oneflow-Inc / OneFlow-Benchmark

OneFlow models for benchmarking.
Optimize bert attention mask calculation #157

Closed ShawnXuan closed 3 years ago

ShawnXuan commented 3 years ago
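  1. Move the calculation of addr (the term derived from the attention mask) out of the layer, so it is computed once rather than inside every layer.
  2. Fix the loss printing bug in fp32 mode.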
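
To illustrate the first item, here is a minimal sketch of hoisting a mask-derived term out of the per-layer code. It is plain Python, not the OneFlow code from this PR: the names `make_addr`, `attention_layer`, and `encoder`, the `-10000.0` constant, and the toy single-head attention are all assumptions made for illustration.

```python
import math


def make_addr(input_mask, neg=-10000.0):
    """Additive mask term: 0.0 for real tokens, `neg` for padding tokens.

    Hypothetical helper; the large negative value is an assumed convention.
    """
    return [(1.0 - m) * neg for m in input_mask]


def attention_layer(x, addr):
    """Toy single-head self-attention over a list of equal-length vectors."""
    dim = len(x[0])
    out = []
    for q in x:
        # Scaled dot-product scores plus the precomputed additive mask term.
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) + bias
            for k, bias in zip(x, addr)
        ]
        # Numerically stable softmax.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append(
            [sum(w * v[d] for w, v in zip(weights, x)) / total for d in range(dim)]
        )
    return out


def encoder(x, input_mask, num_layers=2):
    """Stack of toy layers that all reuse one precomputed mask term."""
    addr = make_addr(input_mask)  # computed once, outside the layers
    for _ in range(num_layers):
        x = attention_layer(x, addr)  # each layer only adds the shared term
    return x
```

Because the mask depends only on the input and not on the layer index, every layer can share the same `addr` value; recomputing it per layer would repeat identical work.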

Regression test result: [image]