ACEsuit / mace

MACE - Fast and accurate machine learning interatomic potentials with higher order equivariant message passing.

Update compile test case to use larger test system #310

Closed hatemhelal closed 6 months ago

hatemhelal commented 8 months ago

This PR follows up on the earlier torch.compile support in #300 and makes the input test a bit more realistic by using a system of 64 carbon atoms. It also adds test cases that use the pytest-benchmark plugin to collect timings for the different compilation options.
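For context, one way to construct a 64-atom carbon test system is to repeat the 8-atom conventional diamond cell 2x2x2. This is a hypothetical sketch in plain NumPy (the PR's actual fixture may build the structure differently, e.g. with ASE):

```python
import numpy as np

# Hypothetical construction of a 64-atom diamond-carbon system, analogous
# to the larger test fixture this PR introduces.
a = 3.567  # diamond lattice constant in Angstrom
basis = np.array([  # 8-atom conventional diamond cell (fractional coords)
    [0.00, 0.00, 0.00], [0.50, 0.50, 0.00],
    [0.50, 0.00, 0.50], [0.00, 0.50, 0.50],
    [0.25, 0.25, 0.25], [0.75, 0.75, 0.25],
    [0.75, 0.25, 0.75], [0.25, 0.75, 0.75],
])
# Repeat the cell 2x2x2 to get 8 * 8 = 64 atoms.
shifts = np.array([[i, j, k] for i in range(2)
                   for j in range(2) for k in range(2)], dtype=float)
frac = (basis[None, :, :] + shifts[:, None, :]).reshape(-1, 3) / 2.0
positions = frac * (2 * a)  # Cartesian coordinates in the 2x2x2 supercell
print(positions.shape)  # (64, 3)
```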

One subtle (and possibly controversial) change is that the correctness test (test_mace) now uses torch.testing.assert_allclose, since this applies more permissive, dtype-aware comparison tolerances than asserting torch.allclose directly.
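To illustrate the tolerance difference: torch.allclose uses fixed defaults regardless of dtype, while the torch.testing helpers pick dtype-aware defaults (assert_allclose is the legacy alias; current PyTorch recommends torch.testing.assert_close, used below):

```python
import torch

# torch.allclose uses fixed defaults (rtol=1e-5, atol=1e-8) for every dtype.
a = torch.tensor([0.0])
b = torch.tensor([1e-6])
print(torch.allclose(a, b))  # False: 1e-6 exceeds the tiny default atol

# torch.testing.assert_close (successor of assert_allclose) picks dtype-aware
# defaults -- for float32, rtol=1.3e-6 and atol=1e-5 -- so this passes.
torch.testing.assert_close(a, b)
```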

Measuring the inference time on an A10G:

| Configuration | Time (ms) | Speedup vs eager fp64 |
| --- | --- | --- |
| Eager fp64 | 65.1 | 1.0 |
| Eager fp32 | 23.4 | 2.8 |
| compile default fp32 | 11.17 | 5.8 |
| reduce-overhead fp32 | 9.75 | 6.7 |
| compile default mixed precision | 8.84 | 7.4 |
| max-autotune fp32 | 6.75 | 9.6 |
| reduce-overhead mixed precision | 4.81 | 13.5 |
| max-autotune mixed precision | 4.25 | 15.3 |
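The measurement methodology can be sketched as a warmup-then-average timing loop. This is a hypothetical stand-in (the PR uses pytest-benchmark and the real MACE model; here a small MLP keeps the example self-contained and CPU-runnable):

```python
import time
import torch

def bench(fn, x, warmup=3, iters=10):
    # Warm up first so one-time costs (allocation, compilation) are excluded,
    # then report the mean wall-clock time per call.
    for _ in range(warmup):
        fn(x)
    t0 = time.perf_counter()
    for _ in range(iters):
        fn(x)
    return (time.perf_counter() - t0) / iters

# Stand-in model, NOT the actual MACE architecture.
def make_model(dtype):
    return torch.nn.Sequential(
        torch.nn.Linear(64, 64), torch.nn.SiLU(), torch.nn.Linear(64, 64)
    ).to(dtype)

x64 = torch.randn(1024, 64, dtype=torch.float64)
t_fp64 = bench(make_model(torch.float64), x64)
t_fp32 = bench(make_model(torch.float32), x64.float())
print(f"fp64: {t_fp64 * 1e3:.3f} ms, fp32: {t_fp32 * 1e3:.3f} ms, "
      f"speedup: {t_fp64 / t_fp32:.1f}x")
```

The same harness applies unchanged to a compiled model, e.g. `torch.compile(model, mode="max-autotune")`, which is how the compile rows above differ from the eager rows.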
hatemhelal commented 8 months ago

As a quick experiment, I tried torch.autocast for mixed-precision fp16/fp32 inference and measured 4.28 ms, which corresponds to a ~15x speedup over eager mode with fp64.
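A minimal sketch of the autocast pattern follows. The experiment above ran fp16/fp32 on an A10G GPU; this version uses the CPU autocast backend with bfloat16 so it runs anywhere, and a plain Linear layer stands in for the model:

```python
import torch

# Hypothetical mixed-precision inference sketch. On a GPU this would be
# torch.autocast(device_type="cuda", dtype=torch.float16); on CPU the
# supported low-precision dtype is bfloat16.
linear = torch.nn.Linear(8, 4)
x = torch.randn(2, 8)

with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    out = linear(x)

# Matmul-heavy ops run in the lower precision under autocast.
print(out.dtype)  # torch.bfloat16
```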