This PR follows up on the earlier torch.compile support in #300 and makes the input test a bit more realistic by using 64 carbon atoms. It also adds test cases that use the pytest-benchmark plugin to collect timings for the different options.
One subtle (and possibly controversial) change is that the correctness test (`test_mace`) now uses `torch.testing.assert_allclose`, since it applies more permissive default tolerances than `assert torch.allclose` does.
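To illustrate the tolerance difference: `torch.allclose` uses fixed defaults (`rtol=1e-5`, `atol=1e-8`), while the `torch.testing` comparison picks dtype-aware defaults (e.g. `atol=1e-5` for fp32). The sketch below uses `torch.testing.assert_close`, which replaces the deprecated `assert_allclose` with the same dtype-aware defaults:

```python
import torch

# Two fp32 tensors that differ by about 5e-6 in absolute terms.
a = torch.tensor([1e-3], dtype=torch.float32)
b = a + 5e-6

# torch.allclose: the ~5e-6 gap exceeds atol + rtol*|b| (~2e-8),
# so the strict comparison fails.
print(torch.allclose(a, b))  # False

# torch.testing uses atol=1e-5 for fp32, so the same pair passes.
torch.testing.assert_close(a, b)  # raises no exception
```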
As a quick experiment I also tried `torch.autocast` with mixed-precision fp16/fp32 and measured an inference time of 4.28 ms, which corresponds to a 15x speedup over eager mode with fp64.
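The autocast experiment looks roughly like this (the toy model here stands in for the actual model; on CPU autocast falls back to bf16 so the sketch runs without a GPU):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Illustrative toy model, not the real one from the test suite.
model = torch.nn.Sequential(
    torch.nn.Linear(64, 64),
    torch.nn.SiLU(),
    torch.nn.Linear(64, 1),
).to(device)

x = torch.randn(32, 64, device=device)

# autocast selects lower-precision kernels per-op (fp16 on CUDA,
# bf16 on CPU) while keeping parameters in full precision.
with torch.inference_mode(), torch.autocast(device_type=device):
    y = model(x)
```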
Measuring the inference time on an A10G: