I wanted to train this network on the SPICE dataset (a similar task where I want to predict forces and energy from structure). I was comparing training speed with torchmd-net (https://github.com/torchmd/torchmd-net). For the same parameter count, torchmd-net is at least three times faster than Equiformer and twice as fast as Equiformer-v2. Is this expected, or a bug on my end?
I do not think parameter count is a good proxy for speed. Two models can have the same number of parameters but quite different speeds; for example, the same model run with different cutoff radii.
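To make this concrete, here is a minimal sketch (assuming PyTorch and `torch_cluster` are installed) showing that the cutoff radius alone changes the edge count, and hence the message-passing cost, while the parameter count stays fixed:

```python
import torch
from torch_cluster import radius_graph

# 1,000 random "atoms" in a 20 A box; no model parameters involved here,
# only geometry.
pos = torch.rand(1000, 3) * 20.0

for cutoff in (3.0, 5.0, 8.0):
    # A larger cutoff radius yields more edges per atom, so each
    # message-passing layer does proportionally more work.
    edge_index = radius_graph(pos, r=cutoff, max_num_neighbors=1000)
    print(f"cutoff={cutoff} A -> {edge_index.size(1)} edges")
```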
For a better comparison, you could fix the number of layers and use the same cutoff radius. Even in that case, Equiformer should be slower, because we (1) use degrees > 1 and (2) use non-linear messages in addition to attention. Please see the Equiformer paper for details and discussion.
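If you want to time the two models under matched settings, a rough sketch like the one below may help. `model` and `data` are placeholders for your actual torchmd-net / Equiformer instances and a real SPICE batch; the loss computation is also a placeholder to be replaced with your energy/force objective.

```python
import time
import torch

def time_train_step(model, data, n_warmup=10, n_iters=50):
    """Average wall-clock time per training step, after warmup iterations."""
    model.train()
    optimizer = torch.optim.Adam(model.parameters())
    for i in range(n_warmup + n_iters):
        if i == n_warmup:
            # Start timing only after warmup, once kernels are compiled/cached.
            if torch.cuda.is_available():
                torch.cuda.synchronize()
            start = time.perf_counter()
        optimizer.zero_grad()
        loss = model(data)  # placeholder: replace with your energy/force loss
        loss.backward()
        optimizer.step()
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / n_iters
```

Per-step time measured this way, with matched layers and cutoff, is a more meaningful comparison than parameter count.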
As long as you can run Equiformer training, you can compare your results (errors and training time) with my training logs. If they are similar, your setup should be correct.
Let me know if you have any other specific questions.