Closed cortner closed 9 months ago
My very first test with just L = 3
:
# symbolic R-Ylm and using only static arrays
3ns
# in-place preallocated ... `evaluate!`
107.422 ns
I know this sounds completely ridiculous (constant propagation??), but I tested the result by evaluating it for randomly generated 1000 inputs, and I get almost exactly 3µs
. I'm still suspicious but I'm not sure what to test next.
No storage or anything required, the basis functions could simply be evaluated on the fly (which I haven't really tested yet)
That is very impressive!
How many edges do you have? Do you expect it would get even better as the number of edges increases?
this is just a raw micro-benchmark of the Ylm basis. Hard to say how it would affect the performance of the entire architecture
@CheukHinHoJerry this might interest you as well...
The speed up is impressive - though I am not certain about this, my guess is that this might not changes a lot since the bottleneck usually comes from the correlation of features from my previous experience?
Would be great if @ilyes319 could let me know about this - do we still have the same bottleneck under some standard hyperparameters in MACE/general graph architectures? Or would the most costly part becomes the embedding after applying tensor contractions and keep the dimension low?
Hi Jerry - the bottleneck is different for different models. For some small ACE models that use low correlation order but comparatively large L the Ylms do end up being the bottleneck.
There is also a second aspect here. If we can manage to make this evaluation this fast but on-the-fly without storage (non-trivial!) then might reduce the memory footprint of a GNN in helpful ways.
I suggest to not rush into this now. It is not urgent for us, but Ilyes and I briefly discussed it, so I got curious and wanted to try it out.
this is superceded by #74
Is there a potential advantage in evaluating spherical harmonics on-the-fly from analytic expressions instead of recursively? It means zero-memory usage. Most likely for large
lmax
no, but we never go that high. And for ENN architectures, one sticks with extremely low degrees. In general, there could be a huge advantage for multi-threading.An optimized basis code could be code-generated using StaticPolynomials.jl
https://en.wikipedia.org/wiki/Table_of_spherical_harmonics
CC @ilyes319