Several questions:
I would need to double check, but if you do not have contracted functions (multiple Gaussians per shell), I believe we can simply pass in a coefficient of 1 and scale the rows after the matrix is created.
What level of performance is required here? There are a few other loop tricks we can pull without touching the core C routines, since editing those could get a bit complex.
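A minimal sketch of that unit-coefficient trick, assuming a `gau2grid.collocation` call of the form `collocation(xyz, L, coeffs, exponents, center, spherical=...)` that returns a dictionary holding a `"PHI"` matrix of shape `(n_components, n_points)`; the exact signature, the grid layout, and the coefficient values below are illustrative and should be checked against the installed version:

```python
import numpy as np
import gau2grid as gg

# Example: a single uncontracted p shell (L=1) evaluated on a grid of shape
# (3, npoints).  The collocation signature and return layout are assumed from
# README-style usage and may differ in your gau2grid version.
xyz = np.random.rand(3, 1000)
L = 1
exponents = [0.5]
center = [0.0, 0.0, 0.0]

# Pass a unit coefficient so the returned rows carry no contraction scaling.
ret = gg.collocation(xyz, L, [1.0], exponents, center, spherical=False)
phi = ret["PHI"]  # assumed shape (3, npoints): rows p_x, p_y, p_z

# Scale each component row by its own coefficient afterward.
component_coeffs = np.array([0.7, 1.3, 0.2])  # c_x, c_y, c_z (illustrative values)
phi_scaled = component_coeffs[:, None] * phi
```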
Answers!
Passing a coefficient of 1 and scaling the rows when they are passed back is exactly the workaround I've been using. The only issue with this is speed: I'd like to get the performance advantage of using collocation_basis, but I can't use that trick in that case.
There isn't a speed advantage to collocation_basis, as it just calls a simple wrapper function.
Perhaps what you are after is to write another simple wrapper function for your use case?
Oh, got it. Yep, I'm essentially doing the same thing already, so no need to change anything. Thanks for the help!
Glad you find the project useful!
Hey @dgasmith, I want to verify that I'm getting reasonable performance with this. I'm computing collocation on a modest 100x100x100 grid on my laptop and getting about 0.011 seconds per call. Iterating through ~1200 basis functions, this ends up taking about 12 seconds. This seems slow to me. Is this in line with the kind of performance you see?
This is with Python and without derivatives? Seems somewhat reasonable. It looks like you are generating ~10 GB of data, or ~1 GB/s. You could expect an improvement with the C code. It may also be worth looking at the Python bindings; the highly repeated Python-level logic could be a killer for these relatively fast operations, but I'm not particularly sold on that being an issue.
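For reference, a quick back-of-the-envelope check of those numbers (double-precision values, using the grid and basis sizes quoted above):

```python
n_points = 100 ** 3        # 100x100x100 grid
n_basis = 1200             # ~1200 basis functions
bytes_per_value = 8        # float64

total_gb = n_points * n_basis * bytes_per_value / 1e9   # ~9.6 GB of collocation values
rate_gb_s = total_gb / 12.0                              # ~0.8 GB/s over the ~12 s run
```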
Codes like Psi4 make heavy use of localization and cutoffs, which aren't built into this library.
Correct: with Python, no derivatives. I'm not sure whether the Python bindings are the bottleneck or not; it's tough for me to tell just from profiling with %prun. The performance is certainly much better than my old home-cooked SciPy version, so no worries. I just wanted to double-check that I wasn't missing something. I imagine implementing localization and cutoffs would be a pretty big project?
Depends on the complexity that you wish to provide. You can use a KDTree implementation to pull a spherical region of points from your overall grid to pass into each individual basis evaluation. Some simple rules of thumb can be used, such as requiring a point to be within x angstroms of the basis center. The overall idea is to exploit the spatial locality of Gaussians to reduce your 1e6 points evaluated per basis function to perhaps 1e4-1e5 points, which lowers the total compute time and/or makes the total cost scale linearly with grid density rather than cubically.
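A short sketch of that spherical-cutoff idea using SciPy's cKDTree; the grid and the cutoff radius below are placeholders, and in practice the radius would be chosen per shell from its most diffuse exponent:

```python
import numpy as np
from scipy.spatial import cKDTree

# Full grid, shape (npoints, 3); build the tree once and reuse it for every shell.
grid = np.random.uniform(-10.0, 10.0, size=(100 ** 3, 3))
tree = cKDTree(grid)

center = np.array([0.0, 0.0, 0.0])  # basis-function center
cutoff = 8.0                         # placeholder radius in the grid's length units

# Indices of grid points within `cutoff` of the center; only these points need
# to be passed to the per-shell collocation call, and the results can then be
# scattered back into the full-grid array.
local_idx = tree.query_ball_point(center, cutoff)
local_points = grid[local_idx]
```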
I haven't benchmarked this recently, but the overall cost that you are seeing seems quite reasonable. A few quick wins that I could see:
Hope this helps!
This is very helpful, thanks! I like the idea of using KDTree to get a spherical region. I'll give that a try.
Right now, for L>0 basis functions, the code only allows a single coefficient. I would like to individually specify coefficients for each component of an L>0 function. For instance, I would like to specify a coefficient c_x for p_x, but a different c_y for p_y, and so on. I hope this makes sense.
I have a workaround right now, but it does not work with the collocation_basis function, which would be much cleaner and faster. I'm hoping that adding this shouldn't be too difficult.