Several questions:
I would need to double check, but if you do not have contracted functions (multiple Gaussians per shell), I believe we can simply pass in a coefficient of 1 and scale the rows after the matrix is created.
What level of performance is required here? There are a few other loop tricks we can pull without touching the core C routines, since editing those could get a bit complex.
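A minimal sketch of that unit-coefficient trick, assuming a `gau2grid.collocation` call of the form `collocation(xyz, L, coeffs, exponents, center, spherical=...)` that returns a dictionary holding a `"PHI"` matrix of shape `(n_components, n_points)`; the exact signature, the grid layout, and the coefficient values below are illustrative and should be checked against the installed version:

```python
import numpy as np
import gau2grid as gg

# Example: a single uncontracted p shell (L=1) evaluated on a grid of shape
# (3, npoints).  The collocation signature and return layout are assumed from
# README-style usage and may differ in your gau2grid version.
xyz = np.random.rand(3, 1000)
L = 1
exponents = [0.5]
center = [0.0, 0.0, 0.0]

# Pass a unit coefficient so the returned rows carry no contraction scaling.
ret = gg.collocation(xyz, L, [1.0], exponents, center, spherical=False)
phi = ret["PHI"]  # assumed shape (3, npoints): rows p_x, p_y, p_z

# Scale each component row by its own coefficient afterward.
component_coeffs = np.array([0.7, 1.3, 0.2])  # c_x, c_y, c_z (illustrative values)
phi_scaled = component_coeffs[:, None] * phi
```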
Answers!
Passing a coefficient of 1 and scaling the rows when they are passed back is exactly the workaround I've been using. The only issue with this is speed: I'd like to get the performance advantage of using collocation_basis, but I can't use that trick in that case.
There isn't a speed advantage to collocation_basis, as it just calls a simple wrapper function.
Perhaps what you are after is to write another simple wrapper function for your use case?
Oh, got it. Yep, I'm essentially doing the same thing already, so no need to change anything. Thanks for the help!
Glad you find the project useful!
Hey @dgasmith, I want to verify that I'm getting reasonable performance with this. I'm computing collocation on a modest 100x100x100 grid on my laptop and getting about 0.011 seconds per call. Iterating through ~1200 basis functions, this ends up taking about 12 seconds. This seems slow to me. Is this in line with the kind of performance you see?
This is with Python and without derivatives? Seems somewhat reasonable. It looks like you are generating ~10 GB of data, or ~1 GB/s. You could expect an improvement with the C code. It may also be worth looking at the Python bindings; the highly repeated Python-level logic could be a killer for these relatively fast operations, but I'm not particularly sold on that being an issue.
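For reference, a quick back-of-the-envelope check of those numbers (double-precision values, using the grid and basis sizes quoted above):

```python
n_points = 100 ** 3        # 100x100x100 grid
n_basis = 1200             # ~1200 basis functions
bytes_per_value = 8        # float64

total_gb = n_points * n_basis * bytes_per_value / 1e9   # ~9.6 GB of collocation values
rate_gb_s = total_gb / 12.0                              # ~0.8 GB/s over the ~12 s run
```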
Codes like Psi4 make heavy use of localization and cutoffs, which aren't built into this library.
Correct: with Python, no derivatives. I'm not sure whether the Python bindings are the bottleneck or not; it's tough for me to tell just from profiling with %prun. The performance is certainly much better than my old home-cooked SciPy version, so no worries. I just wanted to double-check that I wasn't missing something. I imagine implementing localization and cutoffs would be a pretty big project?
Depends on the complexity that you wish to provide. You can use a KDTree implementation to pull a spherical region of points from your overall grid to pass into each individual basis evaluation. Some simple rules of thumb can be used, such as requiring a point to be within x angstroms of the basis center. The overall idea is to exploit the spatial locality of Gaussians to reduce your 1e6 points evaluated per basis function to perhaps 1e4-1e5 points, which lowers the total compute time and/or makes the total cost scale linearly with grid density rather than cubically.
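A short sketch of that spherical-cutoff idea using SciPy's cKDTree; the grid and the cutoff radius below are placeholders, and in practice the radius would be chosen per shell from its most diffuse exponent:

```python
import numpy as np
from scipy.spatial import cKDTree

# Full grid, shape (npoints, 3); build the tree once and reuse it for every shell.
grid = np.random.uniform(-10.0, 10.0, size=(100 ** 3, 3))
tree = cKDTree(grid)

center = np.array([0.0, 0.0, 0.0])  # basis-function center
cutoff = 8.0                         # placeholder radius in the grid's length units

# Indices of grid points within `cutoff` of the center; only these points need
# to be passed to the per-shell collocation call, and the results can then be
# scattered back into the full-grid array.
local_idx = tree.query_ball_point(center, cutoff)
local_points = grid[local_idx]
```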
I haven't benchmarked this recently, but the overall cost that you are seeing seems quite reasonable. A few quick wins that I could see:
Hope this helps!
This is very helpful, thanks! I like the idea of using KDTree to get a spherical region. I'll give that a try.
Right now, for L>0 basis functions, the code only allows a single coefficient. I would like to individually specify coefficients for each component of an L>0 function. For instance, I would like to specify a coefficient c_x for p_x, but a different c_y for p_y, and so on. I hope this makes sense.
I have a workaround right now, but it does not work with the collocation_basis function, which would be much cleaner and faster. I'm hoping that adding this shouldn't be too difficult.