dgasmith / gau2grid

Fast computation of a gaussian and its derivative on a grid.
https://gau2grid.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Calculating cubes for visualization using the C interface #12

Open cryos opened 6 years ago

cryos commented 6 years ago

I would like to integrate this into Avogadro and make use of it in a desktop application. It is also wrapped and used on a server to deliver results to web clients through a RESTful API. We effectively load Gaussian basis sets and an MO coefficient matrix.

I currently map each basis function to an atom with a position in space, and then parallelise over the grid, effectively giving a smaller subgrid to each core. All threads write directly into the cube, but the partitioning ensures only one thread will ever attempt to write to any given scalar. The code figures out which index to write to and its Cartesian position in space, then iterates over all basis functions for the supplied molecular orbital.
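
Roughly, the index-to-position mapping looks like the following sketch (with `origin` and `spacing` standing in for our actual cube parameters):

#include <array>
#include <cstddef>

// Map a flat grid index to its Cartesian position, assuming a regular
// nx * ny * nz cube with the fastest-varying index along z.
// `origin` and `spacing` are placeholders for the real cube parameters.
std::array<double, 3> gridPosition(size_t index, size_t ny, size_t nz,
                                   const std::array<double, 3> &origin,
                                   double spacing) {
    const size_t i = index / (ny * nz);
    const size_t j = (index / nz) % ny;
    const size_t k = index % nz;
    return {origin[0] + spacing * i,
            origin[1] + spacing * j,
            origin[2] + spacing * k};
}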

I am open to reorganizing the code but would like to maintain the high-level interface: load quantum output, then calculate the MO cube, which is further processed before being visualized. It would be helpful to understand how I would map our basis sets, MO matrix, and atom positions in. Also, what is the most efficient method of passing in a subset of the cube, and would we need multiple copies of the data per thread, or could we maintain a single copy that is only read across all threads?

dgasmith commented 6 years ago

That is effectively the operation that something like Psi4 uses, so we have very similar use cases. Let's see: the operation you want to complete is `C_ui G_up -> E_ip`, where `i` indexes the MO orbitals you are after, `u` the AO basis, and `p` the points on the grid. What gau2grid does is give you the array `G_up` through the C interface:

void gg_collocation(int L, const unsigned long npoints, const double* __restrict__ x, const double* __restrict__ y,
                    const double* __restrict__ z, const int nprim, const double* __restrict__ coeffs,
                    const double* __restrict__ exponents, const double* __restrict__ center, const int spherical,
                    double* __restrict__ phi_out);

With a gaussian of the form `G_up = x^l y^m z^n \sum_i coeff_i e^(-exponent_i * |center - p|^2)`.
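
For completeness, the contraction step after gau2grid hands back `G` is just plain loops (or a DGEMM if the MO count grows). A rough sketch, not a gau2grid call, assuming row-major storage, hypothetical dimension names `nbf`/`nmo`/`npoints`, and an `E` buffer zeroed by the caller:

#include <cstddef>

// Contract E_ip = sum_u C_ui * G_up with plain loops.
// C: (nbf, nmo) AO -> MO coefficients
// G: (nbf, npoints) collocation values from gg_collocation
// E: (nmo, npoints) orbital values on the grid, zeroed by the caller
void contract_orbitals(const double *C, const double *G, double *E,
                       size_t nbf, size_t nmo, size_t npoints) {
    for (size_t u = 0; u < nbf; u++) {
        for (size_t i = 0; i < nmo; i++) {
            const double c = C[u * nmo + i];
            for (size_t p = 0; p < npoints; p++) {
                E[i * npoints + p] += c * G[u * npoints + p];
            }
        }
    }
}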

In Psi4 to build out a full G we use the following:

    double* tmpp = basis_temps_["PHI"]->pointer()[0];
    int nvals = 0;
    for (size_t Qlocal = 0; Qlocal < shells.size(); Qlocal++) {
        int Qglobal = shells[Qlocal];
        const GaussianShell& Qshell = primary_->shell(Qglobal);
        Vector3 v = Qshell.center();
        int L = Qshell.am();
        int nQ = Qshell.nfunction();
        int nprim = Qshell.nprimitive();
        const double* alpha = Qshell.exps();
        const double* norm = Qshell.coefs();

        // Make new pointers; gg computes along rows, so we need to skip down `nvals` rows.
        size_t row_shift = nvals * npoints;
        double* phi_start = tmpp + row_shift;

        // Compute collocation
        gg_collocation(L, npoints, x, y, z, nprim, norm, alpha, v.data(), (int)puream_, phi_start);
        if (puream_) {
            nvals += 2 * L + 1;
        } else {
            nvals += nQ;  // Cartesian is already computed
        }
    }

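For what it's worth, the row shift works because PHI is stored row-major with shape (nbf, npoints): each shell's functions occupy consecutive rows of length npoints, so skipping nvals * npoints elements lands phi_start at the first row for the current shell.
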
A few questions:

1. Do you figure out basis function extents and localize computations of MOs? Or do you do non-sparse computations?
2. How many MOs do you typically compute at the same time?
3. Do you build the full collocation matrices for GEMMing or contract the results in-core?
4. How much should the interface take care of? I'm not particularly keen on putting GEMM or threading into Gau2Grid as that creates a lot of linking problems.

cryos commented 6 years ago
  1. Do you figure out basis function extents and localize computations of MOs? Or do you do non-sparse computations?

We do not, but I have thought about the best way of building this in to accelerate things.

  2. How many MOs do you typically compute at the same time?

Usually just one, but in the future I would like to look at 2-5 (around HOMO/LUMO).

  3. Do you build the full collocation matrices for GEMMing or contract the results in-core?

No, I would need to read up on this :-) I think what we do is on the basic side.

  4. How much should the interface take care of? I'm not particularly keen on putting GEMM or threading into Gau2Grid as that creates a lot of linking problems.

I entirely agree on keeping the threading in the calling code, and generally look to use a read-only input with all basis sets, the MO matrix, etc. Then I assign subgrids to threads and use an embarrassingly parallel approach to keep things simple.

dgasmith commented 6 years ago

If you're only doing 2-5 MOs it probably doesn't make sense to use DGEMM. What if we looked at an example use case that computes the orbital values from the coefficients `C` on the xyz grid into the array `output`:

size_t npoints = 500000;
const double *C;       // MO coefficients, shape (nbf, nmo)
const double *x;       // grid x coordinates, shape (npoints,)
const double *y;       // grid y coordinates, shape (npoints,)
const double *z;       // grid z coordinates, shape (npoints,)
double *output;        // orbital values, shape (nmo, npoints)
bool spherical = true;

size_t block_size = 500;
size_t nblocks = npoints / block_size;

// Loop over chunks of the grid; blocks cover disjoint point ranges,
// so they can safely run in parallel.
#pragma omp parallel for
for (size_t i = 0; i < nblocks; i++) {

  // Offset of this block within the grid
  size_t block_start = block_size * i;
  size_t nmo_shift = 0;

  // Loop over shells
  for (size_t j = 0; j < nshell; j++) {
    const GaussianShell &shell = shells[j];
    int nQ = shell.nfunction();
    int L = shell.am();
    int nprim = shell.nprimitive();
    const double *center = shell.center();
    const double *alpha = shell.exps();
    const double *norm = shell.coefs();

    // Evaluate the data
    gg_evaluate_orbitals_grid(
        // AM, how large the grid is, xyz information
        L, block_size, x + block_start, y + block_start, z + block_start,
        // basis information
        nprim, norm, alpha, center, spherical,
        // orbital information; shift down past the basis functions
        // already handled
        C + (nmo_shift * nmo), nmo,
        // output data and the stride of that data, as we need to write
        // out multiple rows
        output + block_start, npoints);

    nmo_shift += nQ;
  }
}
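
With the parallel for on the block loop, each thread writes a disjoint column range of output, so the basis data and C remain a single read-only copy, which matches the threading model you described. One thing the sketch glosses over: if npoints is not an exact multiple of block_size, a final partial block handles the remainder, along these lines:

// Handle leftover points when npoints % block_size != 0
size_t remainder = npoints % block_size;
if (remainder > 0) {
    size_t block_start = nblocks * block_size;
    // ... run the same shell loop with `remainder` as the block size
}
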
dgasmith commented 6 years ago

@cryos let me know when you want to revisit this. You might want to check out #13 as it will affect you as well.

dgasmith commented 6 years ago

@cryos Added a basis evaluation function, documented here.

dgasmith commented 5 years ago

@cryos Released 1.3, which contains these changes. Let me know how it works and we can keep moving.