Consider checking which dots fall inside excluded atoms once per coarse angle, ahead of optimization, and storing a separate dot subset for each coarse angle. This should reduce the number of dots checked at each atom location by more than 2x, and that per-location dot check is a primary bottleneck.
Add Python or C++ method to determine appropriate subset
Fill in required data structures, per-coarse-angle
Use required data structures, per-coarse-angle
Remove check from C++ inner loop, since we know all dots are valid
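
A minimal Python sketch of the precomputation step, to make the intended data structure concrete. Everything here is an assumption for illustration, not the project's actual code: the names `rotate_z`, `inside_any_atom`, and `build_dot_subsets` are hypothetical, dots are assumed to be 3D points that rotate about the z axis with the coarse angle, and excluded atoms are assumed to be fixed spheres given as `(center, radius)` pairs. The result maps each coarse angle to the list of dot indices that survive the exclusion check.

```python
import math


def rotate_z(point, angle_deg):
    """Rotate a 3D point about the z axis (assumed rotation model)."""
    a = math.radians(angle_deg)
    x, y, z = point
    return (x * math.cos(a) - y * math.sin(a),
            x * math.sin(a) + y * math.cos(a),
            z)


def inside_any_atom(point, atoms):
    """Return True if the point lies strictly inside any (center, radius) sphere."""
    return any(math.dist(point, center) < radius for center, radius in atoms)


def build_dot_subsets(dots, excluded_atoms, coarse_angles):
    """Precompute, per coarse angle, the indices of dots outside all excluded atoms."""
    subsets = {}
    for angle in coarse_angles:
        subsets[angle] = [
            i for i, dot in enumerate(dots)
            if not inside_any_atom(rotate_z(dot, angle), excluded_atoms)
        ]
    return subsets


# Toy data: four dots on the unit circle and one excluded atom at (1, 0, 0).
dots = [(1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0)]
excluded_atoms = [((1, 0, 0), 0.5)]
subsets = build_dot_subsets(dots, excluded_atoms, coarse_angles=[0, 90])

# The inner loop can then iterate over subsets[angle] with no validity check.
for angle, valid in subsets.items():
    print(angle, valid)
```

With this layout, the inner loop only needs the index list for the current coarse angle, so the per-dot exclusion test moves out of the hot path. The same structure could be filled from Python and passed to C++ as one flat index array plus per-angle offsets.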