For the Jacobian computation, we carry out compensated summation when adding in the new Jacobian term. This routine (comp_sum_matrix) is very slow.
Maybe we can use BLAS or other optimized linear algebra routines to carry this out? We might have to allocate a temporary matrix of the same size to do this.
[x] Edit the routine to take the three matrices as well as a tmp matrix.
[x] Carry out the computation with broadcast (will lower to BLAS?).
For the Jacobian computation, we carry out compensated summation when adding in the new Jacobian term. This routine (comp_sum_matrix) is very slow.
Maybe we can use BLAS or other optimized linear algebra routines to carry this out? We might have to allocate a temporary matrix of the same size to do this.