I believe the code within the do nsf2 = 1, jb%nsup loop of m_kern_exx_cri and m_kern_exx_eri is almost identical. We should therefore be able to extract it into a subroutine, so that the gain from using BLAS and any other optimisation is made in one place.
The current issue is that m_kern_exx_eri has a backup_eris option. If true, it accesses a 1D array contiguously via eris(kpart)%store_eris(count). With the new OpenMP threaded implementation, these elements are no longer accessed in that order.
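
To illustrate the ordering problem, here is a minimal sketch of one possible approach: compute the position in the 1D array from the loop indices rather than from a running count, so that the stored layout does not depend on the order in which threads execute. Everything in it is an assumption made for illustration: the two-level loop, the fixed sizes n1 and n2, and the dummy_eri function are hypothetical and are not taken from m_kern_exx_eri, whose loop nest is deeper and whose block sizes may vary (in which case the offsets would need to be precomputed, for example as a prefix sum).

```fortran
program eri_offset_sketch
  implicit none
  ! Hypothetical fixed sizes; the real code would take these from the
  ! support-function counts (e.g. ia%nsup, jb%nsup).
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n1 = 3, n2 = 4
  real(dp) :: store_serial(n1*n2), store_omp(n1*n2)
  integer :: nsf1, nsf2, count, idx

  ! Serial reference: a running counter, as in store_eris(count).
  ! This only works if the iterations run in this exact order.
  count = 1
  do nsf1 = 1, n1
     do nsf2 = 1, n2
        store_serial(count) = dummy_eri(nsf1, nsf2)
        count = count + 1
     end do
  end do

  ! Threaded version: the index is derived from the loop indices,
  ! so each thread writes to a distinct, order-independent slot.
  !$omp parallel do private(nsf2, idx) schedule(dynamic)
  do nsf1 = 1, n1
     do nsf2 = 1, n2
        idx = (nsf1 - 1)*n2 + nsf2
        store_omp(idx) = dummy_eri(nsf1, nsf2)
     end do
  end do
  !$omp end parallel do

  ! Both arrays should hold the same values in the same positions.
  if (any(store_serial /= store_omp)) error stop "layouts differ"
  print *, "layouts match"

contains

  ! Stand-in for the real integral evaluation (hypothetical).
  pure function dummy_eri(i, j) result(v)
    integer, intent(in) :: i, j
    real(dp) :: v
    v = real(10*i + j, dp)
  end function dummy_eri

end program eri_offset_sketch
```

Compiled with OpenMP enabled (for example with gfortran's -fopenmp flag), the threaded loop fills the array in an arbitrary order but produces the same layout as the counter-based loop. Whether an explicit offset like this fits the actual store_eris layout would need to be checked against the real loop structure.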