I have made some changes that allow the use of two-flavor operators. Please have a look at abs_solver.h, where the general case now has a template argument and the current implementation just calls the multi-flavor variant.
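To make the forwarding idea concrete for reviewers, here is a minimal sketch. It is not the code in abs_solver.h: `Spinor`, `AbstractSolver`, `solve_one_flavor` and `IdentitySolver` are hypothetical names chosen for illustration, and the real interface may differ.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical spinor type, used only for this sketch.
using Spinor = std::vector<double>;

// General case: the number of flavors is a template argument.
template <int num_flav>
class AbstractSolver {
 public:
  virtual ~AbstractSolver() = default;

  // Multi-flavor interface; returns the number of iterations used.
  virtual int solve(const std::array<Spinor *, num_flav> &x,
                    const std::array<const Spinor *, num_flav> &rhs) const = 0;
};

// One-flavor call signature, implemented by forwarding to the
// multi-flavor variant with a single flavor.
inline int solve_one_flavor(const AbstractSolver<1> &solver, Spinor &x,
                            const Spinor &rhs) {
  return solver.solve({&x}, {&rhs});
}

// Toy solver for the identity operator: the solution equals the right-hand side.
template <int num_flav>
class IdentitySolver : public AbstractSolver<num_flav> {
 public:
  int solve(const std::array<Spinor *, num_flav> &x,
            const std::array<const Spinor *, num_flav> &rhs) const override {
    for (int f = 0; f < num_flav; ++f) {
      assert(x[f] != nullptr && rhs[f] != nullptr);
      *x[f] = *rhs[f];
    }
    return 0;  // no iterations needed for the identity
  }
};
```

With this shape, existing one-flavor callers keep their call signature while all the work happens in the multi-flavor code path.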
There are generalizations of the BLAS routines that take a template argument for the number of flavors. They simply loop over the flavors; see blas_new_c.h.
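As an illustration of what such a generalization can look like, here is a small sketch. The names `Spinor`, `axpy_one_flavor`, `axpy` and `norm2` are assumptions made for this example and are not the functions in blas_new_c.h.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical spinor type: a flat vector of doubles.
using Spinor = std::vector<double>;

// Hypothetical one-flavor kernel: y += a * x.
inline void axpy_one_flavor(double a, const Spinor &x, Spinor &y) {
  assert(x.size() == y.size());
  for (std::size_t i = 0; i < x.size(); ++i) y[i] += a * x[i];
}

// Multi-flavor generalization: loop over the flavors and reuse the
// one-flavor kernel. num_flav is known at compile time, so the compiler
// is free to unroll the flavor loop.
template <int num_flav>
void axpy(double a, const std::array<Spinor, num_flav> &x,
          std::array<Spinor, num_flav> &y) {
  for (int f = 0; f < num_flav; ++f) axpy_one_flavor(a, x[f], y[f]);
}

// Reductions sum the per-flavor contributions.
template <int num_flav>
double norm2(const std::array<Spinor, num_flav> &x) {
  double sum = 0.0;
  for (int f = 0; f < num_flav; ++f)
    for (double v : x[f]) sum += v * v;
  return sum;
}
```

In this sketch the flavor count is passed explicitly, as in `axpy<2>(a, x, y)`, because an `int` template parameter cannot be deduced from the `std::size_t` extent of `std::array`.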
For the twisted mass operator I have created a non-degenerate variant; see tm_clov_dslash_def.h. The implementation in tm_clov_dslash_body.h just calls the existing one-flavor operators multiple times.
The solver also has a new template parameter, the linear operator base class. This exposes the number of flavors, so the solver can call the generalizations of the BLAS functions. See invcg.h for the implementation. My hope is that, because num_flav is a template parameter, the compiler will optimize the flavor loops in the BLAS functions to the point where there is no performance difference from the existing one-flavor code.
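The following sketch shows the pattern in miniature: a conjugate gradient solver templated on the operator type, which reads `num_flav` from the operator and passes it to flavor-looping BLAS helpers. It is an illustration only, not the code in invcg.h. `MultiSpinor`, `dot`, `axpby`, `DiagonalTwoFlavorOp` and `cg_solve` are hypothetical names, and the toy operator is a real, positive-definite diagonal matrix rather than a Dirac operator.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Spinor = std::vector<double>;  // hypothetical spinor type
template <int num_flav>
using MultiSpinor = std::array<Spinor, num_flav>;

// Stand-in for a flavor-looping BLAS reduction: sum over flavors of <x, y>.
template <int num_flav>
double dot(const MultiSpinor<num_flav> &x, const MultiSpinor<num_flav> &y) {
  double sum = 0.0;
  for (int f = 0; f < num_flav; ++f)
    for (std::size_t i = 0; i < x[f].size(); ++i) sum += x[f][i] * y[f][i];
  return sum;
}

// Stand-in for a flavor-looping BLAS update: y = a * x + b * y.
template <int num_flav>
void axpby(double a, const MultiSpinor<num_flav> &x, double b,
           MultiSpinor<num_flav> &y) {
  for (int f = 0; f < num_flav; ++f)
    for (std::size_t i = 0; i < x[f].size(); ++i)
      y[f][i] = a * x[f][i] + b * y[f][i];
}

// Toy operator that exposes its flavor count as a compile-time constant.
struct DiagonalTwoFlavorOp {
  static constexpr int num_flav = 2;
  std::array<double, 2> scale;  // one positive diagonal entry per flavor

  void operator()(MultiSpinor<2> &out, const MultiSpinor<2> &in) const {
    for (int f = 0; f < 2; ++f) {
      out[f].resize(in[f].size());
      for (std::size_t i = 0; i < in[f].size(); ++i)
        out[f][i] = scale[f] * in[f][i];
    }
  }
};

// CG for a symmetric positive-definite operator, starting from x = 0.
// Returns the number of iterations used, or max_iter if not converged.
template <typename LinOp>
int cg_solve(const LinOp &A, MultiSpinor<LinOp::num_flav> &x,
             const MultiSpinor<LinOp::num_flav> &b, double tol, int max_iter) {
  constexpr int nf = LinOp::num_flav;
  x = b;
  axpby<nf>(0.0, b, 0.0, x);  // x = 0, with the same shape as b
  MultiSpinor<nf> r = b, p = b, Ap;
  const double bb = dot<nf>(b, b);
  double rr = bb;
  for (int k = 0; k < max_iter; ++k) {
    if (std::sqrt(rr) <= tol * std::sqrt(bb)) return k;
    A(Ap, p);
    const double alpha = rr / dot<nf>(p, Ap);
    axpby<nf>(alpha, p, 1.0, x);    // x += alpha * p
    axpby<nf>(-alpha, Ap, 1.0, r);  // r -= alpha * A p
    const double rr_new = dot<nf>(r, r);
    axpby<nf>(1.0, r, rr_new / rr, p);  // p = r + beta * p
    rr = rr_new;
  }
  return max_iter;
}
```

Because `nf` is a compile-time constant inside `cg_solve`, every flavor loop in the helpers has a fixed trip count, which is what makes unrolling by the compiler possible.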
I tested this code a couple of months ago but did not pursue it further. There have been a couple of changes since then, like the addition of the int target_cb parameter everywhere, so I need to make sure that everything still works properly.
In this review I would like to ask you to comment on the general design of this feature. I will make sure that all unit tests pass before merging. In particular: Do you like the template <int num_flav>? Would you want any changes to the design before this is merged into devel?