The general problem of capturing error messages in a cbind has been solved by switching from bpiterate to bplapply2, so now errors inside the matrix multiplication lead to errors in the parent process. This has the additional bonus of not requiring BiocParallel to be installed to do matrix multiplication.
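The behavioral difference here is the classic one between iterator-style dispatch and lapply-style mapping: with the latter, an exception raised in a worker is re-raised in the parent when the results are collected. As a rough analogy (in Python rather than the package's actual R code, using the standard library's executor rather than BiocParallel):

```python
from concurrent.futures import ThreadPoolExecutor

def block_multiply(i):
    # Stand-in for the per-block matrix multiplication; block 2 fails on purpose.
    if i == 2:
        raise ValueError("block 2 failed")
    return i * i

with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(block_multiply, i) for i in range(4)]
    try:
        # Collecting results re-raises any worker-side exception in the parent,
        # so the failure surfaces here instead of being silently swallowed.
        results = [f.result() for f in futures]
        error_seen = None
    except ValueError as err:
        error_seen = str(err)
```

The parent process observes the worker's error directly, which is the property the switch to bplapply2 provides for the matrix multiplication.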
I also greatly streamlined the decision-making process of choosing the iteration scheme in the matrix multiplication. Now it just prioritizes the larger matrix as the one to split up for distributed computation across workers. This is not really any worse than the previous approach, which relied on the presence of a non-NULL return from chunkGrid as a heuristic to identify which matrix was more likely to be file-backed... and then added an arbitrary penalty of 10 to the access cost of the chunks of that matrix. The new approach is very simple and easy to reason about.
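The idea of splitting the larger operand can be sketched in a few lines; this is a minimal Python/NumPy analogy of the decision rule, not the package's R implementation, and the block multiplications stand in for the work that would be shipped to workers:

```python
import numpy as np

def mult_by_splitting_larger(x, y, nchunks=2):
    """Compute x @ y, splitting whichever operand has more elements into blocks."""
    if x.size >= y.size:
        # Split the rows of x; each block-times-y product is an independent job.
        blocks = np.array_split(x, nchunks, axis=0)
        return np.vstack([b @ y for b in blocks])
    else:
        # Otherwise split the columns of y instead.
        blocks = np.array_split(y, nchunks, axis=1)
        return np.hstack([x @ b for b in blocks])
```

Either way, the reassembled result equals the ordinary product; the only question is which operand gets carved up for distribution, and "the bigger one" is an easy rule to reason about.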
The streamlining was also assisted by the realization that getAutoMultParallelAgnostic() was never actually exported. No users should have been able to use the non-agnostic matrix multiplication scheme, so I just got rid of it in the general case to make maintenance easier. I retained the non-agnostic option for self-cross-products due to its clear performance advantage, though the default is still to use the agnostic mode for consistency with %*%.
(I don't know what the future plans for the matrix multiplication algorithm will be, but being able to achieve consistent results that are agnostic to the number of workers, block size and chunk dimensions is very logistically useful. I know that there will be situations where an agnostic algorithm just won't cut it, e.g., due to chunk or block size constraints, and we may need a different algorithm in such cases; but having agnosticism as the default, where possible, really makes it easy to scale things up and down while still getting the same results.)
Finally, I enhanced the battery of unit tests.