Closed tkoskela closed 1 month ago
We decided to test all the kernels in CI and see how it goes.
I'm testing this in #292 and I've got three failures. https://github.com/OrderN/CONQUEST-release/actions/runs/7034701802
`ompGemm_m` looks like it's running out of memory on the runner with 2 ranks and 2 threads.
`ompDoji` is less clear. When running multi-threaded, it's aborting in
We had some discussion over whether we need to maintain all these kernels. I've left the PR as a draft because I wouldn't necessarily want to run all these jobs on every commit, although they do run in parallel and don't add much to the wall-clock time of the CI. It feels wasteful to test and maintain code that is not going to be used, and trimming down the unused kernels would be better.
There's an open issue for the `ompGemm_m` problem: #280
I agree that we should leave it as draft for now; I'll try to find time to look at `ompDoji` sometime...
I fixed the issue in `ompDoji`: there was an extra parallel region left over.
`ompGemm_m` also passed in https://github.com/OrderN/CONQUEST-release/actions/runs/7035086832 :shrug: Maybe I got a machine with more memory. I'm trying to investigate #280.
I've come up with a potential solution in #292:
By default the CI workflow only runs the `default` multiply kernel and excludes the rest, but you can dispatch the workflow manually and set `multiply_kernel_test = true` to run all the multiply kernels in the matrix. Unfortunately this is difficult to test until the change is in the default branch because of how the GitHub GUI works. You used to be able to manually dispatch from branches using the CLI, but the information seems to be gone from the page so it might have been removed.
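The selection logic described above can be sketched roughly as follows. This is only an illustration in Python, not the actual workflow: the real change expresses this through the GitHub Actions matrix, the function name `kernels_to_test` is hypothetical, and the kernel list contains only the names mentioned in this thread.

```python
# Hypothetical sketch of the kernel selection described above.
# Only kernel names that appear in this thread are listed.
ALL_KERNELS = ["default", "ompGemm_m", "ompDoji"]


def kernels_to_test(multiply_kernel_test: bool = False) -> list[str]:
    """Return the multiply kernels a CI run would cover.

    multiply_kernel_test mirrors the manual-dispatch input:
    False (normal runs) tests only the default kernel,
    True (manual dispatch) tests every kernel in the matrix.
    """
    if multiply_kernel_test:
        return list(ALL_KERNELS)
    return ["default"]


if __name__ == "__main__":
    print(kernels_to_test())                           # normal push / PR run
    print(kernels_to_test(multiply_kernel_test=True))  # manual dispatch
```

The point of the design is that every commit pays only for the default kernel, while the full set stays one manual dispatch away.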
Completed by #292
While working on the threading implementations of the matrix multiply kernel, I realised it would be very useful to test more of them than just the default. Unless we can converge on a single optimal implementation I think we should test at least one threaded and one non-threaded implementation, perhaps also a gemm and a non-gemm one.
It can be fairly easily implemented by adding the multiply kernel into the test matrix. If we use spack to automate the build, it's just a matter of changing the variant in the spack spec. I'm a bit conscious of blowing up the number of jobs in the test matrix, so let's discuss this in the next meeting.
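To get a feel for how quickly the matrix grows, here is a rough sketch: the job count is the product of the number of values along each axis. The axes and their values below are illustrative assumptions rather than the project's actual CI configuration, and the spack package and variant names (`conquest`, `multiply_kernel`) are hypothetical placeholders.

```python
from itertools import product

# Assumed matrix axes; the values are illustrative only.
matrix = {
    "multiply_kernel": ["default", "ompGemm_m", "ompDoji"],
    "ranks": [1, 2],
    "threads": [1, 2],
}


def expand(matrix):
    """Expand a dict of axes into one dict per job, like a CI test matrix."""
    keys = list(matrix)
    return [dict(zip(keys, values)) for values in product(*matrix.values())]


def spack_spec(job, package="conquest", variant="multiply_kernel"):
    """Build a spec string for a job (package and variant names are hypothetical)."""
    return f"{package} {variant}={job['multiply_kernel']}"


jobs = expand(matrix)
print(len(jobs))            # 3 kernels * 2 rank counts * 2 thread counts
print(spack_spec(jobs[0]))
```

Every kernel added to the list multiplies the job count by the size of the other axes, which is the blow-up worth discussing before adding all of them.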