jamesp-epcc closed this 1 year ago
I see the CI is failing. Apparently gcc doesn't like me mapping `this` via OpenMP. Unfortunately, this is needed to build a working GPU version with Clang. I had thought it would be harmless to have the offload directives there even for CPU builds, but it might be necessary to wrap them in `#ifdef`s so that the compiler only sees them when actually building for GPU.
I've wrapped the GPU directives in `#ifdef`s and the checks now pass. The original code already had similar checks in place for the OpenMP parallel directives, so it's not too big a departure from how we previously did things. I still need to investigate whether actually building and running the GPU version in CI is feasible.
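A minimal sketch of the `#ifdef` approach described above, assuming a hypothetical `USE_GPU_OFFLOAD` macro that the build defines only when compiling with an offload-capable Clang (the macro name and function are illustrative, not taken from the PR). Offload-only clauses, such as the mapping that gcc rejected, would live in the first branch; gcc only ever sees the `_OPENMP` branch, mirroring the existing guards around the OpenMP parallel directives.

```cpp
#include <cstddef>
#include <vector>

// Multiply every element of v by factor.
// USE_GPU_OFFLOAD is a hypothetical macro, assumed to be defined
// (e.g. with -DUSE_GPU_OFFLOAD) only for GPU builds.
// If neither macro below is defined, the loop is simply serial.
void scale_in_place(std::vector<double>& v, double factor)
{
    // Use a raw pointer so the data can be mapped as an array section.
    double* data = v.data();
    const std::size_t n = v.size();

#if defined(USE_GPU_OFFLOAD)
    // GPU build: offload the loop and copy the data to and from the device.
    #pragma omp target teams distribute parallel for map(tofrom: data[0:n])
#elif defined(_OPENMP)
    // CPU build with OpenMP (e.g. gcc -fopenmp): an ordinary host parallel loop.
    #pragma omp parallel for
#endif
    for (std::size_t i = 0; i < n; ++i) {
        data[i] *= factor;
    }
}
```

Keeping a single loop body and switching only the directive means the CPU and GPU paths cannot drift apart.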
I have addressed Mike's feedback and rebased against the current main branch:

- The contents of `_abs_length_table` are now taken from the table passed in from the Python layer rather than being hardcoded (this was meant to be a temporary change but I forgot about it, apologies).
- … `PhotonArray` instead of `const_cast`ing it.
- `updatePixelDistortions` has now been removed. By making a few changes I was able to use the GPU-enabled version for all purposes instead.
- … `update` method, as it was originally. `addDelta` is not being used in `update`, but I have added a comment explaining why.

The CI tests are failing because they are unable to install `codecov`. However, I think this is unrelated to my changes (see here: https://github.com/home-assistant/core/issues/91283).
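To illustrate the `_abs_length_table` change, here is a hypothetical sketch of a lookup class that copies a table handed to it by the caller (for example, values passed down from the Python layer) instead of relying on values hardcoded in C++. The class name, the sorted-wavelength requirement and the clamped linear interpolation are all assumptions for illustration, not the PR's code.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical class: holds an absorption-length table supplied by the caller
// rather than a hardcoded array.
class AbsorptionTable
{
public:
    // Assumes wavelengths are sorted in ascending order and that both vectors
    // have the same, non-zero length.
    AbsorptionTable(std::vector<double> wavelengths, std::vector<double> lengths)
        : _wavelengths(std::move(wavelengths)), _lengths(std::move(lengths))
    {
        assert(!_wavelengths.empty() && _wavelengths.size() == _lengths.size());
    }

    // Linear interpolation between table entries, clamped at both ends
    // (an illustrative choice, not necessarily what the real code does).
    double lookup(double wavelength) const
    {
        if (wavelength <= _wavelengths.front()) return _lengths.front();
        if (wavelength >= _wavelengths.back()) return _lengths.back();

        // Find the first entry at or above the requested wavelength.
        std::size_t hi = 1;
        while (_wavelengths[hi] < wavelength) ++hi;
        const std::size_t lo = hi - 1;

        const double t = (wavelength - _wavelengths[lo])
                       / (_wavelengths[hi] - _wavelengths[lo]);
        return _lengths[lo] + t * (_lengths[hi] - _lengths[lo]);
    }

private:
    std::vector<double> _wavelengths;
    std::vector<double> _lengths;
};
```

Passing the table in keeps the absorption data defined in one place on the Python side, so the C++ code no longer needs editing when the table changes.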
Just remove `codecov` from the CI script.
We actually haven't been using it for a while, since they now recommend a bash uploader. And they recently removed `codecov` from PyPI, so `pip` can't find it anymore, probably to discourage people from using the obsolete uploader.
This is my GPU acceleration work on the sensor model, rebased against the current main branch. Obviously this is a very extensive change, so I'm open to discussing how it's implemented and making changes to better fit the existing code. There is a single unified code base for both GPU and CPU implementations; when offloading is disabled (which can be done via an environment variable or by building with a non-GPU-aware compiler), the offloaded regions act like OpenMP parallel regions, so the loops still take advantage of multiple CPU cores as before. A version of Clang with offloading support is required to build this for GPU.
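The unified CPU/GPU design described above can be sketched roughly as follows. This is a minimal illustration rather than the PR's code: the function names are hypothetical, and it assumes the environment variable in question is the standard OpenMP `OMP_TARGET_OFFLOAD` (setting it to `DISABLED` makes the host the only device). When no device is used, the `target teams distribute parallel for` loop typically executes on the host, where the `parallel for` part still spreads iterations across threads; when compiled without OpenMP at all, the guarded pragmas disappear and the loop runs serially.

```cpp
#include <cstddef>
#ifdef _OPENMP
#include <omp.h>
#endif

// Sum an array inside an OpenMP target region. With an offload-capable
// compiler and an available device, the loop runs on the GPU; otherwise
// (no device, or OMP_TARGET_OFFLOAD=DISABLED) the same region runs on the host.
double offloaded_sum(const double* data, std::size_t n)
{
    double sum = 0.0;
#ifdef _OPENMP
    #pragma omp target teams distribute parallel for \
        map(to: data[0:n]) map(tofrom: sum) reduction(+: sum)
#endif
    for (std::size_t i = 0; i < n; ++i) {
        sum += data[i];
    }
    return sum;
}

// Report whether target regions are currently executing on the host.
bool target_runs_on_host()
{
    int on_host = 1;  // Without OpenMP, everything runs on the host.
#ifdef _OPENMP
    #pragma omp target map(from: on_host)
    on_host = omp_is_initial_device();
#endif
    return on_host != 0;
}
```

Running the same binary with `OMP_TARGET_OFFLOAD=DISABLED` set should make `target_runs_on_host()` return `true`, while `offloaded_sum` computes the same result on the CPU (up to floating-point reduction order).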