Open LourensVeen opened 10 months ago
For completeness, I forgot to mention Kirin, which is similar to Sapporo light and possibly a predecessor of it?
Hi @LourensVeen,
Okay, let's do sapporo2 then.
I've managed to compile it with CUDA 12 after some tweaks, but not with the OpenCL support in CUDA. OpenCL is rather deprecated at this point, but I'll try with a non-CUDA OpenCL library to see if that helps. It could be that it simply uses obsolete OpenCL features; the CUDA compiler also gives a bunch of deprecation warnings about the CUDA kernels.
I have no experience with GPU programming, although I've always wanted to learn. It would probably be more efficient though to see if I can get one of my colleagues to modernise this and maybe convert it to HIP and Vulkan, or whatever seems appropriate. But that would be a separate project, so I'd like to postpone that and focus on packaging things as they are for now.
Question for @jbedorf: if I make the changes above, would you be able to review and merge a couple pull requests?
Sure! And if you have any questions on the code let me know and I'll see what I remember.
Okay, I have the above basically done, but now the plot thickens. Of course the point of this is to build a Conda package, and I've tried to do that, and with my changes Sapporo2 compiles successfully in a Conda environment with Conda-installed compilers and CUDA libraries.
However, there is a packaging issue with the conda-forge CUDA packages (https://github.com/conda-forge/nvcc-feedstock/issues/12) which makes it impossible to build Conda packages against the old `libcuda.so` interface. Newer CUDA programs link against `libcudart.so` apparently, which implements a newer API, and that is supposed to work. But that didn't exist yet when Sapporo2 was written. (I think they split the driver from the library, `libcuda.so` comes with the driver while `libcudart.so` comes with the library, or something like that.)
So it looks like I'll need to either implement a compatibility layer for the compatibility layer, or learn enough CUDA to port Sapporo2 to the new API. The former doesn't make sense, so I guess the latter it is. Also, I need to check the community codes that use CUDA and see if we can expect that problem to appear in other places too... Time to do some reading and make a plan.
Back in the day (and maybe still today?) the cuda-runtime wrapper was meant as an easier-to-use interface than the cuda-driver interface. However, the driver interface allowed us to make the host code universal for both CUDA & OpenCL by developing a thin CUDA/OpenCL-specific layer, as those APIs follow similar methods & semantics. The runtime library, by contrast, at that point required the `<<< >>>` launch configuration settings.
Switching to the runtime library is totally possible, but it would require the host code section of the sapporo library to use the runtime library, and as such dropping OpenCL support. Given your previous comments about OpenCL that might not be a bad thing per se, but it would be much more work than just changing the makefiles...
That sounds like quite a bit of work. Also, I've been hearing some noise regarding OpenCL making a bit of a comeback in the last year or so, so it's hard to see what will happen. Maybe something like Kokkos is the way to go.
Anyway, I've done some more digging around, and it seems like there may be a way to just tell Conda that it's okay to have this dangling dependency that needs to be resolved from the system. There are conda-forge packages for Gromacs and Pytorch that use CUDA, so it seems like there should be a way. Although it also seems that conda-forge has its own way of dealing with CUDA that doesn't carry over one-to-one elsewhere, so I need to play with this more.
It looks like there's also a way to provide multiple packages with different backends, i.e. a sapporo2-cuda and a sapporo2-opencl, and then the user can specify which they want to use, after which `conda install amuse` should automatically grab the appropriate one. I don't know how that works yet either, but I'm going to figure it out :smile:.
Bit of an update here. I got the dangling dynamic link taken care of, there turns out to be an option for that, and other packages use it too, and it makes sense. I can at least locally build a Sapporo2 Conda package now that depends on CUDA, although there's no testing yet. At any rate I don't want to publish anything until we have some client code built against it and packaged and tested.
I've also been looking into the multiple-backends issue, and this is a mess. I haven't found an example of a package that has multiple implementations with the same API/ABI, which we would have here. Debian's `dpkg` has virtual packages, which is exactly what we would need, but it doesn't seem like Conda has them. (It does have something called virtual packages, but it's not the same thing.)
MPI is a bit similar to what we are doing, in that it has a standard API at least. On conda-forge, there's an `mpi` metapackage which has multiple copies with different build strings for the different MPI implementations, so you get `mpi-1.0-openmpi` and `mpi-1.0-mpich` etc. Packages that need MPI, like `mpi4py`, then build for all different versions of MPI, with each package depending on the corresponding dependency directly, which in turn depends on the corresponding version of `mpi`.
So now, if the user pins `mpi` to `mpi-1.0-openmpi`, then the only MPI implementation that will install is `openmpi`, because installing e.g. `mpich` would upgrade (sidegrade?) `mpi` to `mpi-1.0-mpich` and that's impossible because of the pin. So when installing `mpi4py`, you'd automatically get a version of it that uses `openmpi`, because that's the only combination that's compatible with the pinned version of `mpi`. If you try to `conda install mpi` you get the Intel MPI version of that package, but note that it's empty and that Intel MPI isn't actually installed. If you try to `conda install mpi4py` you get the version with `mpich`, but I can't find any specification of this being the preferred option, it seems to be random.
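The pinning mechanism described above can be sketched as a toy model in Python. This is not conda's solver; it only assumes the one rule the mechanism relies on, namely that an environment holds at most one build of each package name. The package names and build strings follow the description above, while the build string `"0"` for the implementations and the helper names are made up for illustration.

```python
# Toy model of the mpi metapackage trick; NOT conda's real solver.
# Each build is (name, build string) and lists the exact builds it requires.
# The "0" build strings and all helper names are hypothetical.
BUILDS = {
    ("mpi", "openmpi"): [],
    ("mpi", "mpich"): [],
    ("openmpi", "0"): [("mpi", "openmpi")],
    ("mpich", "0"): [("mpi", "mpich")],
    ("mpi4py", "openmpi"): [("openmpi", "0")],
    ("mpi4py", "mpich"): [("mpich", "0")],
}


def closure(build):
    """Return the build plus everything it transitively requires."""
    seen = set()
    stack = [build]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(BUILDS[current])
    return seen


def consistent(builds):
    """An environment may hold at most one build of each package name."""
    names = [name for name, _ in builds]
    return len(names) == len(set(names))


def installable(name, pinned):
    """List the builds of `name` that can coexist with the pinned builds."""
    result = []
    for candidate in BUILDS:
        if candidate[0] != name:
            continue
        env = closure(candidate)
        for pin in pinned:
            env |= closure(pin)
        if consistent(env):
            result.append(candidate)
    return sorted(result)


# With mpi pinned to the openmpi build, only the matching mpi4py build fits.
print(installable("mpi4py", [("mpi", "openmpi")]))
# Without a pin, both builds are candidates and something else must choose.
print(installable("mpi4py", []))
```

The second call returns both builds, which matches the observation that without a pin the choice between them is not determined by this mechanism alone.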
So we could use this mechanism to keep a `sapporo2-opencl` and `sapporo2-cuda` package from being installed at the same time, with client code depending on a `sapporo2` metapackage along the lines of `mpi`, and the user then installing the client code and either `sapporo2-opencl` or `sapporo2-cuda` explicitly. We could possibly have `amuse-opencl` and `amuse-cuda` metapackages that would depend on the corresponding version of `sapporo2`, so that the user can just install one package and get the whole compatible stack.
It seems that the more standard way to do things is to make different package variants, which means we'd have a single package `sapporo2` with two variants `cuda` and `opencl` (or likely more, for different CUDA versions). Packages using Sapporo would then build multiple variants as well. You can only have one variant of a package installed, so collisions would be avoided automatically by Conda.
The issue with that is that if you have multiple dependencies like that, you get a combinatorial explosion. Gromacs for example has MPI/No MPI, CUDA/No CUDA, and double precision or not, but it skips certain combinations so that in the end we get packages for five different combinations. The build number is abused here to specify a preference for No MPI, No CUDA and single precision.
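To make the combinatorial growth concrete, here is a small Python sketch. The three axes follow the Gromacs description above, but the labels, the `DEFAULTS` table and the build-number scheme are assumptions for illustration only; which combinations Gromacs actually skips, and how it numbers its builds, is not modelled here.

```python
from itertools import product

# Hypothetical labels for the three build axes mentioned above.
OPTIONS = {
    "mpi": ["nompi", "mpi"],
    "gpu": ["nocuda", "cuda"],
    "precision": ["single", "double"],
}

# Assumed preferred choice per axis: No MPI, No CUDA, single precision.
DEFAULTS = {"mpi": "nompi", "gpu": "nocuda", "precision": "single"}


def all_variants(options):
    """Enumerate every combination of one choice per axis."""
    keys = list(options)
    return [dict(zip(keys, combo))
            for combo in product(*(options[key] for key in keys))]


def build_number(variant, base=0):
    """Hypothetical scheme: add 100 for every axis left at its default.

    The idea is that, for equal versions, the build with the highest
    build number is the one that gets picked when nothing else decides.
    """
    return base + 100 * sum(variant[key] == DEFAULTS[key] for key in variant)


variants = all_variants(OPTIONS)
print(len(variants))                     # 2 * 2 * 2 combinations
print(max(variants, key=build_number))   # the all-defaults variant
```

Every additional two-way axis doubles the number of builds, which is why skipping combinations, as Gromacs does, keeps the package count manageable.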
Doing something similar would potentially lead to a lot of different packages being built, but if this is how it works then perhaps it's best to just go with the flow. A user may eventually end up installing AMUSE using `conda install 'amuse=*=cuda'` if they have an nVidia GPU, with `conda install amuse` installing the latest CPU-only version, and `conda install 'amuse=*=opencl'` installing as much as can be installed with OpenCL available.
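As a rough illustration of how a `name=version=build` spec like the ones above narrows down the candidates, here is a simplified matcher in Python. It is a sketch, not conda's actual spec parser: it treats the version and build fields as plain globs, and the list of available builds (including the version `2024.1` and the bare build strings) is hypothetical. Real build strings usually also carry a hash and a build number, so in practice a pattern such as `cuda*` may be needed.

```python
from fnmatch import fnmatchcase

# Hypothetical builds of one AMUSE release: (name, version, build string).
AVAILABLE = [
    ("amuse", "2024.1", "cpu"),
    ("amuse", "2024.1", "cuda"),
    ("amuse", "2024.1", "opencl"),
]


def matches(spec, package):
    """Simplified 'name=version=build' check; missing fields match anything."""
    fields = spec.split("=")
    fields += ["*"] * (3 - len(fields))
    name, version, build = package
    return (fields[0] == name
            and fnmatchcase(version, fields[1])
            and fnmatchcase(build, fields[2]))


def select(spec):
    """Return all available builds that satisfy the spec."""
    return [package for package in AVAILABLE if matches(spec, package)]


print(select("amuse=*=cuda"))  # only the CUDA build remains
print(select("amuse"))         # all builds remain; a preference must pick one
```

Note that a bare `amuse` spec leaves all three builds as candidates, so getting the CPU-only version by default depends on a separate preference mechanism, such as the build number.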
And then I tried to run my `meta.yaml` with the conda-forge infrastructure and discovered that they do CUDA differently. There's an issue at https://github.com/conda-forge/cuda-version-feedstock/issues/1 where they hashed out the design, but it doesn't seem to have made it to the maintainer docs yet, so you have to find it.
But well, that design does actually make sense and once you've figured out how to do it, it does seem to work. Although I still need to add tests, and I'm not sure how to build versions for different CUDA versions, and/or whether that is needed. 11.2 seems to be it for now.
Question for resident Mac expert @rieder: as I understand it Macs with nVidia chips and CUDA are getting rare, CUDA on Mac is no longer supported by either Apple or nVidia, and neither is OpenCL. Is that right? Does it make sense then to only build Linux packages of Sapporo2? Or should I try to see if the CPU support that the code seems to hint at really is there and can be revived? Or maybe the answer to that is to use the OpenCL version with pocl? That is supposed to work on Mac actually, as far as I understand, but I'm not sure if there's a point to doing so?
Looks like the answer to GPU-on-mac is that somebody should add Metal support to Sapporo2 at some point. Not the highest priority, so we'll leave that for the future, and build Sapporo2 only for Linux OpenCL and CUDA.
I'm looking at creating Anaconda packages (#525), and it seems that the place to start is with the Sapporo library. There is currently a `sapporo_light` in the `lib/` directory here, and there is Sapporo2 in a separate repository.
(Of the other things in `lib/`, I understand `forsockets`, `stopcond` and `amuse_mpi` to be a part of the AMUSE framework, and probably best packaged with that, while `g6` (if still relevant) and `simple_hash` could be separate packages as well. But those are separate issues.)
What's what
As I understand it, Sapporo started life as a compatibility library that allowed codes written for the GRAPE5/GRAPE6 hardware to run on any CUDA-compatible GPU. There's now a Sapporo light which seems like a simplified version that only supports the GRAPE6 API, and a Sapporo2 which supports GRAPE5 as well as two new integrators that I'm not sure correspond to any hardware, and which have a different API (maybe the idea was to share some GPU code?). Then there is `g6lib`, which implements the GRAPE6 API on the CPU. There's a copy of `g6lib` inside the `sapporo_light` directory, probably by mistake as it doesn't seem to get compiled or used anywhere.
Here's an overview of what's what:
Fortran likes to add an underscore to the end of symbols in its ABI, where C does not, so if you've got a C function that's supposed to be called from Fortran, you'll want to add an underscore to its name. For each API above, `C` means that there is a non-underscore version, and `F` that there is a version with an underscore of the symbols.
Users of Sapporo
There seem to be four community codes in AMUSE that use Sapporo:
`6thorder.h` for the functions in Sapporo2's `sapporo6thlib.cpp`. Also has a sapporo2_dummy.cc with a CPU-based implementation of the 6th API.
Currently, the first three are built against `sapporo_light`, while `mi6` falls back to the CPU and probably needs the user to supply a `sapporo2` installation for it to use the GPU. `bhtree` seems to be able to work with `g6lib`, but it's disabled in the Makefile.
So, it seems that `bhtree`, `ph4` and `phigrape` all use the GRAPE6 API, and can work with either `sapporo_light` or `sapporo2`, while `mi6` requires `sapporo2` for GPU support, but can use its own `sapporo2_dummy.cc` on the CPU if there is no GPU or no `sapporo2`.
Plan
Looking at all this, it seems to me that it makes sense to package `sapporo2`, and then build everything else against it, and not bother with `sapporo_light`. Is that right, or am I missing something?
To do that, a few improvements would be good to have:
`sapporo2`, where they belong, so that we can remove them from the community codes. (I'd leave a Fortran module definition for the future.)
`include/` and the rest into `src/`, `lib/` to me is a directory where you install binaries.
This should all be backwards compatible, but I should probably test building AMUSE against the new version to be sure.
Questions
`sapporo2` and forget about `sapporo_light`?
`sapporo2` repository?
`sapporo2` whose work I might mess up by changing anything?