UoB-HPC / BabelStream

STREAM, for lots of devices written in many programming models

Julia implementation #106

Closed tom91136 closed 2 years ago

tom91136 commented 3 years ago

This PR adds the Julia implementation of BabelStream with the following implementations:

See README.md for details on build and run instructions.

Performance is surprisingly good across all supported hardware platforms. All benchmarks use Julia 1.6.1; the specific version of each package is recorded in Manifest.toml.

[Figures: omp, cuda]

AMDGPU.jl is still in heavy development, although the project reports that most core features are working. At the time of this PR, there are still a few issues that make it unsuitable for production use:

[Figure: hip]

Finally, there isn't a process-based (e.g. MPI) implementation of BabelStream, so a comparison for DistributedStream.jl has been omitted. That said, its performance seems to be significantly worse than ThreadedStream.jl due to the added serialisation overhead.
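For context, here is a minimal sketch of a shared-memory triad in the threaded style. It is illustrative only and not taken from ThreadedStream.jl; the function name `threaded_triad!`, the array size, and the scalar are assumptions. All threads update the same arrays in place, so nothing has to be serialised between workers, which is the overhead a process-based version would add.

```julia
# Illustrative sketch only; names and sizes are hypothetical, not from this PR.
function threaded_triad!(a, b, c, scalar)
    # Each thread handles a chunk of the shared index range.
    Threads.@threads for i in eachindex(a, b, c)
        @inbounds a[i] = b[i] + scalar * c[i]
    end
    return a
end

n = 2^20
a = zeros(Float64, n)
b = fill(0.2, n)
c = fill(0.1, n)

threaded_triad!(a, b, c, 0.4)                      # warm-up run, also triggers compilation
elapsed = @elapsed threaded_triad!(a, b, c, 0.4)   # timed run

# Triad touches three arrays per element: two reads and one write.
bytes_moved = 3 * sizeof(Float64) * n
println("approx. bandwidth: ", bytes_moved / elapsed / 1e9, " GB/s")
```

Start Julia with `julia -t auto` (or set `JULIA_NUM_THREADS`) to run this on more than one thread.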

Future work

We should be able to include oneAPI.jl once it is ready for general use.

There's also OpenCL.jl but it simply wraps the OpenCL host API; kernels must still be written in OpenCL C, so this wouldn't be any different from BabelStream's OCLStream.

tom91136 commented 3 years ago

Ready for review again.

tomdeakin commented 3 years ago

Thanks @tom91136. I think I prefer the parameter passing rather than making a structure just to hold the arrays. I think that in a larger code with more arrays, you're just going to have to pass things around rather than keep wrapping things up in bundles to pass to different functions. Ideally we're aiming to write BabelStream in a way that is representative of something much bigger.
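As a rough sketch of the two styles being compared (all names here, such as `triad!` and `StreamData`, are hypothetical and not taken from the PR), the same kernel can either take each array as an explicit parameter or take a struct that bundles them:

```julia
# Hypothetical sketch, not the PR's code: two ways to hand arrays to a kernel.

# Style 1: pass each array as an explicit parameter.
function triad!(a::Vector{T}, b::Vector{T}, c::Vector{T}, scalar::T) where {T}
    for i in eachindex(a, b, c)
        @inbounds a[i] = b[i] + scalar * c[i]
    end
    return nothing
end

# Style 2: bundle the arrays in a struct and pass the bundle around.
struct StreamData{T}
    a::Vector{T}
    b::Vector{T}
    c::Vector{T}
end

# The bundled method unpacks the struct and forwards to the explicit one.
triad!(data::StreamData{T}, scalar::T) where {T} =
    triad!(data.a, data.b, data.c, scalar)

n = 1024
a, b, c = zeros(n), fill(2.0, n), fill(3.0, n)
triad!(a, b, c, 0.5)                                   # explicit parameters

data = StreamData(zeros(n), fill(2.0, n), fill(3.0, n))
triad!(data, 0.5)                                      # bundled arrays
```

Both calls compute the same result; the difference is only in how the call sites look once a code has many more arrays than the three used here.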

giordano commented 3 years ago

Performance is surprisingly good across all supported hardware platforms.

:smiley:

Is there something we can do to move this forward? I had a very quick look, but could do a more thorough review if that helps.

tom91136 commented 3 years ago

@giordano Thanks for the review! This PR will be used for an upcoming submission to PMBS, so I have a few more local changes (I've added a functional oneAPI.jl and KA implementation) that I'm in the process of finalising. I'll incorporate your review and put up a final version for further review by the end of the week. If you're interested, the PMBS submission will also include a compute-bound benchmark written in Julia.

@tomdeakin and I had a discussion on the parameter passing and I think we've settled on the current approach being acceptable.