dash-project / dash-apps

Example applications and tools for DASH

added NPB kernels: CG, EP, FT, IS, MG #13

Open Mietzsch opened 5 years ago

Mietzsch commented 5 years ago

This is an implementation of the five original NPB kernels (CG, EP, FT, IS, MG) using DASH. It uses several DASH features, including DASH algorithms, CSR patterns and async_copy.

devreal commented 4 years ago

After browsing through some of the code, I have another major concern: some codes allocate global memory in the critical path, e.g., every timestep. Global memory allocation is costly (orders of magnitude slower than malloc). Has anyone ever tested this on a real HPC system (ideally IB or Cray where global memory is pinned)? How does it compare against the MPI version of the NPB benchmarks? In many places these global data structures can probably be allocated once and reused, so the fix should be easy.

The reason I am concerned is this: if these benchmarks end up in the repo someone will eventually grab them and use them to compare their approach to DASH. They will not make an attempt to investigate why the performance of DASH seemingly sucks. We should be careful with putting out benchmarks where we cannot show that we are at least in the same ballpark as MPI. This would come back to haunt us...
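The allocation concern raised in this comment can be illustrated with a minimal sketch. It does not use the DASH API: `GlobalBufferStub`, `g_allocations`, `run_alloc_per_step` and `run_alloc_once` are hypothetical names made up for illustration, and the stub only counts how often a global allocation would happen. It does not model the actual cost of a collective allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how many "global allocations" were performed (illustration only).
std::size_t g_allocations = 0;

// Hypothetical stand-in for a globally allocated array. NOT the DASH API:
// a real global allocation would be collective and far more expensive.
struct GlobalBufferStub {
    std::vector<double> data;
    explicit GlobalBufferStub(std::size_t n) : data(n, 0.0) { ++g_allocations; }
};

// Pattern criticized above: a new global buffer in every timestep.
// Assumes n > 0.
double run_alloc_per_step(std::size_t n, int steps) {
    double sum = 0.0;
    for (int t = 0; t < steps; ++t) {
        GlobalBufferStub buf(n);  // allocation on the critical path
        for (std::size_t i = 0; i < n; ++i) buf.data[i] = t + 1.0;
        sum += buf.data[0];
    }
    return sum;
}

// Suggested fix: allocate once before the loop and reuse the buffer.
// Assumes n > 0.
double run_alloc_once(std::size_t n, int steps) {
    GlobalBufferStub buf(n);  // single allocation, outside the loop
    double sum = 0.0;
    for (int t = 0; t < steps; ++t) {
        for (std::size_t i = 0; i < n; ++i) buf.data[i] = t + 1.0;
        sum += buf.data[0];
    }
    return sum;
}
```

Both functions compute the same result. The first performs one allocation per timestep, the second performs one allocation in total. Whether a given kernel can be restructured this way depends on whether the buffer size stays constant across timesteps, which would have to be checked per kernel.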

Mietzsch commented 4 years ago

No, I did not test this on a real HPC system. Unfortunately, I'm working on different projects now and I don't have the time to test and work out the new global data-structures. If anybody wants to go ahead and do it, you're more than welcome.