
IANW-Projects / ConservationLaws

A set of tools for numerically solving different kinds of conservation or balance laws with OpenCL.


AoS vs SoA #19

Closed: ranocha closed this 5 years ago.

ranocha commented 5 years ago

~~This is work in progress and should not be merged now!~~ We want to compare "array of structures" (AoS) and "structure of arrays" (SoA), cf. #6. The new layout is selected via a `#define`, which is set from MATLAB with `I_Tech('memory_layout') = 'USE_STRUCTURE_OF_ARRAYS'`.

Up to now, we used an array of structures. The structure-of-arrays layout is implemented in addition, and all accesses are encapsulated via `get_field`, `set_field`, etc. Most computations work fine on my hardware/software; the only exception is the computation of norms when `num_nodes != num_nodes_pad`. In that case, the new memory layout introduces additional errors.
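
To make the two layouts concrete, here is a minimal C sketch of such accessors. The signatures `get_field(u, node, field)` / `set_field(u, node, field, value)` and the constants `NUM_CONSERVED`, `NUM_NODES` and `NUM_NODES_PAD` are hypothetical and only illustrate the idea; the repository's OpenCL helpers may look different. With SoA, each field occupies a contiguous block of `NUM_NODES_PAD` entries; with AoS, all fields of one node are adjacent.

```c
#include <stddef.h>

typedef double REAL;

/* Hypothetical sizes for illustration only. */
#define NUM_CONSERVED 3  /* fields per node */
#define NUM_NODES     5  /* physical nodes */
#define NUM_NODES_PAD 8  /* padded node count; nodes 5..7 are padding */

/* Layout switch in the spirit of the PR: SoA unless AoS is requested. */
#ifndef USE_ARRAY_OF_STRUCTURES
#define USE_STRUCTURE_OF_ARRAYS
#endif

/* Map (node, field) to a position in the flat array u. */
static size_t field_index(size_t node, size_t field) {
#ifdef USE_STRUCTURE_OF_ARRAYS
  /* SoA: one block of NUM_NODES_PAD entries per field. */
  return field * NUM_NODES_PAD + node;
#else
  /* AoS: NUM_CONSERVED consecutive entries per node. */
  return node * NUM_CONSERVED + field;
#endif
}

static REAL get_field(const REAL *u, size_t node, size_t field) {
  return u[field_index(node, field)];
}

static void set_field(REAL *u, size_t node, size_t field, REAL value) {
  u[field_index(node, field)] = value;
}
```

In this sketch, compiling with `-DUSE_ARRAY_OF_STRUCTURES` switches back to the AoS index; as long as all other code goes through the accessors, nothing else has to change.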

Do you have any suggestions on how to correct the computation of the norms, @philipheinisch @Kostaszki?

Here are some possibilities:

1. Remove `num_nodes_pad` and use only `num_nodes == NODES_X * NODES_Y * NODES_Z`.
2. Add additional checks in `norm2`, `norm_infty`, etc.
3. Add `NUM_NODES_PAD` to the OpenCL part.

Possibilities 1 and 2 are similar and could affect performance. That said, we don't compute norms etc. in performance-critical parts so far. I would expect possibility 3 to give the best performance, but it increases the code complexity a bit (a new constant has to be defined).
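
To illustrate why the padding only hurts with SoA, here is a minimal C sketch in the spirit of possibility 3. It assumes a plain maximum norm; the names `norm_infty_soa` and `norm_infty_naive` and the sizes are hypothetical, and the repository's `norm_infty` may reduce differently (e.g. in parallel on the device). With SoA and `NUM_NODES != NUM_NODES_PAD`, every field block ends with padding, so a loop over the first `NUM_CONSERVED * NUM_NODES` entries reads padding and misses data; a compile-time `NUM_NODES_PAD` lets the loop skip the padding.

```c
#include <stddef.h>

typedef double REAL;

/* Hypothetical sizes; possibility 3 would provide NUM_NODES_PAD as a
   compile-time constant to the OpenCL code. */
#define NUM_CONSERVED 2
#define NUM_NODES     3
#define NUM_NODES_PAD 4

static REAL abs_real(REAL x) { return x < 0 ? -x : x; }

/* Maximum norm over the physical nodes of an SoA array: field f occupies
   u[f * NUM_NODES_PAD + 0 .. f * NUM_NODES_PAD + NUM_NODES - 1]; the last
   NUM_NODES_PAD - NUM_NODES entries of each block are padding. */
static REAL norm_infty_soa(const REAL *u) {
  REAL result = 0;
  for (size_t field = 0; field < NUM_CONSERVED; ++field) {
    for (size_t node = 0; node < NUM_NODES; ++node) {
      REAL v = abs_real(u[field * NUM_NODES_PAD + node]);
      if (v > result) result = v;
    }
  }
  return result;
}

/* Naive version treating the first NUM_CONSERVED * NUM_NODES entries as
   the data. This matches AoS, where all padding nodes come after the
   physical ones, but is wrong for SoA when NUM_NODES != NUM_NODES_PAD. */
static REAL norm_infty_naive(const REAL *u) {
  REAL result = 0;
  for (size_t i = 0; i < NUM_CONSERVED * NUM_NODES; ++i) {
    REAL v = abs_real(u[i]);
    if (v > result) result = v;
  }
  return result;
}
```

For example, if the padding entries of an SoA array hold 100 and the largest physical value has magnitude 7, `norm_infty_soa` returns 7 while `norm_infty_naive` returns 100.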

ranocha commented 5 years ago

I've implemented possibility 3 from above in this PR. It seems to work, and I get some speedups, depending on the hardware and the problem at hand:

| Test case | CPU speedup | GPU speedup |
|---|---|---|
| `induction_equation.m` | 50 % | 15 % |
| `ideal_MHD.m` | 3 % | 20 % |

From my point of view, this PR is finished and can be merged; it closes #6.

ranocha commented 5 years ago

I've rebased on master to fix the merge conflicts.

ranocha commented 5 years ago

I've rebased on master to fix merge conflicts and adapted the `ideal_gas_Euler` parts. Here are the new speedups (EDIT: with the new commit mentioned below):

| Test case | CPU speedup | GPU speedup |
|---|---|---|
| `linear_constant_advection.m` | 0 % | 0 % |
| `linear_variable_advection.m` | 40 % | 20 % |
| `induction_equation.m` | 50 % | 15 % |
| `ideal_gas_Euler.m`, `USE_FLUX_KennedyGruber` | 30 % | 30 % |
| `ideal_gas_Euler.m`, `USE_FLUX_Chandrashekar` | 2 % | 15 % |
| `ideal_MHD.m` | 3 % | 20 % |

I've encountered a strange problem that I currently don't understand: running `ideal_gas_Euler.m` with `USE_FLUX_Chandrashekar` and `USE_ARRAY_OF_STRUCTURES` works and takes 76 seconds on my GPU. On the CPU, it takes 460 seconds and the computation blows up (NaN). The same problem occurs with `USE_STRUCTURE_OF_ARRAYS`.

@Kostaszki: Could you please take a general look at this PR? ~~Do you have the same problem with `USE_FLUX_Chandrashekar`? Maybe we could ignore this problem at first, merge this PR, and investigate it later?~~

ranocha commented 5 years ago

I've fixed the failure with `USE_FLUX_Chandrashekar` on my Intel CPU/GPU. The fix also reduced the runtime on the Nvidia GPU. The essential change is that

  REAL F = (u <  (REAL)(1.0e-2)) * (1 + u * ((REAL)(1.0/3.0) + u * ((REAL)(1.0/5.0) + u * (REAL)(1.0/7.0))))
         + (u >= (REAL)(1.0e-2)) * (log(zeta) / (2*f));

is replaced with

  REAL F = (u <  (REAL)(1.0e-2)) ? (1 + u * ((REAL)(1.0/3.0) + u * ((REAL)(1.0/5.0) + u * (REAL)(1.0/7.0))))
                                 : (log(zeta) / (2*f));
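
One plausible reason for the NaNs, assuming the snippet is the usual Ismail-Roe logarithmic mean with `f = (zeta - 1) / (zeta + 1)` and `u = f * f` (these definitions are not shown above): the series branch is then the Taylor expansion of `log(zeta) / (2*f)` around `f = 0`. When both states coincide, `zeta == 1`, so `f == 0` and `log(zeta) / (2*f)` evaluates to `0/0 = NaN`. Multiplying by a 0/1 mask does not remove a NaN, since `0 * NaN` is still NaN, whereas the conditional operator only evaluates the selected branch. The hypothetical helpers `F_mask` and `F_select` below take `f` and a precomputed `log(zeta)` as inputs to isolate this effect.

```c
typedef double REAL;

/* Arithmetic selection as in the original kernel: both branches are
   evaluated and weighted by the comparison results (0 or 1). */
static REAL F_mask(REAL f, REAL log_zeta) {
  REAL u = f * f;
  return (u <  (REAL)(1.0e-2)) * (1 + u * ((REAL)(1.0/3.0) + u * ((REAL)(1.0/5.0) + u * (REAL)(1.0/7.0))))
       + (u >= (REAL)(1.0e-2)) * (log_zeta / (2*f));
}

/* Conditional selection as in the fix: only the chosen branch is
   evaluated, so the 0/0 in the logarithmic branch is never computed
   for small u. */
static REAL F_select(REAL f, REAL log_zeta) {
  REAL u = f * f;
  return (u < (REAL)(1.0e-2)) ? (1 + u * ((REAL)(1.0/3.0) + u * ((REAL)(1.0/5.0) + u * (REAL)(1.0/7.0))))
                              : (log_zeta / (2*f));
}
```

With `f = 0` and `log_zeta = 0` (equal states), `F_mask` returns NaN while `F_select` returns exactly 1, the correct limit.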

philipheinisch commented 5 years ago

Works fine in general. Tested on GTX 1060, Ryzen 5 and Ryzen 7. AoS is consistently slower. There is a small bug in `initialize.m`; `num_nodes` needs to be cast to `double`: `group_size = 2^floor(log(double(num_nodes)) / log(2));`. Additionally, the numerical results for AoS and SoA seem to be inconsistent for smaller N.
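
The corrected line computes the largest power of two not exceeding `num_nodes`, apparently used as a work-group size. As a sketch of the same quantity with integer arithmetic only, which needs neither the type cast nor floating-point logarithms, consider the hypothetical helper below (not code from the repository; it assumes `n >= 1`).

```c
#include <stddef.h>

/* Largest power of two p with p <= n, for n >= 1.
   Comparing against n / 2 avoids overflow when doubling p. */
static size_t largest_power_of_two_le(size_t n) {
  size_t p = 1;
  while (p <= n / 2) {
    p *= 2;
  }
  return p;
}
```

For example, `largest_power_of_two_le(1000)` returns 512 and `largest_power_of_two_le(1024)` returns 1024.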

ranocha commented 5 years ago

Thanks, I've fixed initialize.m.

ranocha commented 5 years ago

The plot problems for small N should be fixed now. Thanks for testing, @philipheinisch!

ranocha commented 5 years ago

Do you approve these changes, @philipheinisch? Can we (squash-) merge this PR?