IANW-Projects / ConservationLaws

A set of tools for numerically solving different kinds of conservation or balance laws with OpenCL.

AoS vs SoA #19

Closed: ranocha closed this pull request 5 years ago

ranocha commented 5 years ago

This is work in progress and should not be merged yet! We want to compare "array of structures" (AoS) and "structure of arrays" (SoA), cf. #6. The new layout is selected via a #define, which is chosen by setting I_Tech('memory_layout') = 'USE_STRUCTURE_OF_ARRAYS' in MATLAB.

Up to now, we have used an array of structures. The structure-of-arrays layout is implemented in addition, and all accesses are encapsulated via get_field, set_field, etc. Most computations work fine on my hardware/software; the only exception is the computation of norms if num_nodes != num_nodes_pad. In that case, the new memory layout causes additional errors.

Do you have any suggestions on how to correct the computation of the norms, @philipheinisch @Kostaszki?

Here are some possibilities:

  1. Remove num_nodes_pad and use only num_nodes == NODES_X * NODES_Y * NODES_Z.
  2. Add additional checks in norm2, norm_infty etc.
  3. Add NUM_NODES_PAD to the OpenCL part.

Possibilities 1 and 2 are similar and could reduce performance. However, we don't compute norms etc. in performance-critical parts so far. I would expect possibility 3 to give the best performance, although it increases the code complexity a bit (a new constant has to be defined).

ranocha commented 5 years ago

I've implemented possibility 3 from above. It seems to work, and I get some speedups, depending on the hardware and the problem at hand:

Test case             CPU speedup  GPU speedup
induction_equation.m  50 %         15 %
ideal_MHD.m           3 %          20 %

From my point of view, this PR is finished and can be merged; it closes #6.

ranocha commented 5 years ago

I've rebased on master to fix the merge conflicts.

ranocha commented 5 years ago

I've rebased on master to fix merge conflicts and adapted the ideal_gas_Euler parts. Here are the new speedups (EDIT: measured with the new commit mentioned below):

Test case                                   CPU speedup  GPU speedup
linear_constant_advection.m                 0 %          0 %
linear_variable_advection.m                 40 %         20 %
induction_equation.m                        50 %         15 %
ideal_gas_Euler.m, USE_FLUX_KennedyGruber   30 %         30 %
ideal_gas_Euler.m, USE_FLUX_Chandrashekar   2 %          15 %
ideal_MHD.m                                 3 %          20 %

I've encountered a strange problem that I currently don't understand: running ideal_gas_Euler.m with USE_FLUX_Chandrashekar and USE_ARRAY_OF_STRUCTURES works and takes 76 seconds on my GPU. On the CPU, it takes 460 seconds and the computation blows up (NaN). The same problem occurs with USE_STRUCTURE_OF_ARRAYS.

@Kostaszki: Could you please take a general look at this PR? Do you have the same problem with USE_FLUX_Chandrashekar? Maybe we could ignore this problem for now, merge this PR, and investigate it later?

ranocha commented 5 years ago

I've fixed the failure with USE_FLUX_Chandrashekar on my Intel CPU/GPU. The fix also reduced the runtime on the Nvidia GPU. The essential difference is that

  REAL F = (u <  (REAL)(1.0e-2)) * (1 + u * ((REAL)(1.0/3.0) + u * ((REAL)(1.0/5.0) + u * (REAL)(1.0/7.0))))
         + (u >= (REAL)(1.0e-2)) * (log(zeta) / (2*f));

is replaced with

  REAL F = (u <  (REAL)(1.0e-2)) ? (1 + u * ((REAL)(1.0/3.0) + u * ((REAL)(1.0/5.0) + u * (REAL)(1.0/7.0))))
                                 : (log(zeta) / (2*f));

philipheinisch commented 5 years ago

Works fine in general. Tested on a GTX 1060, a Ryzen 5, and a Ryzen 7. AoS is consistently slower. There is a small bug in initialize.m, where num_nodes needs to be cast: group_size = 2^floor(log(double(num_nodes)) / log(2)); Additionally, the numerical results for AoS and SoA seem to be inconsistent for small N.

ranocha commented 5 years ago

Thanks, I've fixed initialize.m.

ranocha commented 5 years ago

The plot problems for small N should be fixed now. Thanks for testing, @philipheinisch!

ranocha commented 5 years ago

Do you approve these changes, @philipheinisch? Can we (squash-) merge this PR?