Xiangyu-Hu / SPHinXsys

SPHinXsys provides C++ APIs for engineering simulation and optimization. It aims at complex systems driven by fluid, structure, multi-body dynamics and beyond. The multi-physics library is based on a unique and unified computational framework by which strong coupling has been achieved for all involved physics.
https://www.sphinxsys.org/

[SYCL] Enable double precision and generalized vector operations #462

Closed · nR3D closed this 6 months ago

nR3D commented 6 months ago

Changes

Benchmarks

With 300'000 particles, using double precision makes the simulation 86.4% slower than using single precision (1114.26 s vs. 597.81 s, a factor of about 1.86).

Double:

Total wall time for computation: 1114.259535746 seconds.
interval_computing_time_step = 768.747579194
interval_computing_fluid_pressure_relaxation = 325.318849561
interval_updating_configuration = 3.267941609
interval_writing_files = 10.581363697

Float:

Total wall time for computation: 597.805587192 seconds.
interval_computing_time_step = 319.801477432
interval_computing_fluid_pressure_relaxation = 259.038648931
interval_updating_configuration = 2.944420347
interval_writing_files = 10.566627757

DrChiZhang commented 6 months ago

Well done. I will go through your modifications.

Xiangyu-Hu commented 6 months ago

@junwei-jiang Could you also have some comments?

Xiangyu-Hu commented 6 months ago

Is the main slowdown from interval_computing_time_step = 768.747579194?

junwei-jiang commented 6 months ago

I'm curious about the precision difference. I mean, how much precision is lost when changing from double to float, and whether this is acceptable. I think the computational cost of double may be too high.
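
As a toy illustration of the kind of drift I mean (not SPHinXsys code, just a self-contained example): accumulating many small increments in single precision loses digits that double precision keeps.

```cpp
#include <cstdio>

// Toy example: sum 10 million increments of 1e-4; the exact result is 1000.
// Single precision visibly drifts once the running sum dwarfs the increment;
// double precision stays accurate to many more digits.
int main()
{
    float sum_f = 0.0f;
    double sum_d = 0.0;
    for (int i = 0; i < 10'000'000; ++i)
    {
        sum_f += 1.0e-4f;
        sum_d += 1.0e-4;
    }
    std::printf("float = %.4f, double = %.4f, relative error = %.2e\n",
                sum_f, sum_d, (sum_d - static_cast<double>(sum_f)) / sum_d);
    return 0;
}
```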

nR3D commented 6 months ago

> @Xiangyu-Hu: Is the main slowdown from interval_computing_time_step = 768.747579194?

Since time-step computation and configuration updating are executed concurrently, that slowdown is probably produced by both. In general, floating-point numbers are used in all steps (aside from file writing, where single vs. double precision is not relevant), so I would expect a similar slowdown across methods. I am now running the same computation again with the time-step and configuration computations separated, so we can better distinguish the slowdown of the different intervals. I will post the results as soon as they are ready.
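
For context, the overlap looks roughly like the following SYCL pattern: two submissions with no data dependencies on the same out-of-order queue are free to run concurrently. This is an illustrative sketch only, not the actual SPHinXsys scheduling code; buffer names and kernel bodies are placeholders.

```cpp
#include <sycl/sycl.hpp>

// Illustrative sketch: two independent kernels on the same out-of-order
// queue may overlap, the way the time-step criterion and the configuration
// update can.
int main()
{
    sycl::queue q; // out-of-order by default
    const size_t n = 300'000;
    sycl::buffer<float> dt_buf{sycl::range<1>(n)};
    sycl::buffer<float> cfg_buf{sycl::range<1>(n)};

    q.submit([&](sycl::handler &h) {
        sycl::accessor dt(dt_buf, h, sycl::write_only, sycl::no_init);
        h.parallel_for(sycl::range<1>(n),
                       [=](sycl::id<1> i) { dt[i] = 0.5f * float(i[0]); });
    });
    // No accessor overlap with the first kernel, so the runtime is free to
    // execute both at the same time.
    q.submit([&](sycl::handler &h) {
        sycl::accessor cfg(cfg_buf, h, sycl::write_only, sycl::no_init);
        h.parallel_for(sycl::range<1>(n),
                       [=](sycl::id<1> i) { cfg[i] = float(i[0]) + 1.0f; });
    });
    q.wait(); // both kernels have finished here
    return 0;
}
```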

> @junwei-jiang: I'm curious about the precision difference. I mean, how much precision is lost when changing from double to float, and whether this is acceptable. I think the computational cost of double may be too high.

Regression tests still pass with floats, so double precision is not strictly needed. But it is worth keeping the option to enable it whenever higher precision is required. By default, device and host precision are set to be the same, and SPHINXSYS_USE_FLOAT can be toggled to use single precision for both.
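
For reference, the toggle essentially reduces to a compile-time alias along these lines (simplified sketch; the exact definitions and wiring in the PR may differ):

```cpp
// Simplified sketch of the precision switch. SPHINXSYS_USE_FLOAT is the
// toggle mentioned above; the Real alias is what the library code computes
// with, on host and device alike.
#ifdef SPHINXSYS_USE_FLOAT
using Real = float;
#else
using Real = double;
#endif
```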

Xiangyu-Hu commented 6 months ago

> Regression tests still pass with floats, so double precision is not strictly needed. But it is worth keeping the option to enable it whenever higher precision is required.

I would not worry about using float, as the leading term determining accuracy is the numerical discretization algorithm, not double versus single precision.
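
A back-of-envelope error budget makes the point (my own gloss, assuming a roughly second-order SPH discretization with smoothing length h):

```latex
% Discretization error vs. machine round-off: for typical resolutions,
% C h^2 is orders of magnitude larger than single-precision round-off,
% so the scheme, not the precision, dominates.
\[
  e_\text{total} \approx \underbrace{C\,h^{2}}_{\text{discretization}}
                 + \underbrace{\varepsilon\,\lVert f \rVert}_{\text{round-off}},
  \qquad
  \varepsilon_\text{float} \approx 1.2\times10^{-7},
  \quad
  \varepsilon_\text{double} \approx 2.2\times10^{-16}.
\]
```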

nR3D commented 6 months ago

Double:

Total wall time for computation: 2269.544049414 seconds.
interval_computing_time_step = 46.734129965
interval_computing_fluid_pressure_relaxation = 653.920405645
interval_updating_configuration = 1458.222392889
interval_writing_files = 108.284276538

Float:

Total wall time for computation: 1376.905903631 seconds.
interval_computing_time_step = 40.042162188
interval_computing_fluid_pressure_relaxation = 580.866289297
interval_updating_configuration = 610.196198424
interval_writing_files = 144.182650995

Benchmarks still use 300'000 particles, but this time the server I am benchmarking on is partially busy, so runtimes are slower than before; I therefore ran both cases again to get a fair comparison. Overall, configuration updating is where execution slows down the most. I could run some profiling to find hotspots within the configuration kernels. In particular, floating-point precision matters inside the smoothing-kernel calculations for neighbors, where various SYCL math and geometric functions are called; I suspect those are the reason for the slowdown.
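
If useful, SYCL event profiling is one way to time individual kernels and locate such hotspots. This is a sketch under the assumption that the backend supports the enable_profiling queue property; the kernel body is a stand-in for the actual smoothing-kernel math.

```cpp
#include <sycl/sycl.hpp>
#include <iostream>

// Sketch: time a single kernel with SYCL event profiling. The body is a
// placeholder for the smoothing-kernel math (sycl::sqrt and friends) where
// single vs. double precision matters.
int main()
{
    sycl::queue q{sycl::property::queue::enable_profiling{}};
    const size_t n = 300'000;
    sycl::buffer<double> buf{sycl::range<1>(n)};

    sycl::event e = q.submit([&](sycl::handler &h) {
        sycl::accessor a(buf, h, sycl::write_only, sycl::no_init);
        h.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
            a[i] = sycl::sqrt(double(i[0]) + 1.0); // stand-in for kernel math
        });
    });
    e.wait();

    const auto t0 =
        e.get_profiling_info<sycl::info::event_profiling::command_start>();
    const auto t1 =
        e.get_profiling_info<sycl::info::event_profiling::command_end>();
    std::cout << "kernel time: " << (t1 - t0) * 1e-6 << " ms\n";
    return 0;
}
```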