Xiangyu-Hu / SPHinXsys

SPHinXsys provides C++ APIs for engineering simulation and optimization. It aims at complex systems driven by fluid, structure, multi-body dynamics and beyond. The multi-physics library is based on a unique and unified computational framework by which strong coupling has been achieved for all involved physics.
https://www.sphinxsys.org/

[SYCL] Enable double precision and generalized vector operations #462

Closed · nR3D closed this 6 months ago

nR3D commented 6 months ago

Changes

Benchmarks

With 300'000 particles, using double precision makes the simulation 86.4% slower than using single precision (1114.26 s vs. 597.81 s, a factor of about 1.86).

Double:

Total wall time for computation: 1114.259535746 seconds.
interval_computing_time_step = 768.747579194
interval_computing_fluid_pressure_relaxation = 325.318849561
interval_updating_configuration = 3.267941609
interval_writing_files = 10.581363697

Float:

Total wall time for computation: 597.805587192 seconds.
interval_computing_time_step = 319.801477432
interval_computing_fluid_pressure_relaxation = 259.038648931
interval_updating_configuration = 2.944420347
interval_writing_files = 10.566627757

DrChiZhang commented 6 months ago

Well done. I will go through your modifications.

Xiangyu-Hu commented 6 months ago

@junwei-jiang Could you also have some comments?

Xiangyu-Hu commented 6 months ago

Is the main slowdown from interval_computing_time_step = 768.747579194?

junwei-jiang commented 6 months ago

I'm curious about the precision difference. I mean, how much precision is lost when changing from double to float, and whether this is acceptable. I think the computational cost of double may be too high.
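
As a toy illustration of the kind of drift I mean (not SPHinXsys code, just a self-contained example): accumulating many small increments in single precision loses digits that double precision keeps.

```cpp
#include <cstdio>

// Toy example: sum 10 million increments of 1e-4; the exact result is 1000.
// Single precision visibly drifts once the running sum dwarfs the increment;
// double precision stays accurate to many more digits.
int main()
{
    float sum_f = 0.0f;
    double sum_d = 0.0;
    for (int i = 0; i < 10'000'000; ++i)
    {
        sum_f += 1.0e-4f;
        sum_d += 1.0e-4;
    }
    std::printf("float = %.4f, double = %.4f, relative error = %.2e\n",
                sum_f, sum_d, (sum_d - static_cast<double>(sum_f)) / sum_d);
    return 0;
}
```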

nR3D commented 6 months ago

> @Xiangyu-Hu: Is the main slowdown from interval_computing_time_step = 768.747579194?

Since time-step computation and configuration updating are executed concurrently, that slowdown is probably produced by both. In general, floating-point numbers are used in all steps (aside from file writing, where single vs. double precision is not relevant), so I would expect a similar slowdown across methods. I am now running the same computation again with the time-step and configuration computations separated, so we can better distinguish the slowdown of the different intervals. I will post the results as soon as they are ready.
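
For context, the overlap looks roughly like the following SYCL pattern: two submissions with no data dependencies on the same out-of-order queue are free to run concurrently. This is an illustrative sketch only, not the actual SPHinXsys scheduling code; buffer names and kernel bodies are placeholders.

```cpp
#include <sycl/sycl.hpp>

// Illustrative sketch: two independent kernels on the same out-of-order
// queue may overlap, the way the time-step criterion and the configuration
// update can.
int main()
{
    sycl::queue q; // out-of-order by default
    const size_t n = 300'000;
    sycl::buffer<float> dt_buf{sycl::range<1>(n)};
    sycl::buffer<float> cfg_buf{sycl::range<1>(n)};

    q.submit([&](sycl::handler &h) {
        sycl::accessor dt(dt_buf, h, sycl::write_only, sycl::no_init);
        h.parallel_for(sycl::range<1>(n),
                       [=](sycl::id<1> i) { dt[i] = 0.5f * float(i[0]); });
    });
    // No accessor overlap with the first kernel, so the runtime is free to
    // execute both at the same time.
    q.submit([&](sycl::handler &h) {
        sycl::accessor cfg(cfg_buf, h, sycl::write_only, sycl::no_init);
        h.parallel_for(sycl::range<1>(n),
                       [=](sycl::id<1> i) { cfg[i] = float(i[0]) + 1.0f; });
    });
    q.wait(); // both kernels have finished here
    return 0;
}
```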

> @junwei-jiang: I'm curious about the precision difference. I mean, how much precision is lost when changing from double to float, and whether this is acceptable. I think the computational cost of double may be too high.

Regression tests still pass with floats, so double precision is not strictly needed. But it is worth keeping the option to enable it whenever higher precision is required. By default, device and host precision are set to be the same, and SPHINXSYS_USE_FLOAT can be toggled to use single precision for both.
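
For reference, the toggle essentially reduces to a compile-time alias along these lines (simplified sketch; the exact definitions and wiring in the PR may differ):

```cpp
// Simplified sketch of the precision switch. SPHINXSYS_USE_FLOAT is the
// toggle mentioned above; the Real alias is what the library code computes
// with, on host and device alike.
#ifdef SPHINXSYS_USE_FLOAT
using Real = float;
#else
using Real = double;
#endif
```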

Xiangyu-Hu commented 6 months ago

> Regression tests still pass with floats, so double precision is not strictly needed. But it is worth keeping the option to enable it whenever higher precision is required.

I would not worry about using float, as the leading term determining accuracy is the numerical discretization algorithm, not double versus single precision.
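
A back-of-envelope error budget makes the point (my own gloss, assuming a roughly second-order SPH discretization with smoothing length h):

```latex
% Discretization error vs. machine round-off: for typical resolutions,
% C h^2 is orders of magnitude larger than single-precision round-off,
% so the scheme, not the precision, dominates.
\[
  e_\text{total} \approx \underbrace{C\,h^{2}}_{\text{discretization}}
                 + \underbrace{\varepsilon\,\lVert f \rVert}_{\text{round-off}},
  \qquad
  \varepsilon_\text{float} \approx 1.2\times10^{-7},
  \quad
  \varepsilon_\text{double} \approx 2.2\times10^{-16}.
\]
```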

nR3D commented 6 months ago

Double:

Total wall time for computation: 2269.544049414 seconds.
interval_computing_time_step = 46.734129965
interval_computing_fluid_pressure_relaxation = 653.920405645
interval_updating_configuration = 1458.222392889
interval_writing_files = 108.284276538

Float:

Total wall time for computation: 1376.905903631 seconds.
interval_computing_time_step = 40.042162188
interval_computing_fluid_pressure_relaxation = 580.866289297
interval_updating_configuration = 610.196198424
interval_writing_files = 144.182650995

Benchmarks still use 300'000 particles, but this time the server I am benchmarking on is partially busy, so runtimes are slower than before; I therefore ran both cases again to get a fair comparison. Overall, configuration updating is where execution slows down the most. I could run some profiling to find hotspots within the configuration kernels. In particular, floating-point precision matters inside the smoothing-kernel calculations for neighbors, where various SYCL math and geometric functions are called; I suspect those are the reason for the slowdown.
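
If useful, SYCL event profiling is one way to time individual kernels and locate such hotspots. This is a sketch under the assumption that the backend supports the enable_profiling queue property; the kernel body is a stand-in for the actual smoothing-kernel math.

```cpp
#include <sycl/sycl.hpp>
#include <iostream>

// Sketch: time a single kernel with SYCL event profiling. The body is a
// placeholder for the smoothing-kernel math (sycl::sqrt and friends) where
// single vs. double precision matters.
int main()
{
    sycl::queue q{sycl::property::queue::enable_profiling{}};
    const size_t n = 300'000;
    sycl::buffer<double> buf{sycl::range<1>(n)};

    sycl::event e = q.submit([&](sycl::handler &h) {
        sycl::accessor a(buf, h, sycl::write_only, sycl::no_init);
        h.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
            a[i] = sycl::sqrt(double(i[0]) + 1.0); // stand-in for kernel math
        });
    });
    e.wait();

    const auto t0 =
        e.get_profiling_info<sycl::info::event_profiling::command_start>();
    const auto t1 =
        e.get_profiling_info<sycl::info::event_profiling::command_end>();
    std::cout << "kernel time: " << (t1 - t0) * 1e-6 << " ms\n";
    return 0;
}
```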