TEAR-ERC / tandem

An HPC DG method for 2D and 3D SEAS problems
BSD 3-Clause "New" or "Revised" License

Multigrid on GPU yields different results to CPU #52

Open hpc4geo opened 9 months ago

hpc4geo commented 9 months ago

Issue #50 identified some unexpected behavior when comparing CPU results with GPU results. The convergence history is different when the same PETSc option set is provided for an MG configuration.

Attached are the logs Thomas generated. tandem_GPU.log tandem_CPU.log
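If both runs were made with `-ksp_monitor`, the two residual histories can be diffed mechanically instead of by eye. A minimal sketch, assuming the standard PETSc monitor line format (`"  3 KSP Residual norm 1.23e-05"`); the helper names are mine:

```python
import re

# Matches PETSc -ksp_monitor output lines, e.g.:
#   "  3 KSP Residual norm 1.234567890123e-05"
MONITOR_RE = re.compile(r"^\s*(\d+)\s+KSP Residual norm\s+([\d.eE+-]+)")

def residual_history(log_text):
    """Extract (iteration, residual) pairs from a PETSc log."""
    return [(int(m.group(1)), float(m.group(2)))
            for m in map(MONITOR_RE.match, log_text.splitlines()) if m]

def first_divergence(hist_a, hist_b, rtol=1e-12):
    """Return the first iteration at which two histories disagree, or None."""
    for (it_a, r_a), (it_b, r_b) in zip(hist_a, hist_b):
        if it_a != it_b or abs(r_a - r_b) > rtol * max(abs(r_a), abs(r_b)):
            return it_a
    return None
```

Applied to the attached logs, `first_divergence(residual_history(...tandem_CPU.log...), residual_history(...tandem_GPU.log...))` would report the first iteration at which the two runs part ways, which narrows down which solver component introduces the difference.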

The differences in the residual history are most likely associated with the ILU and LU solvers. I suggest confirming this by re-running the CPU and GPU variants with the following additional options (placed at the end of any existing options).

-ksp_max_it 10
-mg_levels_pc_type jacobi
-mg_coarse_mg_coarse_ksp_type cg
-mg_coarse_mg_coarse_ksp_rtol 1.0e-2
-mg_coarse_mg_coarse_ksp_max_it 100
-mg_coarse_mg_coarse_pc_type jacobi
Thomas-Ulrich commented 5 days ago

I asked the LLM Qwen2 and got some hints for tracking down the problem, which might make sense:

The differences in the convergence history between the CPU and GPU runs could be attributed to several factors, particularly those related to the linear solvers and preconditioners used in your PETSc configuration. Here are some potential causes and suggestions to troubleshoot and potentially resolve the issue:

  1. Numerical Precision: GPU computations often use lower precision (e.g., single precision) compared to the double precision typically used on CPUs. This can lead to differences in the numerical results. Ensure that both the CPU and GPU versions are using the same precision.
  2. Different Implementations of Linear Solvers and Preconditioners: The GPU log shows that the AIJCUSPARSE matrix type is used, which is specific to CUDA and might have different implementations of solvers and preconditioners compared to the CPU versions. The AIJ matrix type is used on the CPU, which might have different behavior.
  3. ILU and LU Preconditioners: The ILU and LU preconditioners can be sensitive to the order of operations and numerical precision. The GPU implementation might be using a different algorithm or precision that leads to different results.
  4. Eigenvalue Estimation: The GPU log mentions Chebyshev smoothing, which requires eigenvalue estimates. The estimation process might differ between CPU and GPU, leading to different smoothing effects.
  5. Block Jacobi Preconditioner: The block Jacobi preconditioner uses different block sizes on CPU and GPU (12 vs. 30). This can lead to different preconditioning effects.
  6. GAMG Preconditioner: The GAMG preconditioner has different behavior on CPU and GPU. The GPU version uses a different number of levels (4 vs. 5 on CPU) and might have different threshold settings for dropping small values in the graph.
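On point 1: PETSc's scalar precision is fixed at build time (`PetscScalar`), but the effect a single-precision accumulator would have on a residual history is easy to illustrate. The sketch below simulates float32 rounding with the standard `struct` module; it stands in for an actual single-precision build, which is an assumption:

```python
import struct

def to_f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def accumulate(values, single_precision=False):
    """Sum values, optionally rounding the running total to float32 at each
    step, mimicking a single-precision reduction."""
    total = 0.0
    for v in values:
        total = total + v
        if single_precision:
            total = to_f32(total)
    return total

# Adding 1e-8 repeatedly to 1.0: below float32 machine epsilon (~1.2e-7),
# each increment is rounded away entirely in single precision.
values = [1.0] + [1e-8] * 100_000
double = accumulate(values)                         # ~1.001
single = accumulate(values, single_precision=True)  # stays exactly 1.0
```

The same mechanism (different rounding, plus different reduction orderings on the GPU) is enough to make two mathematically identical solvers produce diverging residual histories.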

To address these issues, consider the following steps:

  1. Check Precision: Ensure that both CPU and GPU runs are using the same precision. You can force double precision on the GPU by setting the appropriate PETSc options.
  2. Consistent Preconditioners: Try to use preconditioners that have consistent implementations across CPU and GPU. For example, use the AIJ matrix type on both CPU and GPU with the appropriate PETSc options.
  3. Eigenvalue Estimation: Investigate if the eigenvalue estimation process can be made consistent between CPU and GPU. This might involve using the same estimation method or adjusting the Chebyshev smoothing parameters.
  4. Block Size: Use the same block size for the block Jacobi preconditioner on both CPU and GPU.
  5. GAMG Settings: Adjust the GAMG settings to be consistent between CPU and GPU. This might involve setting the same number of levels and threshold values.
  6. Convergence Criteria: Ensure that the convergence criteria are the same for both CPU and GPU runs. This includes the relative and absolute tolerances.
  7. Profiling and Debugging: Use profiling tools to compare the performance and numerical results of the solvers and preconditioners on CPU and GPU. Debugging tools can also help identify where the differences occur.
  8. Documentation and Community: Consult the PETSc documentation and community forums for information on known differences between CPU and GPU implementations. There might be specific recommendations or workarounds for your use case.
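Several of the steps above (consistent preconditioners, fixed eigenvalue bounds, same tolerances) can be collected into a shared options fragment. This is a sketch, not a tested configuration: the option names are standard PETSc, but the values are illustrative and would need tuning for tandem.

```
# Shared options for both backends (illustrative values)
-ksp_rtol 1.0e-8
-ksp_max_it 100
-ksp_monitor_true_residual
-pc_gamg_threshold 0.0                          # same graph-dropping threshold
-mg_levels_ksp_type chebyshev
-mg_levels_ksp_chebyshev_eigenvalues 0.1,1.1    # fixed bounds, no estimation
-mg_levels_pc_type jacobi                       # pointwise Jacobi is backend-independent
-mg_coarse_ksp_type preonly
-mg_coarse_pc_type jacobi                       # avoid ILU/LU, which differ across backends

# CPU run only:
-mat_type aij

# GPU run only:
-mat_type aijcusparse
-vec_type cuda
```

With ILU/LU replaced by Jacobi everywhere and the Chebyshev eigenvalue estimation disabled, any remaining divergence between the two residual histories should be down to rounding and reduction order rather than algorithmic differences.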

By systematically addressing these potential causes, you should be able to identify and mitigate the differences in the convergence history between the CPU and GPU runs.