The PR here https://github.com/MFlowCode/MFC/pull/285 shows the following result for this benchmark case on Phoenix GPU:
viscous_weno5_sgb_mono 1.00x 0.45x 1.33x
indicating that the code in this PR is about twice as fast as the current master branch for this case. Notably, this case is not faster on CPUs. What's going on here?