ExtremeFLOW / neko

/ᐠ. 。.ᐟ\ᵐᵉᵒʷˎˊ˗
https://neko.cfd/

Add vector version for device ax_helm #1293

Closed njansson closed 1 month ago

njansson commented 1 month ago

The combined ax_helm kernel runs ~20% faster than launching the standard ax_helm kernel three times (measured on an NVIDIA A100).
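The thread does not say where the speedup comes from. One plausible contributor, stated here as an assumption, is that a combined kernel reads shared operands once for all three velocity components instead of once per launch. The sketch below is a toy illustration only: `ax_scalar` and `ax_vector` are hypothetical names, and a diagonal scaling stands in for the real Helmholtz operator.

```python
def ax_scalar(coef, u, reads):
    """Stand-in for one scalar operator pass: out[i] = coef[i] * u[i]."""
    out = []
    for c, x in zip(coef, u):
        reads["coef"] += 1  # every separate pass re-reads the coefficient
        out.append(c * x)
    return out


def ax_vector(coef, u, v, w, reads):
    """Stand-in for a combined pass over three components at once."""
    au, av, aw = [], [], []
    for c, x, y, z in zip(coef, u, v, w):
        reads["coef"] += 1  # read once, reused for all three components
        au.append(c * x)
        av.append(c * y)
        aw.append(c * z)
    return au, av, aw


# Hypothetical data: three points, three velocity components.
coef = [2.0, 3.0, 4.0]
u, v, w = [1.0, 1.0, 1.0], [1.0, 2.0, 3.0], [0.5, 0.5, 0.5]

separate_reads = {"coef": 0}
separate = tuple(ax_scalar(coef, comp, separate_reads) for comp in (u, v, w))

fused_reads = {"coef": 0}
fused = ax_vector(coef, u, v, w, fused_reads)

print(separate == fused, separate_reads["coef"], fused_reads["coef"])
```

Both variants produce the same result, but the three separate passes touch the coefficient array three times as often. On a GPU the combined version would also save two kernel launches, though the thread does not break the measured 20% down into these effects.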

njansson commented 1 month ago

> TGV fails to converge.
>
> An infinity shows up in the start residual, and the pressure is also at a massive scale starting from the second iteration.

Do you get any other error messages? The kernel uses quite a lot of shared memory.

Also, it might be something with the padded version (lx 4, 8, 16).

timfelle commented 1 month ago

Nope, the crash is triggered by a floating-point exception, not a memory issue.

njansson commented 1 month ago

Sure, the issue is in the padded version (indexing issues).
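The thread does not show the actual bug. As a generic illustration of how padded indexing can produce infinities, the sketch below stores an lx-by-lx tile with one extra entry per row (a common trick for avoiding shared-memory bank conflicts) and then reads it back with the unpadded stride. All names are hypothetical, and filling the padding with `inf` is an assumption standing in for uninitialized memory.

```python
import math


def padded_tile(lx, pad=1, fill=math.inf):
    """Flat buffer for an lx-by-lx tile whose rows hold lx + pad entries."""
    return [fill] * (lx * (lx + pad))


def store(tile, lx, i, j, value, pad=1):
    tile[i + j * (lx + pad)] = value  # correct: row stride includes the padding


def load_correct(tile, lx, i, j, pad=1):
    return tile[i + j * (lx + pad)]  # same stride as the store


def load_buggy(tile, lx, i, j):
    return tile[i + j * lx]  # wrong: ignores the padding in the row stride


lx = 4
tile = padded_tile(lx)
for j in range(lx):
    for i in range(lx):
        store(tile, lx, i, j, float(i + 10 * j))

good = [load_correct(tile, lx, i, j) for j in range(lx) for i in range(lx)]
bad = [load_buggy(tile, lx, i, j) for j in range(lx) for i in range(lx)]

print(all(math.isfinite(x) for x in good))  # every value read back is finite
print(sum(math.isinf(x) for x in bad))      # mismatched stride hits padding slots
```

With the mismatched stride, some reads land on padding slots and the rest return values from the wrong position, so the error is silent until a non-finite value reaches something like a residual norm.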

njansson commented 1 month ago

Should be fixed now.