IntelPython / dpctl

Python SYCL bindings and SYCL-based Python Array API library
https://intelpython.github.io/dpctl/
Apache License 2.0

Fixes `dpctl.tensor.round` on CUDA devices #1700

Closed ndgrigorian closed 1 month ago

ndgrigorian commented 1 month ago

When compiled for CUDA, `std::rint` would incorrectly round values halfway between two integers toward zero (e.g., 1.5 -> 1.0). The array API specification requires that these values be rounded to the nearest even integer instead.
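For illustration, the two tie-breaking rules can be compared in plain Python. This is a minimal sketch for finite inputs only, not dpctl code: `round_half_even` and `round_half_toward_zero` are hypothetical helper names, and the second one merely models the faulty behavior described above.

```python
import math


def round_half_even(x: float) -> float:
    """Round a finite float to the nearest integer; ties go to the even neighbour."""
    lower = math.floor(x)
    diff = x - lower
    if diff < 0.5:
        return float(lower)
    if diff > 0.5:
        return float(lower + 1)
    # Exact tie: pick whichever neighbour is even.
    # (This sketch does not preserve the sign of zero.)
    return float(lower if lower % 2 == 0 else lower + 1)


def round_half_toward_zero(x: float) -> float:
    """Hypothetical model of the faulty behaviour: ties go toward zero."""
    lower = math.floor(x)
    diff = x - lower
    if diff < 0.5:
        return float(lower)
    if diff > 0.5:
        return float(lower + 1)
    # Exact tie: pick the neighbour that is closer to zero.
    return float(lower if x > 0 else lower + 1)


samples = [0.5, 1.5, 2.5, -0.5, -1.5, -2.5]
expected = [round_half_even(v) for v in samples]        # ties to even
faulty = [round_half_toward_zero(v) for v in samples]   # ties toward zero
```

The two rules disagree only on ties whose nearest even neighbour lies away from zero (here 1.5 and -1.5), which is why the problem is easy to miss with inputs such as 0.5 or 2.5.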

To resolve this, std::rint has been replaced with sycl::rint, which does not rely on the current floating-point rounding mode (see SYCL specification).

As was pointed out at the time of implementation, the floating-point rounding mode can vary between devices.
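As a loose analogy only (this is not SYCL or dpctl code), Python's `decimal` module shows the same distinction between rounding under an ambient mode and rounding under an explicitly pinned mode. The helper names `ambient_round` and `pinned_round` are hypothetical.

```python
from decimal import Decimal, localcontext, ROUND_DOWN, ROUND_HALF_EVEN


def ambient_round(value: str) -> Decimal:
    # Uses whatever rounding mode the current context carries,
    # so the result can change when the context changes.
    return Decimal(value).to_integral_value()


def pinned_round(value: str) -> Decimal:
    # States the rounding mode explicitly, so the context cannot alter it.
    return Decimal(value).to_integral_value(rounding=ROUND_HALF_EVEN)


results = {}
for mode in (ROUND_HALF_EVEN, ROUND_DOWN):
    with localcontext() as ctx:
        ctx.rounding = mode  # simulate a different ambient rounding mode
        results[mode] = (ambient_round("1.5"), pinned_round("1.5"))
```

Under `ROUND_DOWN` the ambient version yields 1 while the pinned version still yields 2, mirroring the motivation for choosing a function whose result does not depend on the current rounding mode.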

github-actions[bot] commented 1 month ago

Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. :crossed_fingers:

github-actions[bot] commented 1 month ago

Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_29 ran successfully. Passed: 890 Failed: 11 Skipped: 91

coveralls commented 1 month ago

Coverage: 87.911% (unchanged) when pulling d8705a2b781342c0cfa93e224aefa3404367262b on fix-round-for-nvidia into 1de00cba3ad0373678ae03f201e710c56b48a615 on master.

ndgrigorian commented 1 month ago

> It might be a good idea to scan the code base for remaining uses of `std` namespace transcendental functions and replace those one by one too.

I agree, it's a good idea. I'll do this as a separate PR.
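One way such a scan could be approached is a small regex pass over the sources. The sketch below is only an assumption about how to do it: the function list in `MATH_FUNCS` is illustrative and non-exhaustive, `find_std_math_calls` is a hypothetical helper, and the C++ fragment in `sample` is made up for the demonstration rather than taken from dpctl.

```python
import re

# Assumed, non-exhaustive list of math function names to look for.
MATH_FUNCS = ("rint", "sin", "cos", "tan", "exp", "log", "sqrt", "pow")
PATTERN = re.compile(r"\bstd::(" + "|".join(MATH_FUNCS) + r")\s*\(")


def find_std_math_calls(source: str) -> list[tuple[int, str]]:
    """Return (line number, function name) for each std:: math call found."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in PATTERN.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


# Made-up C++ fragment used only to exercise the scanner.
sample = """\
resT operator()(const argT &in) const {
    return std::rint(in);
}
auto y = sycl::exp(x) + std::log(x);
"""
hits = find_std_math_calls(sample)
```

Calls already in the `sycl` namespace are not reported, so the output lists only the remaining candidates for replacement. Each hit would still need manual review, since a text match says nothing about whether the call runs in device code.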

oleksandr-pavlyk commented 1 month ago

I have opened gh-1701 for the build break with the LLVM SYCL compiler; it is unrelated to the changes in this PR.