NVIDIA / jitify

A single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC).
BSD 3-Clause "New" or "Revised" License
518 stars 64 forks source link

Fix <limits> on Windows. #86

Closed Robadob closed 3 years ago

Robadob commented 3 years ago

This adds an explicit cast to the two literal values which building with <limits> on windows complains about.

Tested that it fixes the reported error in #85, by adding "#include <limits>\n" to test_simple of jitify_example.cpp. Haven't tested it on Linux, where presumably <limits> was already working.

Closes #85

benbarsdell commented 3 years ago

Thanks for this!

Can you check if changing these two lines: https://github.com/NVIDIA/jitify/blob/788039b/jitify.hpp#L1924-L1925 to USHRT_MIN and USHRT_MAX instead (so that they're unsigned instead of signed) fixes it?

According to this page, wchar_t on Windows is supposed to be unsigned, which I think is the root of the bug. If the constant values are correct then it shouldn't complain about narrowing conversion any more.

Robadob commented 3 years ago

Changing those has no visible impact to the original bug, get the same "conversion from int to wchar_t" error.

Even making them include a cast, made no difference. Suggesting they're not being pulled from there?

    "#define WCHAR_MIN   (wchar_t)USHRT_MIN\n"
    "#define WCHAR_MAX   (wchar_t)USHRT_MAX\n"
Robadob commented 3 years ago

It appears "#if defined _WIN32 || defined _WIN64\n" is not being used, so they're being taken from the #else case.

Robadob commented 3 years ago

If I instead change the else case image

The error changes to this.

---------------------------------------------------
--- JIT compile log for my_program ---
---------------------------------------------------
limits(124): error: identifier "USHRT_MIN" is undefined

1 error detected in the compilation of "my_program".

---------------------------------------------------
Robadob commented 3 years ago

I guess USHRT_MIN doesn't have a define as it should always be 0 for any unsigned vals.

Robadob commented 3 years ago

So this version works, however i've had to update the wrong branch of the #if. So there's a second issue with #if defined _WIN32 || defined _WIN64 not automatically triggering on a windows build.

Changing it to "#if defined(_WIN32) || defined(_WIN64)\n" (the format I'm more familiar with) doesn't help, so I guess Jitify/NVRTC isn't defining them internally?

image

Robadob commented 3 years ago

NVRTC is supposed to define _WIN64.

But evidently it doesn't. The below version, where I manually define _WIN64 inline does work, but obviously isn't a viable fix.

image

Robadob commented 3 years ago

Hmm, so _WIN64 is first documented as predefined in CUDA 11.0 (based on @ptheywood 's research).

Did notice, that despite building with CUDA 11.2 and providing CUDA 11.2 include path, I was linking against CUDA 11.0's nvrtc.

On updating to link against 11.2, so I could test that. Now getting an exception thrown by Jitify due to an NVRTC error NVRTC_ERROR_BUILTIN_OPERATION_FAILURE at jitify.hpp#L2836

Realised when I manually set the PATH variables for CUDA 11.2, had left in ; which was causing weird behaviours.

benbarsdell commented 3 years ago

Thanks for digging into this. Is _WIN64 defined for you when using CUDA 11.0?

We can probably replace the OS-check with a sizeof(wchar_t) check instead to avoid this problem.

Robadob commented 3 years ago

Ok starting from the beginning, using CUDA 11.2, and NVRTC 11.2.

The original version produces this error, same as the error reported in #85, except reported once rather than twice.

---------------------------------------------------
--- JIT compile log for my_program ---
---------------------------------------------------
limits(124): error: invalid narrowing conversion from "int" to "wchar_t": constant value does not fit in destination type

1 error detected in the compilation of "my_program".

---------------------------------------------------

Your proposed fix, over changing L1924-1925 to USHRT_MIN, USHRT_MAX also fails, as limits.h doesn't define unsigned min values (besides int?).

---------------------------------------------------
--- JIT compile log for my_program ---
---------------------------------------------------
limits(124): error: identifier "USHRT_MIN" is undefined

1 error detected in the compilation of "my_program".

---------------------------------------------------

Replacing USHRT_MIN with 0, does however work.

Unclear whether this also works with CUDA 11.0 (and my mix/match of cuda 11.0 and 11.2 was causing the predef macro to go AWOL). I'll update the PR commit and then try and test it with cuda 11.0.

Robadob commented 3 years ago

So the current version in this PR works for CUDA 11.2, but not 11.0. For some reason 11.0 does not predefine _WIN64 macro, despite it being documented.

I haven't got 11.1 installed, so can't test that.

Robadob commented 3 years ago

We can probably replace the OS-check with a sizeof(wchar_t) check instead to avoid this problem.

To the best of my understanding preprocessor can't evaluate sizeof(), similarly a quick test of #if sizeof(wchar_t)==2 was simply treated as false.

It could be easier to move the #if out of the string, so it's checked against the host compiler instead.

Robadob commented 3 years ago

The second commit applies this change, and works for me with both CUDA 11.0 and 11.2 (on Windows).

Robadob commented 3 years ago

Clarity on the 11.0.

Docs archive link for 11.0 docs is for 11.0.3 (has same link as 11.0 update 1). My installed 11.0 docs state nvrtc version 11.0,182, and do not include _WIN64 in the documented predefines.

benbarsdell commented 3 years ago

Using the host compiler macro looks like a good solution. I think it would be best to put this in the preinclude header here so that it fixes all usages:

// WAR to allow exceptions to be parsed
#define try
#define catch(...)
)"
#if defined(_WIN32) || defined(_WIN64)
// WAR for NVRTC <= 11.0 not defining _WIN64.
R"(
#ifndef _WIN64
#define _WIN64 1
#endif
)"
#endif
;

(I haven't tested this on Windows myself yet; let me know if you see anything wrong with it).

Robadob commented 3 years ago

A quick test shows that works as expected, fixing my CUDA 11.0.

Have reverted the previous fix and added your change instead.

Let me know if you want me to squash. all the changes into a single commit/remove the two commits that cancel etc.

benbarsdell commented 3 years ago

A squash or rebase to remove the canceled commits would be great, thanks. Apart from that this looks good.

Robadob commented 3 years ago

Have rebased, also re-ordered the 2 commits as they're technically separate issues but one is dependent on the other.