nvrtc.nvrtcCompileProgram is changing the preferred encoding from UTF-8 to ANSI_X3.4-1968

redsnic commented 1 year ago

Dear developers,

I found that calling NVRTC for compilation changes the preferred encoding of the current Python process.

For more details and to reproduce the issue, please refer to this StackOverflow question.

Do you have any idea why this happens, and how the preferred encoding can be reverted to its original setting?

Thank you in advance

kmaehashi commented 1 year ago

This sounds like an issue in NVRTC rather than in CUDA Python. The issue is also reproducible with CuPy built without CUDA Python.

>>> import locale, cupy
>>> locale.getpreferredencoding()
'UTF-8'
>>> cupy.arange(10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> locale.getpreferredencoding()
'ANSI_X3.4-1968'

Env: CUDA 11.8 / Ubuntu 20.04
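
The before/after check above can be wrapped in a small helper that works for any call. This is only a sketch: `codeset_change` and `reset_locale_to_c` are hypothetical names, and the stand-in simulates a locale reset with an explicit `setlocale` call instead of invoking NVRTC, so it says nothing about what NVRTC does internally. It queries the C library through `locale.nl_langinfo(locale.CODESET)`, which is available on Unix only.

```python
import locale


def codeset_change(func, *args, **kwargs):
    """Run func and return the C library codeset before and after the call."""
    # nl_langinfo(CODESET) reflects the current LC_CTYPE setting of the
    # process; it does not re-read the environment variables.
    before = locale.nl_langinfo(locale.CODESET)
    func(*args, **kwargs)
    after = locale.nl_langinfo(locale.CODESET)
    return before, after


def reset_locale_to_c():
    """Hypothetical stand-in that simulates a call resetting the locale."""
    locale.setlocale(locale.LC_ALL, "C")


if __name__ == "__main__":
    before, after = codeset_change(reset_locale_to_c)
    print(f"before={before} after={after} changed={before != after}")
```

To test a real call, pass it in place of the stand-in, for example `codeset_change(cupy.arange, 10)`.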

redsnic commented 1 year ago

Good to know. I have now filed a bug with Nvidia; let's see. Thank you again.

redsnic commented 1 year ago

Just to give a small update. By discussing the issue with Nvidia we found out that it is possible to export LC_ALL="POSIX" as a workaround to avoid NVCC changing the encoding to ASCII.

The causes of the bug are still unknown and I will report when I have any other news.

vzhurba01 commented 1 year ago

Since this is a bug outside of CUDA Python, I'll close this issue.

Thanks for sharing that workaround. If there's a link you can share for where this bug is being tracked, I'm sure folks would appreciate it.

redsnic commented 1 year ago

Here is the link to the bug report: https://developer.nvidia.com/nvidia_bug/3833924

Thank you again for your help

Flamefire commented 3 months ago

> Just to give a small update. By discussing the issue with Nvidia we found out that it is possible to export LC_ALL="POSIX" as a workaround to avoid NVCC changing the encoding to ASCII.
>
> The causes of the bug are still unknown and I will report when I have any other news.

Any updates here? It is indeed a bug in NVRTC, specifically in nvrtcCompileProgram, and it can be reproduced in C++:

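#include <clocale>     // setlocale
#include <langinfo.h>  // nl_langinfo, CODESET
#include <cuda.h>
#include <nvrtc.h>
#include <iostream>
#include <string>
#include <vector>

int main() {
    // Adopt the locale from the environment (LC_ALL, LC_CTYPE, LANG).
    setlocale(LC_ALL, "");
    std::string code = "";
    std::vector<const char*> args = {"--gpu-architecture=sm_80"};
    nvrtcProgram program;
    nvrtcCreateProgram(&program, code.c_str(), nullptr, 0, nullptr, nullptr);
    // Codeset before compilation.
    std::cout << nl_langinfo(CODESET) << '\n';
    nvrtcCompileProgram(program, static_cast<int>(args.size()), args.data());
    // Codeset after compilation; it differs from the first line when the bug triggers.
    std::cout << nl_langinfo(CODESET) << '\n';
    nvrtcDestroyProgram(&program);
}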

Compile with nvcc test.cu -lnvrtc and observe:

$ LC_ALL=en_US.UTF-8 ./a.out 
UTF-8
ANSI_X3.4-1968
$ LC_ALL=C.UTF-8 ./a.out 
UTF-8
ANSI_X3.4-1968
$ LC_ALL=POSIX ./a.out 
ANSI_X3.4-1968
ANSI_X3.4-1968

I'm a bit confused about the statement "export LC_ALL="POSIX" as a workaround to avoid NVCC changing the encoding to ASCII" as that by definition sets the encoding to ASCII in the first place. So the change is only masked.
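
For the original question of how to revert the setting, one possible approach is to save the process locale before the call and restore it afterwards. The following is a minimal sketch under a loud assumption: that the change is a process-wide `setlocale` call, which this thread does not establish, and it has not been checked against NVRTC itself. `preserve_locale` and `simulate_locale_reset` are hypothetical names; the latter only stands in for the compile call.

```python
import contextlib
import locale


@contextlib.contextmanager
def preserve_locale(category=locale.LC_ALL):
    """Restore the process locale for `category` when the block exits."""
    # Calling setlocale without a second argument only queries the setting.
    saved = locale.setlocale(category)
    try:
        yield saved
    finally:
        # Put back whatever was active before the block ran.
        locale.setlocale(category, saved)


def simulate_locale_reset():
    """Hypothetical stand-in for a call that resets the locale to "C"."""
    locale.setlocale(locale.LC_ALL, "C")


if __name__ == "__main__":
    print("before:", locale.nl_langinfo(locale.CODESET))
    with preserve_locale():
        simulate_locale_reset()
        print("inside:", locale.nl_langinfo(locale.CODESET))
    print("after: ", locale.nl_langinfo(locale.CODESET))
```

Unlike LC_ALL=POSIX, this keeps the UTF-8 locale active outside the wrapped call instead of starting in ASCII. Note that `setlocale` affects the whole process and is not thread-safe, so other threads can still observe the temporary change.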
