AOMediaCodec / libavif

libavif - Library for encoding and decoding .avif files
Other
1.52k stars 193 forks source link

Investigate whether we still need to compile with /source-charset:utf-8 /execution-charset:us-ascii on Windows #2345

Open wantehchang opened 1 month ago

wantehchang commented 1 month ago

@tongyuantongyu Yuan: I need your review of this issue.

You added the MSVC compiler flags /source-charset:utf-8 /execution-charset:us-ascii to libavif. I am wondering if we should still compile libavif with these flags. There are three reasons that make me reconsider this issue:

  1. While adding clang-cl compilation support, I found that clang-cl does not support /execution-charset:us-ascii.
  2. The vcpkg libavif package removes these flags.
  3. avifenc.exe and avifdec.exe are now configured to use the UTF-8 code pages.

It seems that the third reason alone should make it unnecessary to compile libavif with /execution-charset:us-ascii. I am not familiar with internationalization issues on Windows, so I would appreciate your expert opinion. Thanks!

tongyuantongyu commented 1 month ago

This flag was added to ensure we don't use non-ASCII characters in strings in the source code. Here are the two reasons:

So the flag is more like a canary, than to solve some Windows-specific issue.

wantehchang commented 1 month ago

Yuan: Thank you for the reply. Using non-ASCII characters in strings in the source code is very unlikely. The only reasons I can think of are:

All of these reasons are unlikely. So I will probably just remove these flags or try to improve the comments for them.

I am afraid that none of the libavif maintainers understand the purpose of these flags, so after the recent rework of the build system, the flags are only used when compiling the source files in the avif and avif_apps libraries. It is a maintenance burden.

wantehchang commented 1 month ago

Yuan: I did some experiments on my Windows laptop. I understand how the compiler options work now. I think they can be used as compiler warnings, although /execution-charset:us-ascii does convert non-ASCII characters to question marks. But the details are quite complicated, and I am not sure if I will still remember the details in a few months.

Surprisingly, it is possible to bypass /execution-charset:us-ascii by using octal or hex escape sequence, e.g.,

      "San Jos\xC3\xA9",  // San José

I also experienced the incorrect rendering of non-ASCII letters in Command Prompt unless I followed the procedure in a Stack Overflow article to "use Unicode UTF-8 for worldwide language support."

wantehchang commented 1 month ago

Here is an example of the compiler warning:

C:\src\libavif\src\avif.c(1065): warning C4566: character represented by universal-character-name '\u00E9' cannot be represented in the current code page (20127)

tongyuantongyu commented 1 month ago

I am afraid that none of the libavif maintainers understand the purpose of these flags

We may use the /utf-8 flag which aligns with other platforms, supported by and is the default of clang-cl, and is more intuitive.

Since libavif is using manifest to request UTF-8 Console, the only downside would be no longer able to ensure we never print non-ASCII character unless explicit user request. Based on your analysis this is quite unlikely to happen.