AOMediaCodec / libavif

libavif - Library for encoding and decoding .avif files
Other
1.53k stars 195 forks source link

Use /execution-charset:us-ascii only with /WX #2368

Open wantehchang opened 1 month ago

wantehchang commented 1 month ago

Use the MSVC /source-charset:utf-8 /execution-charset:us-ascii options only when warnings are treated as errors (/WX) so that we never substitute a question mark (?) for a non-ASCII character in string or character literals.

Update the comments for /source-charset:utf-8 /execution-charset:us-ascii basec on improved understranding of these options.

Related to https://github.com/AOMediaCodec/libavif/issues/2345.

wantehchang commented 1 month ago

@tongyuantongyu Yuan: Please do a preliminary review of this pull request. Because of the substitution of a question mark (?) for a non-ASCII character in string or character literals, my first thought is that these options should only be used when warnings are treated as errors. So that's what this pull request does.

You also made a good point that for compatibility with clang-cl and with non-Windows platforms, we should use the /utf-8 option instead. We can do that next, but it's good to give this a try. However, if you or my coworkers don't understand my comment in this pull request, then I will switch to the /utf-8 option directly.

tongyuantongyu commented 1 month ago

I prefer to always add /source-charset:utf-8 whether or not /WX is used. /source-charset:utf-8 is more to "tell MSVC to treat source code in UTF-8" than "validate that the source code contains only characters that can be represented in UTF-8", otherwise MSVC will default to use ANSI code page, decoding source code wrongly, and emit C4819 when a decode error is encounted.

My intention to add /source-charset:utf-8 was to make MSVC decode source code correctly. My intention to add /execution-charset:us-ascii was to ensure we don't force user to deal with i18n and character encodings when using CLI. If you agree with my intention, let's add these to the comments; or we need to recheck whether we really want the two flags.

Your new comments explained many possible misunderstandings a user or new maintainer may have. Can you move the explanations to a new paragraph, so the first paragraph explains what we want to do and what the flags do, and the second paragraph explains more details (like "check the strings you added" when seeing C4566) and resolves misunderstandings (like it won't interfere escape sequences and won't prevent printing user provided UTF-8 strings)?

wantehchang commented 1 month ago

Yuan: Thank you very much for your comment. I will try to revise the comments based on your suggestions. However, I am now wondering if we should just switch to /utf-8 or simply delete these options.

To fully understand these options, I had to read three pages of Microsoft documentation (for /source-charset, /execution-charset, and /validate-charset), and it is necessary to understand four terms used in the C standard (the source charset, the basic source charset, the extended source charset, and the execution charset) and the phases of compilation referred to in the definitions. I also had to resolve ambiguity in the documentation by experiments.

I think it is the /execution-charset:us-ascii option that raises concerns and questions about UTF-8 support. On the other hand, the /utf-8 option is unlikely to raise questions, so we may not even need to document it. (People want UTF-8 support, so they are unlikely to wonder what the /utf-8 option is for.)

I will look into why the vcpkg libavif package removed the /source-charset:utf-8 /execution-charset:us-ascii options.