Fixes #3345

Make the implementation of the character conversion consistent with its consumers in the web app.

Type of change

Objective

This PR fixes a mojibake issue in the FIDO2 WebAuthn flow.

You can see the detailed description of the issue and screenshots in issue #3345. In short, the current implementation displays wrong, unreadable text "WebAuthn ã�®èª�証" where it should display "WebAuthn の認証" when the user locale is Japanese. You can also find a minimal reproducible example of the issue at https://github.com/bitwarden/mobile/issues/3345#issuecomment-2212340286.

I have actually reproduced the issue only in the Japanese locale. Theoretically, however, it can happen in any locale whose localized text for the Fido2AuthenticateWebAuthn message contains multibyte characters. You should also be able to reproduce the issue in an arbitrary locale by inserting a Unicode emoji into the text.

Code changes

This PR removes the extra character conversion steps from Bit.App.Utilities.AppHelpers::EncodeDataParameter. This change makes the data encoding in the method consistent with the decoding at apps/web/src/connectors/captcha.ts#L50 and apps/web/src/connectors/webauthn.ts#L88.
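For orientation, this is roughly what the method looks like with the extra conversion steps removed; treat it as a sketch of the change rather than the exact diff:

public static string EncodeDataParameter(object obj)
{
    // JSON-serialize, take the UTF-8 bytes, Base64-encode, then URL-encode.
    // No intermediate percent-escaping or per-byte character conversion is needed.
    return WebUtility.UrlEncode(
        Convert.ToBase64String(
            Encoding.UTF8.GetBytes(JsonConvert.SerializeObject(obj))));
}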
Analysis of the issue in the original implementation
The problem with the following code in the original implementation is that it applies an unnecessary extra encoding to multibyte characters before the essential encoding steps, and there is no corresponding decoding in the end consumer of the return value.
public static string EncodeDataParameter(object obj)
{
    string EncodeMultibyte(Match match)
    {
        return Convert.ToChar(Convert.ToUInt32($"0x{match.Groups[1].Value}", 16)).ToString();
    }

    // (1) JSON serialization, then (2) percent-encoding of the UTF-8 bytes
    var escaped = Uri.EscapeDataString(JsonConvert.SerializeObject(obj));
    // (3) every "%xx" escape is replaced by the character whose code point is 0xxx
    var multiByteEscaped = Regex.Replace(escaped, "%([0-9A-F]{2})", EncodeMultibyte);
    // (4) UTF-8 bytes, (5) Base64, and finally URL-encoding of the Base64 string
    return WebUtility.UrlEncode(Convert.ToBase64String(Encoding.UTF8.GetBytes(multiByteEscaped)));
}
Design of the current implementation

The intention of the original implementation is to convert a given set of query parameters for the web authenticators (web apps) into a url-safe string. The generated string is passed as a query parameter to the web authenticators (https://vault.bitwarden.com/webauthn-mobile-connector.html or https://vault.bitwarden.com/captcha-mobile-connector.html).

The callers of EncodeDataParameter, Bit.App.Pages.CaptchaProtectedViewModel.HandleCaptchaAsync and Bit.App.Pages.TwoFactorPageViewModel.Fido2AuthenticateAsync, place localized UI text into the parameter obj. Therefore, the implementation of EncodeDataParameter needs to safely convert multibyte characters in the message into a url-safe form, and the conversion must be consistent with the reverse conversion in the consumers.

For example, obj has a field btnText whose value is "WebAuthn の認証" when TwoFactorPageViewModel.Fido2AuthenticateAsync calls the method in the Japanese locale. The value of btnText corresponds to the Fido2AuthenticateWebAuthn entry in the resource file.
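To illustrate how the localized text ends up in obj, here is a hypothetical caller shape. Only btnText and the Fido2AuthenticateWebAuthn resource entry come from the actual code; the other field names, and the assumption that the resource is exposed as AppResources.Fido2AuthenticateWebAuthn, are made up for illustration:

// Hypothetical sketch of a caller feeding localized text into EncodeDataParameter.
var data = AppHelpers.EncodeDataParameter(new
{
    btnText = AppResources.Fido2AuthenticateWebAuthn, // "WebAuthn の認証" in the Japanese locale
    headerText = exampleHeaderText,                   // hypothetical additional field
    callbackUri = exampleCallbackUri                  // hypothetical additional field
});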
What is actually happening behind the scenes
Let's take a closer look at the first multibyte character in the btnText value, "の" (U+306E), as it passes through the method:

(1) Encoding into JSON preserves the character, because ECMA-404 does not require character escapes in string values except for " (double quotation mark) and \ (backslash).
(2) Uri.EscapeDataString internally calls System.UriHelper.EscapeStringToBuilder, which encodes the character into its UTF-8 byte sequence and then encodes each byte in the form "%xx".
    - The character U+306E is represented as the 3 bytes 0xE3, 0x81, 0xAE in UTF-8.
    - The byte sequence is therefore encoded into "%E3%81%AE".
(3) The invocation of Regex.Replace replaces each occurrence of the "%xx" form by calling EncodeMultibyte.
    - Convert.ToUInt32 in EncodeMultibyte converts the regex captures "E3", "81", and "AE" into 0xE3u, 0x81u, and 0xAEu, respectively.
    - Convert.ToChar converts the individual integer values into their corresponding Unicode code points: ã (U+00E3, LATIN SMALL LETTER A WITH TILDE), a non-printable control character (U+0081), and ® (U+00AE, REGISTERED SIGN), respectively.
(4) Encoding.UTF8.GetBytes returns the UTF-8 byte sequence representation of the resulting string. The characters ã, U+0081, and ® are converted into their byte sequences as follows:
    - ã (U+00E3) --> 0xC3, 0xA3
    - U+0081 --> 0xC2, 0x81
    - ® (U+00AE) --> 0xC2, 0xAE
(5) Convert.ToBase64String converts these 6 bytes into "w6PCgcKu".

The consumer-side implementation in apps/web/src/connectors/captcha.ts#L50 or apps/web/src/connectors/webauthn.ts#L88 reverses only steps (5), (4), and (1). Therefore, we get "ã®" (U+00E3, U+0081, U+00AE) instead of "の" (U+306E).
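For the curious, the mangling above can be reproduced with a few lines of stand-alone C#; this is an illustration of the walkthrough, not code from the app:

using System;
using System.Text;
using System.Text.RegularExpressions;

class MojibakeDemo
{
    static void Main()
    {
        // Step (2): percent-encode the UTF-8 bytes of "の".
        var escaped = Uri.EscapeDataString("の");            // "%E3%81%AE"

        // Step (3): turn each "%xx" escape into the character U+00xx.
        var mangled = Regex.Replace(escaped, "%([0-9A-F]{2})",
            m => Convert.ToChar(Convert.ToUInt32(m.Groups[1].Value, 16)).ToString());
        // mangled is now "\u00E3\u0081\u00AE" (ã, U+0081, ®) instead of "の".

        // Steps (4) and (5): UTF-8 bytes, then Base64.
        var b64 = Convert.ToBase64String(Encoding.UTF8.GetBytes(mangled));
        Console.WriteLine(b64);                              // prints "w6PCgcKu"
    }
}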
Justification of the proposed fix

WebUtility.UrlEncode alone would be sufficient to keep the return value url-safe: it internally converts the given string into its UTF-8 byte sequence and then converts the individual bytes into url-safe characters. In practice, the Base64 encoding is also necessary for consistency with the decoding in the end consumer of the data.

In any case, the earlier steps (2) and (3) are not necessary for url-safety. Both WebUtility.UrlEncode and Convert.ToBase64String(Encoding.UTF8.GetBytes(str)) handle multibyte characters safely, and steps (1), (4), and (5) are exactly what the consumer-side implementation reverses. Steps (2) and (3) only make the encoded value incompatible with the decoding implementation. That is why this PR removes them.
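As a sanity check, the remaining steps round-trip multibyte text. The real consumers decode in TypeScript (URL decoding of the query parameter, Base64 decoding, then JSON.parse); the sketch below mirrors that decoding in C# and omits the JSON step for brevity:

using System;
using System.Net;
using System.Text;

class RoundTripDemo
{
    static void Main()
    {
        var original = "WebAuthn の認証";

        // Encoding side with steps (2) and (3) removed: UTF-8 bytes, Base64, URL-encode.
        var encoded = WebUtility.UrlEncode(
            Convert.ToBase64String(Encoding.UTF8.GetBytes(original)));

        // Consumer side, mirrored in C#: URL-decode, Base64-decode, read the bytes as UTF-8.
        var decoded = Encoding.UTF8.GetString(
            Convert.FromBase64String(WebUtility.UrlDecode(encoded)));

        Console.WriteLine(decoded == original); // True
    }
}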
Comparison with ASCII characters

You might wonder why this issue did not show up for the earlier ASCII characters in the btnText string, or for messages in many other locales, e.g. US English. The reason is that for ASCII characters the UTF-8 byte sequence is identical to the Unicode code point value itself, so the combination of the extra steps (2) and (3) is effectively a no-op for them.

For example, the character 'W' (U+0057) is represented by the single byte 0x57 and is left untouched by step (2), since Uri.EscapeDataString does not escape unreserved ASCII characters. Even for an ASCII character that is escaped, such as the space (U+0020), which step (2) turns into "%20", EncodeMultibyte in step (3) converts the regex capture "20" into the integer 0x20u, and Convert.ToChar turns it right back into the original character.

The issue happens only when a string in the target JSON object contains characters whose UTF-8 byte sequence differs from the Unicode code point itself.
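A tiny stand-alone illustration of this difference (again, not code from the app):

using System;
using System.Text.RegularExpressions;

class AsciiVsMultibyteDemo
{
    // Applies only the extra steps (2) and (3) from the original implementation.
    static string Steps2And3(string s) =>
        Regex.Replace(Uri.EscapeDataString(s), "%([0-9A-F]{2})",
            m => Convert.ToChar(Convert.ToUInt32(m.Groups[1].Value, 16)).ToString());

    static void Main()
    {
        Console.WriteLine(Steps2And3("WebAuthn") == "WebAuthn"); // True: ASCII survives
        Console.WriteLine(Steps2And3("の") == "の");              // False: multibyte is mangled
    }
}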
Before you submit
Please check for formatting errors (dotnet format --verify-no-changes) (required)
Please add unit tests where it makes sense to do so (encouraged but not required)
If this change requires a documentation update - notify the documentation team
If this change has particular deployment requirements - notify the DevOps team