dotnet / pinvoke

A library containing all P/Invoke code so you don't have to import it every time. Maintained and updated to support the latest Windows OS.
MIT License
2.12k stars 222 forks source link

RegisterClass/Ex missing CharSet.Unicode—why not set uniformly? #373

Closed jnm2 closed 6 years ago

jnm2 commented 6 years ago

RegisterClassEx uses Ansi and CreateWindowEx uses Unicode. This does not make sense and it results in "Class name not found."

https://github.com/AArnott/pinvoke/blob/1a55ba3556300996bf9bb34d7782bd0a4e010610/src/User32/User32.cs#L1306-L1311

Wouldn't it make more sense to remove CharSet from every definition in all libraries and use [module: DefaultCharSet(CharSet.Unicode)] in each instead?

AArnott commented 6 years ago

I'm confused by your statement that

RegisterClassEx uses Ansi

I don't see where it does. MSDN says it works either way, and the code snippet you included doesn't show us using Ansi. It looks like we don't explicitly state which one to use, in which case it will always use Unicode (as I understand it) unless it's running on Windows XP, IIRC, which we don't even claim to support.

jnm2 commented 6 years ago

DllImport defaults to None which ends up effectively Ansi if you don't specify, IIRC. I know that's why I've had to use DefaultCharSetAttribute.

https://msdn.microsoft.com/library/system.runtime.interopservices.dllimportattribute.charset.aspx#Anchor_1:

The default enumeration member for C# and Visual Basic is CharSet.Ansi and the default enumeration member for C++ is CharSet.None, which is equivalent to CharSet.Ansi. In Visual Basic, you use the Declare statement to specify the CharSet field.

https://docs.microsoft.com/dotnet/framework/interop/specifying-a-character-set#string-marshaling-and-name-matching

The CharSet field accepts the following values:

CharSet.Ansi (default value)

Either way, this fails with CreateWindowEx using class name:

[DllImport(nameof(User32), SetLastError = true)] 
[return: MarshalAs(UnmanagedType.U2)] 
public static extern short RegisterClassEx(ref WNDCLASSEX lpwcx); 

And this succeeds:

[DllImport(nameof(User32), SetLastError = true, CharSet = CharSet.Unicode)] 
[return: MarshalAs(UnmanagedType.U2)] 
public static extern short RegisterClassEx(ref WNDCLASSEX lpwcx); 

As to why, my guess is it had to do with Win98 and ME compatibility back in 2002 for .NET 1.0 and they just never wanted to bite the bullet and cause a breaking change in behavior since then.

jnm2 commented 6 years ago

This makes it clear:

unsafe void Main()
{
    fixed (char* pointerToUnicode = "test")
    {
        MessageBox(IntPtr.Zero, (IntPtr)pointerToUnicode, (IntPtr)pointerToUnicode, 0);
    }
}

[DllImport("user32.dll")]
public static extern IntPtr MessageBox(IntPtr hWnd, IntPtr text, IntPtr caption, uint type);

Shows:

---------------------------
t
---------------------------
t
---------------------------
OK   
---------------------------

The unicode bytes { 0x72, 0x0, 0x65, 0x0, 0x71, 0x0, 0x72, 0x0, 0x0, 0x0 } are being interpreted as ansi and it terminates at the second byte. This means MessageBoxA is being called, not MessageBoxW.

jnm2 commented 6 years ago

Inverse example. Even though MessageBoxW is called, the marshaler clearly thinks it needs to convert the strings to ansi before passing them:

unsafe void Main()
{
    MessageBox(IntPtr.Zero, "test", "test", 0);
}

[DllImport("user32.dll", EntryPoint = "MessageBoxW")]
public static extern IntPtr MessageBox(IntPtr hWnd, string text, string caption, uint type);

Shows:

---------------------------
整瑳
---------------------------
整瑳
---------------------------
OK   
---------------------------
jnm2 commented 6 years ago

And as you can verify, if you add either [module: DefaultCharSet(CharSet.Unicode)] or [module: DefaultCharSet(CharSet.Auto)], both examples give:

---------------------------
test
---------------------------
test
---------------------------
OK   
---------------------------
jnm2 commented 6 years ago

Verified that CoreCLR v1 and v2 both behave the same as desktop CLR v4 in this way.

vbfox commented 6 years ago

Didn't know about DefaultCharSet it's a good idea, searching the project history we had quite a few PRs where Unicode was missing or that just fixed it somewhere it was forgotten

AArnott commented 6 years ago

Thanks for that, @jnm2. Sounds good to me too then. I can't think of anywhere that an assumed Ansi semantic is important in the whole repo. @jnm2 would you care to send a PR? I guess the ideal PR would define the assembly-level attribute in its own source file and then use Directory.Build.props to link it into every project.

jnm2 commented 6 years ago

Sure! @vbfox if you want to, I'll let you save me the time. 😄 Otherwise yes.

jnm2 commented 6 years ago

Oh no, I'm sorry! I clean forgot about this.