dotnet / wcf

This repo contains the client-oriented WCF libraries that enable applications built on .NET Core to communicate with WCF services.
MIT License

WCF configuration issues: slow connection, netTcpBinding and Transport security give SSPI error #5015

Open martinmueller4voice opened 1 year ago

martinmueller4voice commented 1 year ago

I'm still struggling with the configuration of my WCF services and clients - perhaps someone can shed some light on the topic?

While porting a few services from .NET Remoting to WCF, I noticed several quirks, which I wasn't able to explain:

  1. When using localhost in the service endpoint address, connection attempts work very fast (a few ms). Using the explicit hostname (fully qualified or not) or 127.0.0.1 or 0.0.0.0 slows down each connection attempt dramatically, to about 500ms for every single connection. I couldn't find the reason for this discrepancy.

  2. Using netTcpBinding with Transport security, everything works in every configuration I've tried when server and client are running on the same machine. If the server is running on a different machine (in the same Windows Domain), I get an SSPI error when I use the server machine's name in the client endpoint address, but not if I use the server machine's IP address. If I add <identity><dns value="mydomain.local"/></identity> to the client endpoint config, connection works, but I don't understand why this setting is required if both machines are part of exactly this domain? If I add <identity><servicePrincipalName value="anything"/></identity> instead, it also works, completely independent of which value I give as SPN. Even an empty value works! But if I remove the node altogether, the SSPI error reappears.

I've even asked ChatGPT what the default value for the SPN is and the reply was net.tcp/hostname_or_FQDN:port/service_name, which sounded reasonable.

Using this SPN does work, but every other string I tried worked as well, so I cannot understand the claim that this SPN is somehow validated by AD/Kerberos/whatever.

I've been testing this with a small example, running both the server and the client under my user account. Environment: Windows 10 Client, Windows 11 Server, .NET Framework 4.8.1.

Do any of the gurus have an explanation for this, and some guidance on what the best approach is?

Thanks Martin

martinmueller4voice commented 1 year ago

Addendum: Meanwhile I could confirm that the SSPI error does not occur when the server is being executed under the LocalSystem or NetworkService account, but that doesn't help because often customers want to run the servers under special service users.

Registering an SPN for the user running the server using setspn -S net.tcp/myServerMachine:50042/MyService <my.user.running.the.server> reported success, but I still got the same SSPI error. Only when I added the identity/servicePrincipalName entry to my config with an arbitrary value could the client connect to the server. But this defeats the whole purpose of registering an SPN in the first place, doesn't it?

mconnew commented 1 year ago

This looks like a .NET Framework issue. While that's generally out of scope in this repo, I'm going to answer the question as I believe it's generally helpful.

Hostname

I believe this is an IPv4/IPv6 issue. When you specify a hostname as the listening address, WCF just uses IPAddress.IPv6Any, which causes it to listen on all IP addresses, both IPv4 and IPv6, including the localhost ones. This means that localhost, hostname, and hostname.example.com will all listen on IPv4 and IPv6, while 127.0.0.1 and 0.0.0.0 will only listen on IPv4.

On the client side, things are slightly different. If you specify an IP address, it will attempt to connect using only that IP address. When using a hostname (including localhost), WCF performs a DNS lookup, which can return multiple addresses (e.g. IPv4 and IPv6 addresses). It then tries each in turn, with a timeout.

I suspect you haven't tested combinations of values on the client and the server. You can get a delay if the server uses an IPv4 address (e.g. 127.0.0.1), which results in listening only on IPv4 addresses, while the client uses a hostname that the OS resolves to both IPv4 and IPv6 addresses, with the IPv6 address returned to WCF first. The WCF client then attempts to connect using the IPv6 address and times out before successfully connecting using IPv4. This is the result of doing a DNS lookup on my local machine in a C# Interactive session:

C:\git\wcf>csi
Microsoft (R) Visual C# Interactive Compiler version 4.4.0-6.23101.15 ()
Copyright (C) Microsoft Corporation. All rights reserved.

Type "#help" for more information.
> var hostEntry = System.Net.Dns.GetHostEntry("localhost");
> hostEntry.AddressList.Length
2
> hostEntry.AddressList[0]
[::1]
> hostEntry.AddressList[1]
[127.0.0.1]

As you can see, the first address returned is the IPv6 address for localhost. I suggest doing the same thing for the hostname you specify on the client side to see what addresses it returns. It could also be a firewall issue combined with this, e.g. the firewall is blocking the port on IPv6, so even though the service is listening on it, the client can't connect. Depending on your network topology, the lookup could also return multiple IPs, some of which aren't reachable (multi-homed servers, VPN IPs, etc.).
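
To make the listening-address behaviour described above concrete, here is a minimal sketch of a service-side App.config. The service and contract names (MyCompany.MyService, MyCompany.IMyService) are hypothetical; the port and path are the ones used elsewhere in this thread. The comments restate the explanation given above rather than documented guarantees.

```xml
<configuration>
  <system.serviceModel>
    <services>
      <!-- Hypothetical service and contract names, for illustration only. -->
      <service name="MyCompany.MyService">
        <!-- Hostname in the listen address: per the explanation above,
             the listener accepts both IPv4 and IPv6 connections. -->
        <endpoint address="net.tcp://localhost:50042/MyService"
                  binding="netTcpBinding"
                  contract="MyCompany.IMyService" />
        <!-- Alternative (use instead of the endpoint above, not in addition):
             an IPv4 literal such as net.tcp://127.0.0.1:50042/MyService
             listens on IPv4 only, so a client that resolves its target
             hostname to an IPv6 address first may wait for that attempt
             to time out before retrying over IPv4. -->
      </service>
    </services>
  </system.serviceModel>
</configuration>
```

If the client has to keep using a hostname, keeping a hostname on the service side as well avoids the IPv4-only listener case.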

ServicePrincipal

When authenticating with Windows credentials, there are 2 ways to do so. You can use Kerberos (which involves the client talking to the domain controller to get a token), or NTLM (where the client effectively sends the credentials to the server). When communicating over localhost, NTLM is always used. When communicating between two machines, it attempts to use Kerberos first, and falls back to NTLM if it can't.

With Kerberos, the client needs to provide a service principal name to the domain controller to get an auth token the service can understand. Unless you override the identity on the client side, it will use the hostname you provided. There are many types of identity you can configure WCF to use, but only 2 of them work with Kerberos: a UPN identity or an SPN identity. If you don't explicitly configure the client with an identity, it uses an SPN identity for the hostname. When you provide a DNS identity, Kerberos can't be used, so it falls back to using NTLM.

The correct identity to use depends on the user account the service is running as. If it's running under a system account (which is the default when hosted in IIS), then you use an SPN for the hostname. If the service isn't running under a system account, then you need to use a UPN. A UPN is the email-style Active Directory username (e.g. user@example.com). This is the easiest way to host a NetTcp service using Windows auth if you aren't hosting in IIS. Launch the service with a domain account and on the client specify <identity><userPrincipalName value="user@example.com" /></identity>. You can use an SPN too by registering an SPN and granting permissions to use that SPN to the user account the service is running as. I think you might need to configure the service identity too, but I believe that's only needed if you are exposing a metadata endpoint (WSDL/mex), as WCF needs to know which identity to tell the client to use.
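
For reference, here is a minimal sketch of where the UPN identity sits in a client App.config, assuming a netTcpBinding endpoint with Transport security and Windows credentials. The endpoint name, binding configuration name, and contract (MyCompany.IMyService) are hypothetical; the address reuses the machine name and port from earlier in this thread.

```xml
<configuration>
  <system.serviceModel>
    <bindings>
      <netTcpBinding>
        <!-- "tcpTransport" is a hypothetical binding configuration name. -->
        <binding name="tcpTransport">
          <security mode="Transport">
            <transport clientCredentialType="Windows" />
          </security>
        </binding>
      </netTcpBinding>
    </bindings>
    <client>
      <!-- Endpoint name and contract are hypothetical. -->
      <endpoint name="MyServiceEndpoint"
                address="net.tcp://myServerMachine:50042/MyService"
                binding="netTcpBinding"
                bindingConfiguration="tcpTransport"
                contract="MyCompany.IMyService">
        <identity>
          <!-- UPN of the domain account the service process runs as. -->
          <userPrincipalName value="user@example.com" />
        </identity>
      </endpoint>
    </client>
  </system.serviceModel>
</configuration>
```

The identity element belongs to the individual endpoint, so each client endpoint that talks to a service running under a domain account needs its own entry.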

I believe you made a mistake in your usage of setspn as you need to specify -U if you are giving permission to a user. Without it, it will try to provide permission to a machine which won't work. So you can do the following:

  1. Execute `setspn -U -S net.tcp/myServerMachine:50042/MyService user@domain.com`
  2. Run your service as the user user@domain.com
  3. Configure the client to use <identity><servicePrincipalName value="net.tcp/myServerMachine:50042/MyService"/></identity>

Or you can simply configure the client to use <identity><userPrincipalName value="user@domain.com" /></identity>

martinmueller4voice commented 1 year ago

WOW - thank you so much for explaining the details of authentication so well!

Especially the differentiation between NTLM and Kerberos and which identity could be used under which circumstances was something I couldn't understand from the existing documentation for the life of me.

Meanwhile I got the case of self-hosting with a dedicated user to work by using the UPN for this user in the config file and have verified that the UPN is, in fact, checked as it should be. Haven't tried setspn -U for this, but according to your explanation I'm sure it will work as well.

Regarding the speed issue: Yes, indeed, both of my machines return the IPv6 and IPv4 addresses for "localhost", in that order. When I look at the HostEntry for the other machine, it contains the IPv4 address only.

Having tested all the different combinations again right now, it seems to behave consistently now. Using "localhost" or "0.0.0.0" or the hostname on the server machine, I can connect using the hostname (and thus receiving just the IPv4 address) of the server on the client side. Using "127.0.0.1" on the server machine allows IPv4 connections from the server machine only and using "[::1]" on the server machine allows IPv6 connections from the server machine only (as is to be expected).

I wasn't able to reproduce any slow connections now, so perhaps it had something to do with our DNS at that time. My colleagues confirmed that they had turned off IPv6 for DNS since "they had problems with it"...

So now it seems as if all my problems have been solved/explained. Thank you very much once again for taking your time!

martinmueller4voice commented 1 year ago

Sorry, one additional question: I've tried registering the SPN for my user using setspn, as you suggested:

setspn -U -S net.tcp/<servermachine FQDN>:<port>/MyServer <my.domain>\<my.user>

Registration for my user (and not the machine, as you pointed out) worked. Nevertheless, the SSPI error reappeared as soon as I removed the <identity><userPrincipalName value="<my.domain>\<my.user>"/></identity> entry from my config. I also tried netTcp/... instead of net.tcp/... (I found different versions on the web), but that didn't make any difference.

The reason why I'm still trying is that I want to give our customer a version that can be maintained better (only register the SPNs for the service user once and be able to use the standard config files on each client).

mconnew commented 1 year ago

You still need to specify the servicePrincipalName identity that you registered. If you just remove the identity config, the client goes back to using host/hostname as the SPN, which the user account the service is running under doesn't have permission to use.
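
In other words, the client endpoint keeps an identity element, but it carries the registered SPN instead of the UPN. A minimal sketch, assuming the SPN was registered exactly as in the setspn step earlier in this thread; the endpoint name and contract (MyCompany.IMyService) are hypothetical, and the SPN value must match the registered string character for character.

```xml
<configuration>
  <system.serviceModel>
    <client>
      <!-- Endpoint name and contract are hypothetical. -->
      <endpoint name="MyServiceEndpoint"
                address="net.tcp://myServerMachine:50042/MyService"
                binding="netTcpBinding"
                contract="MyCompany.IMyService">
        <identity>
          <!-- Must match the SPN registered with setspn for the service account. -->
          <servicePrincipalName value="net.tcp/myServerMachine:50042/MyService" />
        </identity>
      </endpoint>
    </client>
  </system.serviceModel>
</configuration>
```

Because the SPN names the service rather than the account, the same client config should keep working if the service is later moved to a different domain account, provided the SPN registration is moved along with it.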