Open Cronan opened 4 years ago
Moved to corefx repo as this isn't a bug/change introduced in WCF code. @davidsh, are you aware of any changes in NegotiateStream between 2.1.3 and 2.1.13 which would have caused this? @Cronan, if this works with 2.1.3 on the same host, then that should rule out any problems with kinit. I also wouldn't expect an error saying it couldn't find the server in the database if there was a problem with kinit as that implies it's communicating with the Kerberos server successfully.
Thanks for the reply, and thanks for moving it to the correct place. Yes, I’ve tested this on several hosts, both docker containers and Linux headnodes, and the behaviour is consistent. Let me know right away if there’s anything I can provide to help diagnose the issue.
On 8 Oct 2019, at 21:52, Matt Connew notifications@github.com wrote:
Moved to corefx repo as this isn't a bug/change introduced in WCF code. @davidsh, are you aware of any changes in NegotiateStream between 2.1.3 and 2.1.13 which would have caused this? @Cronan, if this works with 2.1.3 on the same host, then that should rule out any problems with kinit. I also wouldn't expect an error saying it couldn't find the server in the database if there was a problem with kinit as that implies it's communicating with the Kerberos server successfully.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.
@Cronan, it might be useful to find the exact version which caused the break. I wasn't going to suggest that as it can be a lot of work and it's possible that the appropriate devs might know which changes may have broken this, but if you have it working in docker, it might not be a lot of work. Especially if you do a binary search through the versions. Something quicker that would likely be useful is to capture a Kerberos trace for when it's working and when it's not working. You can do this by setting the environment variable KRB5_TRACE before you run your code to capture Kerberos tracing. You can do one of the following:
Output log to stdout
export KRB5_TRACE = /dev/stdout
Output log to file
export KRB5_TRACE = ~/krb5_trace.log
@davidsh, are you aware of any changes in NegotiateStream between 2.1.3 and 2.1.13 which would have caused this?
Thanks for bringing this issue to my attention. Yes, there were changes between 2.1.3 and 2.1.13. We ported some fixes from .NET Core 3.0 into the latest servicing releases for 2.1 and 2.2. These were high priority Linux Kerberos related fixes.
See: dotnet/corefx#40109
I'll need to dig into your particular scenario to see why it appears to have been regressed by these changes.
Can you please provide a repro for this? Or describe in great detail the scenario, networking environment and especially how Kerberos is being used. What OS is the server? What OS is the client?
For the Linux client, please include the krb5.conf file on the machine and a KRB5_TRACE.LOG as described above. But, please don't post any confidential information to this GitHub issue. If necessary, you can mask off any information in the traces that is confidential.
Thank you, both great suggestions, I’ll try first thing tomorrow.
I’m finding it hard to find up to date Kerberos rpms, and the details of working with the tarballs are not very complete, do you have some tips for trying other versions that might be quicker?
On 8 Oct 2019, at 22:53, Matt Connew notifications@github.com wrote:
@Cronan, it might be useful to find the exact version which caused the break. I wasn't going to suggest that as it can be a lot of work and it's possible that the appropriate devs might know which changes may have broken this, but if you have it working in docker, it might not be a lot of work. Especially if you do a binary search through the versions. Something quicker that would likely be useful is to capture a Kerberos trace for when it's working and when it's not working. You can do this by setting the environment variable KRB5_TRACE before you run your code to capture Kerberos tracing. You can do one of the following:
Output log to stdout
export KRB5_TRACE = /dev/stdout Output log to file
export KRB5_TRACE = ~/krb5_trace.log — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.
@Cronan, I wouldn't try changing your Kerberos stack at this point, it will just add noise. Things were working with .NET Core 2.1.3 so unless we find a reason to believe there's an issue with the version of Kerberos you are running with, just leave everything which isn't .NET Core in the same state as when it's working with .NET Core 2.1.3.
@davidsh, He's provided details of the client in the initial bug description. He also explained the server is a Windows WCF service. Does the specific Windows version matter?
@Cronan, for your WCF client, are you explicitly specifying a service identity? You would have a reference to the one of the classes SpnEndpointIdentity, UpnEndpointIdentity or DnsEndpointIdentity. If you don't have a reference to any of these in your code, we default to using a targetName of host/hostname where hostname is extracted from the uri of the service you are connecting to. So depending on your the address you provided to WCF it might not be a FQDN.
@davidsh, He's provided details of the client in the initial bug description. He also explained the server is a Windows WCF service. Does the specific Windows version matter?
No. But we need a repro. At the very least, snippets of WCF code for the server and client and how the service is configured.
Also, can you please get KRB5_TRACES for both the working scenario (2.1.3) and broken scenario (2.1.13)?
As a workaround can you try setting this environment variable on the Linux client? It will enable the 'legacy' HTTP stack (uses libcurl on Linux). I'm curious if it will help with this problem at least as a workaround for now:
export DOTNET_SYSTEM_NET_HTTP_USESOCKETSHTTPHANDLER=0
@davidsh, this is not using HTTP, this is NegotiateStream.
@davidsh, this is not using HTTP, this is NegotiateStream.
Oh. Good point. Yes, setting that environment variable won't change anything.
@Cronan, I wouldn't try changing your Kerberos stack at this point, it will just add noise. Things were working with .NET Core 2.1.3 so unless we find a reason to believe there's an issue with the version of Kerberos you are running with, just leave everything which isn't .NET Core in the same state as when it's working with .NET Core 2.1.3.
Sorry, I actually meant Centos rpms, I was typing that on a phone ...
cat /etc/krb5.conf
[logging]
default = FILE:/var/log/krb5libs.log
kdc = FILE:/var/log/krb5kdc.log
admin_server = FILE:/var/log/kadmind.log
[libdefaults]
default_realm = MY.DOMAIN.COM
dns_lookup_realm = false
dns_lookup_kdc = true
ticket_lifetime = 24h
renew_lifetime = 7d
forwardable = true
default_tkt_enctypes = rc4-hmac
default_tgs_enctypes = rc4-hmac
[realms]
MY.DOMAIN.COM = {
kdc = uk-ldap.MY.DOMAIN.COM:88
admin_server = MY.DOMAIN.COM:749
default_domain = MY.DOMAIN.COM
}
[domain_realm]
.myotherdomain.com = MY.DOMAIN.COM
MY.DOMAIN.COM = MY.DOMAIN.COM
.MY.DOMAIN.COM = MY.DOMAIN.COM
myotherdomain.com = MY.DOMAIN.COM
KRB5_TRACE.LOG for 2.1.3 (working scenario):
cat krb5_trace.log
[9407] 1570605553.815700: ccselect module realm chose cache FILE:/var/opt/thinlinc/sessions/icronyn/10/krb5cc with client principal icronyn@MY.DOMAIN.COM for server principal tcp/mywindowsservice-prod.MY.DOMAIN.COM:8184@MY.DOMAIN.COM
[9407] 1570605553.815701: Getting credentials icronyn@MY.DOMAIN.COM -> tcp/mywindowsservice-prod.MY.DOMAIN.COM:8184@MY.DOMAIN.COM using ccache FILE:/var/opt/thinlinc/sessions/icronyn/10/krb5cc
[9407] 1570605553.815702: Retrieving icronyn@MY.DOMAIN.COM -> tcp/mywindowsservice-prod.MY.DOMAIN.COM:8184@MY.DOMAIN.COM from FILE:/var/opt/thinlinc/sessions/icronyn/10/krb5cc with result: 0/Success
[9407] 1570605553.815704: Creating authenticator for icronyn@MY.DOMAIN.COM -> tcp/mywindowsservice-prod.MY.DOMAIN.COM:8184@MY.DOMAIN.COM, seqnum 823849255, subkey rc4-hmac/23EC, session key rc4-hmac/B264
[9407] 1570605553.815709: Read AP-REP, time 1570605553.815705, subkey rc4-hmac/23EC, seqnum 1116169193
KRB5_TRACE.LOG for 2.1.13 (failing scenario):
cat krb5_trace.log
[16826] 1570607597.439200: ccselect module realm chose cache FILE:/var/opt/thinlinc/sessions/icronyn/10/krb5cc with client principal icronyn@MY.DOMAIN.COM for server principal tcp/mywindowsserviceothername.MY.DOMAIN.COM:8184@MY.DOMAIN.COM
[16826] 1570607597.439201: Getting credentials icronyn@MY.DOMAIN.COM -> tcp/mywindowsserviceothername.MY.DOMAIN.COM:8184@MY.DOMAIN.COM using ccache FILE:/var/opt/thinlinc/sessions/icronyn/10/krb5cc
[16826] 1570607597.439202: Retrieving icronyn@MY.DOMAIN.COM -> tcp/mywindowsserviceothername.MY.DOMAIN.COM:8184@MY.DOMAIN.COM from FILE:/var/opt/thinlinc/sessions/icronyn/10/krb5cc with result: -1765328243/Matching credential not found (filename: /var/opt/thinlinc/sessions/icronyn/10/krb5cc)
[16826] 1570607597.439203: Retrieving icronyn@MY.DOMAIN.COM -> krbtgt/MY.DOMAIN.COM@MY.DOMAIN.COM from FILE:/var/opt/thinlinc/sessions/icronyn/10/krb5cc with result: 0/Success
[16826] 1570607597.439204: Starting with TGT for client realm: icronyn@MY.DOMAIN.COM -> krbtgt/MY.DOMAIN.COM@MY.DOMAIN.COM
[16826] 1570607597.439205: Requesting tickets for tcp/mywindowsserviceothername.MY.DOMAIN.COM:8184@MY.DOMAIN.COM, referrals on
[16826] 1570607597.439206: Generated subkey for TGS request: rc4-hmac/1EBF
[16826] 1570607597.439207: etypes requested in TGS request: rc4-hmac
[16826] 1570607597.439209: Encoding request body and padata into FAST request
[16826] 1570607597.439210: Sending request (5328 bytes) to MY.DOMAIN.COM
[16826] 1570607597.439211: Resolving hostname uk-ldap.MY.DOMAIN.COM
[16826] 1570607597.439212: Initiating TCP connection to stream 10.194.10.221:88
[16826] 1570607597.439213: Sending TCP request to stream 10.194.10.221:88
[16826] 1570607597.439214: Received answer (406 bytes) from stream 10.194.10.221:88
[16826] 1570607597.439215: Terminating TCP connection to stream 10.194.10.221:88
[16826] 1570607597.439216: Sending DNS URI query for _kerberos.MY.DOMAIN.COM.
[16826] 1570607597.439217: No URI records found
[16826] 1570607597.439218: Sending DNS SRV query for _kerberos-master._udp.MY.DOMAIN.COM.
[16826] 1570607597.439219: Sending DNS SRV query for _kerberos-master._tcp.MY.DOMAIN.COM.
[16826] 1570607597.439220: No SRV records found
[16826] 1570607597.439221: Response was not from master KDC
[16826] 1570607597.439222: Decoding FAST response
[16826] 1570607597.439223: TGS request result: -1765328377/Server not found in Kerberos database
[16826] 1570607597.439224: Requesting tickets for tcp/mywindowsserviceothername.MY.DOMAIN.COM:8184@MY.DOMAIN.COM, referrals off
[16826] 1570607597.439225: Generated subkey for TGS request: rc4-hmac/D1CF
[16826] 1570607597.439226: etypes requested in TGS request: rc4-hmac
[16826] 1570607597.439228: Encoding request body and padata into FAST request
[16826] 1570607597.439229: Sending request (5328 bytes) to MY.DOMAIN.COM
[16826] 1570607597.439230: Resolving hostname uk-ldap.MY.DOMAIN.COM
[16826] 1570607597.439231: Initiating TCP connection to stream 10.192.10.221:88
[16826] 1570607597.439232: Sending TCP request to stream 10.192.10.221:88
[16826] 1570607597.439233: Received answer (406 bytes) from stream 10.192.10.221:88
[16826] 1570607597.439234: Terminating TCP connection to stream 10.192.10.221:88
[16826] 1570607597.439235: Sending DNS URI query for _kerberos.MY.DOMAIN.COM.
[16826] 1570607597.439236: No URI records found
[16826] 1570607597.439237: Sending DNS SRV query for _kerberos-master._udp.MY.DOMAIN.COM.
[16826] 1570607597.439238: Sending DNS SRV query for _kerberos-master._tcp.MY.DOMAIN.COM.
[16826] 1570607597.439239: No SRV records found
[16826] 1570607597.439240: Response was not from master KDC
[16826] 1570607597.439241: Decoding FAST response
[16826] 1570607597.439242: TGS request result: -1765328377/Server not found in Kerberos database
[17052] 1570607597.686051: Getting initial credentials for icronyn@MY.DOMAIN.COM
[17052] 1570607597.686053: Sending unauthenticated request
[17052] 1570607597.686054: Sending request (199 bytes) to MY.DOMAIN.COM
[17052] 1570607597.686055: Resolving hostname uk-ldap.MY.DOMAIN.COM
[17052] 1570607597.686056: Sending initial UDP request to dgram 10.192.10.222:88
[17052] 1570607597.686057: Received answer (207 bytes) from dgram 10.192.10.222:88
[17052] 1570607597.686058: Sending DNS URI query for _kerberos.MY.DOMAIN.COM.
[17052] 1570607597.686059: No URI records found
[17052] 1570607597.686060: Sending DNS SRV query for _kerberos-master._udp.MY.DOMAIN.COM.
[17052] 1570607597.686061: Sending DNS SRV query for _kerberos-master._tcp.MY.DOMAIN.COM.
[17052] 1570607597.686062: No SRV records found
[17052] 1570607597.686063: Response was not from master KDC
[17052] 1570607597.686064: Received error from KDC: -1765328359/Additional pre-authentication required
[17052] 1570607597.686067: Preauthenticating using KDC method data
[17052] 1570607597.686068: Processing preauth types: 16, 15, 11, 19, 2
[17052] 1570607597.686069: Selected etype info: etype rc4-hmac, salt "", params ""
[17052] 1570607597.686070: PKINIT client has no configured identity; giving up
[17052] 1570607597.686071: PKINIT client has no configured identity; giving up
[17052] 1570607597.686072: Preauth module pkinit (16) (real) returned: 22/Invalid argument
[17052] 1570607597.686073: PKINIT client ignoring draft 9 offer from RFC 4556 KDC
[17052] 1570607597.686074: Preauth module pkinit (15) (real) returned: -1765328360/Preauthentication failed
[17052] 1570607597.686075: AS key obtained for encrypted timestamp: rc4-hmac/E3F4
[17052] 1570607597.686077: Encrypted timestamp (for 1570607597.820889): plain 301AA011180F32303139313030393037353331375AA10502030C8699, encrypted 92D1E0DCC1611B46145C9DA2A5AB09A6D2BBF2F7F7195A523D12DF00A7F030DFA688A310727421687A2D3DADC17969D0D7A665BF
[17052] 1570607597.686078: Preauth module encrypted_timestamp (2) (real) returned: 0/Success
[17052] 1570607597.686079: Produced preauth for next request: 2
[17052] 1570607597.686080: Sending request (275 bytes) to MY.DOMAIN.COM
[17052] 1570607597.686081: Resolving hostname uk-ldap.MY.DOMAIN.COM
[17052] 1570607597.686082: Sending initial UDP request to dgram 10.194.0.87:88
[17052] 1570607597.686083: Received answer (122 bytes) from dgram 10.194.0.87:88
[17052] 1570607597.686084: Sending DNS URI query for _kerberos.MY.DOMAIN.COM.
[17052] 1570607597.686085: No URI records found
[17052] 1570607597.686086: Sending DNS SRV query for _kerberos-master._udp.MY.DOMAIN.COM.
[17052] 1570607597.686087: Sending DNS SRV query for _kerberos-master._tcp.MY.DOMAIN.COM.
[17052] 1570607597.686088: No SRV records found
[17052] 1570607597.686089: Response was not from master KDC
[17052] 1570607597.686090: Received error from KDC: -1765328332/Response too big for UDP, retry with TCP
[17052] 1570607597.686091: Request or response is too big for UDP; retrying with TCP
[17052] 1570607597.686092: Sending request (275 bytes) to MY.DOMAIN.COM (tcp only)
[17052] 1570607597.686093: Resolving hostname uk-ldap.MY.DOMAIN.COM
[17052] 1570607597.686094: Initiating TCP connection to stream 10.192.10.222:88
[17052] 1570607597.686095: Sending TCP request to stream 10.192.10.222:88
[17052] 1570607597.686096: Received answer (5100 bytes) from stream 10.192.10.222:88
[17052] 1570607597.686097: Terminating TCP connection to stream 10.192.10.222:88
[17052] 1570607597.686098: Sending DNS URI query for _kerberos.MY.DOMAIN.COM.
[17052] 1570607597.686099: No URI records found
[17052] 1570607597.686100: Sending DNS SRV query for _kerberos-master._tcp.MY.DOMAIN.COM.
[17052] 1570607597.686101: No SRV records found
[17052] 1570607597.686102: Response was not from master KDC
[17052] 1570607597.686103: Salt derived from principal: MY.DOMAIN.COMicronyn
[17052] 1570607597.686104: AS key determined by preauth: rc4-hmac/E3F4
[17052] 1570607597.686105: Decrypted AS reply; session key is: rc4-hmac/5E7D
[17052] 1570607597.686106: FAST negotiation: unavailable
[17052] 1570607597.686107: Initializing FILE:/var/opt/thinlinc/sessions/icronyn/10/krb5cc with default princ icronyn@MY.DOMAIN.COM
[17052] 1570607597.686108: Storing icronyn@MY.DOMAIN.COM -> krbtgt/MY.DOMAIN.COM@MY.DOMAIN.COM in FILE:/var/opt/thinlinc/sessions/icronyn/10/krb5cc
[17052] 1570607597.686109: Storing config in FILE:/var/opt/thinlinc/sessions/icronyn/10/krb5cc for krbtgt/MY.DOMAIN.COM@MY.DOMAIN.COM: pa_type: 2
[17052] 1570607597.686110: Storing icronyn@MY.DOMAIN.COM -> krb5_ccache_conf_data/pa_type/krbtgt\/MY.DOMAIN.COM\@MY.DOMAIN.COM@X-CACHECONF: in FILE:/var/opt/thinlinc/sessions/icronyn/10/krb5cc
[16826] 1570607597.439250: ccselect module realm chose cache FILE:/var/opt/thinlinc/sessions/icronyn/10/krb5cc with client principal icronyn@MY.DOMAIN.COM for server principal tcp/mywindowsserviceothername.MY.DOMAIN.COM:8184@MY.DOMAIN.COM
[16826] 1570607597.439251: Getting credentials icronyn@MY.DOMAIN.COM -> tcp/mywindowsserviceothername.MY.DOMAIN.COM:8184@MY.DOMAIN.COM using ccache FILE:/var/opt/thinlinc/sessions/icronyn/10/krb5cc
[16826] 1570607597.439252: Retrieving icronyn@MY.DOMAIN.COM -> tcp/mywindowsserviceothername.MY.DOMAIN.COM:8184@MY.DOMAIN.COM from FILE:/var/opt/thinlinc/sessions/icronyn/10/krb5cc with result: -1765328243/Matching credential not found (filename: /var/opt/thinlinc/sessions/icronyn/10/krb5cc)
[16826] 1570607597.439253: Retrieving icronyn@MY.DOMAIN.COM -> krbtgt/MY.DOMAIN.COM@MY.DOMAIN.COM from FILE:/var/opt/thinlinc/sessions/icronyn/10/krb5cc with result: 0/Success
[16826] 1570607597.439254: Starting with TGT for client realm: icronyn@MY.DOMAIN.COM -> krbtgt/MY.DOMAIN.COM@MY.DOMAIN.COM
[16826] 1570607597.439255: Requesting tickets for tcp/mywindowsserviceothername.MY.DOMAIN.COM:8184@MY.DOMAIN.COM, referrals on
[16826] 1570607597.439256: Generated subkey for TGS request: rc4-hmac/3343
[16826] 1570607597.439257: etypes requested in TGS request: rc4-hmac
[16826] 1570607597.439259: Encoding request body and padata into FAST request
[16826] 1570607597.439260: Sending request (5328 bytes) to MY.DOMAIN.COM
[16826] 1570607597.439261: Resolving hostname uk-ldap.MY.DOMAIN.COM
[16826] 1570607597.439262: Initiating TCP connection to stream 10.192.10.222:88
[16826] 1570607597.439263: Sending TCP request to stream 10.192.10.222:88
[16826] 1570607597.439264: Received answer (406 bytes) from stream 10.192.10.222:88
[16826] 1570607597.439265: Terminating TCP connection to stream 10.192.10.222:88
[16826] 1570607597.439266: Sending DNS URI query for _kerberos.MY.DOMAIN.COM.
[16826] 1570607597.439267: No URI records found
[16826] 1570607597.439268: Sending DNS SRV query for _kerberos-master._udp.MY.DOMAIN.COM.
[16826] 1570607597.439269: Sending DNS SRV query for _kerberos-master._tcp.MY.DOMAIN.COM.
[16826] 1570607597.439270: No SRV records found
[16826] 1570607597.439271: Response was not from master KDC
[16826] 1570607597.439272: Decoding FAST response
[16826] 1570607597.439273: TGS request result: -1765328377/Server not found in Kerberos database
[16826] 1570607597.439274: Requesting tickets for tcp/mywindowsserviceothername.MY.DOMAIN.COM:8184@MY.DOMAIN.COM, referrals off
[16826] 1570607597.439275: Generated subkey for TGS request: rc4-hmac/FCED
[16826] 1570607597.439276: etypes requested in TGS request: rc4-hmac
[16826] 1570607597.439278: Encoding request body and padata into FAST request
[16826] 1570607597.439279: Sending request (5328 bytes) to MY.DOMAIN.COM
[16826] 1570607597.439280: Resolving hostname uk-ldap.MY.DOMAIN.COM
[16826] 1570607597.439281: Initiating TCP connection to stream 10.194.0.87:88
[16826] 1570607597.439282: Sending TCP request to stream 10.194.0.87:88
[16826] 1570607597.439283: Received answer (406 bytes) from stream 10.194.0.87:88
[16826] 1570607597.439284: Terminating TCP connection to stream 10.194.0.87:88
[16826] 1570607597.439285: Sending DNS URI query for _kerberos.MY.DOMAIN.COM.
[16826] 1570607597.439286: No URI records found
[16826] 1570607597.439287: Sending DNS SRV query for _kerberos-master._udp.MY.DOMAIN.COM.
[16826] 1570607597.439288: Sending DNS SRV query for _kerberos-master._tcp.MY.DOMAIN.COM.
[16826] 1570607597.439289: No SRV records found
[16826] 1570607597.439290: Response was not from master KDC
[16826] 1570607597.439291: Decoding FAST response
[16826] 1570607597.439292: TGS request result: -1765328377/Server not found in Kerberos database
So, the first thing that jumps out to me is that in the working version the server principal used is mywindowsservice-prod, which is the alias used in the SPN and URI by the client to connect.
In the failing version, the server principal used is the server name, not the alias used as above.
SPN = "tcp/mywindowsservice-prod.my.domain.com:8184"
URI = "net.tcp://mywindowsservice-prod.my.domain.com:8184/MyDataService"
nslookup mywindowsservice-prod
Server: machine123.myotherdomain.com
Address: 10.192.207.245
Non-authoritative answer:
Name: mywindowsserviceothername.my.domain.com
Address: 10.192.226.115
Aliases: mywindowsservice-prod.my.domain.com
@davidsh, He's provided details of the client in the initial bug description. He also explained the server is a Windows WCF service. Does the specific Windows version matter?
@Cronan, for your WCF client, are you explicitly specifying a service identity? You would have a reference to the one of the classes SpnEndpointIdentity, UpnEndpointIdentity or DnsEndpointIdentity. If you don't have a reference to any of these in your code, we default to using a targetName of host/hostname where hostname is extracted from the uri of the service you are connecting to. So depending on your the address you provided to WCF it might not be a FQDN.
Sample code below:
var identity = new SpnEndpointIdentity("tcp/mywindowsservice-prod.my.domain.com:8184");
TokenImpersonationLevel tokenImpersonationLevel = TokenImpersonationLevel.Identification;
var endpointAddress = new EndpointAddress(new Uri(net.tcp://mywindowsservice-prod.my.domain.com:8184/MyDataService/SomeData), identity);
@davidsh, He's provided details of the client in the initial bug description. He also explained the server is a Windows WCF service. Does the specific Windows version matter?
No. But we need a repro. At the very least, snippets of WCF code for the server and client and how the service is configured.
Also, can you please get KRB5_TRACES for both the working scenario (2.1.3) and broken scenario (2.1.13)?
As a workaround can you try setting this environment variable on the Linux client? It will enable the 'legacy' HTTP stack (uses libcurl on Linux). I'm curious if it will help with this problem at least as a workaround for now:
export DOTNET_SYSTEM_NET_HTTP_USESOCKETSHTTPHANDLER=0
Service config:
<system.serviceModel>
<services>
<service behaviorConfiguration="MyBehaviour" name="MyDataServiceCode.Facades.SomeData">
<endpoint address="net.tcp://localhost:8184/MyDataService/SomeData"
binding="netTcpBinding" bindingConfiguration="UnreliableTCP"
name="MyDataServiceSomeData" contract="MyDataServiceCode.Interfaces.ISomeData" />
</service>
</services>
<behaviours>
<serviceBehaviors>
<behavior name="MyBehaviour">
<serviceDebug includeExceptionDetailInFaults="true" />
<serviceThrottling maxConcurrentCalls="500" maxConcurrentInstances="500" maxConcurrentSessions="500" />
<dataContractSerializer maxItemsInObjectGraph="2147483647" />
<serviceAuthorization impersonateCallerForAllOperations="true"></serviceAuthorization>
</behavior>
</serviceBehaviors>
</behaviours>
<bindings>
<netTcpBinding>
<binding name="UnreliableTCP" receiveTimeout="23:59:59" transferMode="Buffered" maxBufferPoolSize="0" maxBufferSize="2147483647"
maxConnections="100" maxReceivedMessageSize="2147483647">
<reliableSession inactivityTimeout="23:59:59" enabled="false" />
</binding>
</netTcpBinding>
</bindings>
</system.serviceModel>
@Cronan, it might be useful to find the exact version which caused the break. I wasn't going to suggest that as it can be a lot of work and it's possible that the appropriate devs might know which changes may have broken this, but if you have it working in docker, it might not be a lot of work. Especially if you do a binary search through the versions. Something quicker that would likely be useful is to capture a Kerberos trace for when it's working and when it's not working. You can do this by setting the environment variable KRB5_TRACE before you run your code to capture Kerberos tracing. You can do one of the following:
Output log to stdout
export KRB5_TRACE = /dev/stdout
Output log to file
export KRB5_TRACE = ~/krb5_trace.log
I did the binary search and 2.1.12 is the most recent version that works, and 2.1.13 is the version that starts failing.
So, the first thing that jumps out to me is that in the working version the server principal used is mywindowsservice-prod, which is the alias used in the SPN and URI by the client to connect. In the failing version, the server principal used is the server name, not the alias used as above.
As an experiment, please try adding this to your krb5.conf file in the [libdefaults] section
dns_canonicalize_hostname = false
Normally, the best-practice is to define SPNs that use the FQDN of the server. And that means using the A record DNS name. But it seems that your environment has defined the SPN against the CNAME (alias) DNS record. Setting the above in your krb5.conf file should prevent forward lookups of the hostname.
I understand that this worked before upgrading to 2.1.13. But I suspect other changes we made in that servicing release has now impacted using CNAMEs as SPNs. Please try this out and also send us updated KRB5_TRACE logs.
@Cronan, you are including the TCP port number your service is running on as part of the SPN name. Is that intentional? Is the port number part of the SPN as set up in your DC?
@davidsh, only a process running as the SYSTEM user or one of it's equivalents such as Network Service are able to use the hostname or FQDN SPN. If you are running with any other user, you normally need to create a different SPN to use.
@davidsh, I could understand the CNAME/FQDN thing when using HttpClient, but this is NegotiateStream. WCF is explicitly proving the targetName from the provided SpnEndpointIdentity. Are you saying that NegotitateStream will use something different than what was provided to the targetName argument?
Are you saying that NegotitateStream will use something different than what was provided to the targetName argument?
NegotiateStream, HttpClient, and SqlClient all use a common native library, System.Net.Security.Native on Linux. That library does the various GSS-API calls to support Negotiate/Kerberos/NTLM protocol. GSS-API uses the Linux krb5.conf file for various settings.
The transformation of the passed in 'targetName' from NegotiateStream is still present at the API call to the GSS-API. But in the implementation of the GSS-API call, SPNEGO protocol uses the settings in the krb5.conf file and apparently is transforming the 'targetName' and normalizing it (i.e CNAME -> A record) before doing a Kerberos principal lookup. That transformation is not something controllable at the .NET layer. That is why I suggested to try the dns_canonicalize_hostname = false
in the krb5.conf file as an experiment to see it is helps. It will help narrow down what is happening in this network environment.
I found the documentation for this behavior. This seems an odd design decision as the app can do that dance if needed but as you said, nothing that can be done about it. Based on those docs, it looks like this was caused by a change to use GSS_C_NT_HOSTBASED_SERVICE.
Based on those docs, it looks like this was caused by a change to use GSS_C_NT_HOSTBASED_SERVICE.
We made the change in .NET Core 3.0 and servicing for .NET Core 2.1/2.2 because a lot of scenarios were broken until we start using GSS_C_NT_HOSTBASED_SERVICE, which appears to be the preferred format for the SPNEGO protocol plugin for GSS-API.
But this issue here raises some interesting test areas that we probably want to address in a better way long-term in .NET Core. However, short-term, adjusting krb5.conf might be the best way to unblock things for now.
@Cronan, you are including the TCP port number your service is running on as part of the SPN name. Is that intentional? Is the port number part of the SPN as set up in your DC?
Yes, the port number is is part of the SPN. We tend to use the same name but different ports for TCP vs REST, for example.
Based on those docs, it looks like this was caused by a change to use GSS_C_NT_HOSTBASED_SERVICE.
We made the change in .NET Core 3.0 and servicing for .NET Core 2.1/2.2 because a lot of scenarios were broken until we start using GSS_C_NT_HOSTBASED_SERVICE, which appears to be the preferred format for the SPNEGO protocol plugin for GSS-API.
But this issue here raises some interesting test areas that we probably want to address in a better way long-term in .NET Core. However, short-term, adjusting krb5.conf might be the best way to unblock things for now.
I'm definitely going to try this today, but it's obviously not a fix for an entire environment - for example, it would prevent short names (like myservice-prod
) from being canonicalised to myservice-prod.mydomain.com
, if I understand the option correctly:
dns_canonicalize_hostname Indicate whether name lookups will be used to canonicalize hostnames for use in service principal names. Setting this flag to false can improve security by reducing reliance on DNS, but means that short hostnames will not be canonicalized to fully-qualified hostnames. The default value is true.
It also leaves my systems in a weird position, where most Linux calls that need authentication (e.g. curl <hostname> --anyauth -u :
hitting a WCF REST API, also using kerberos) seem totally happy with cnames
or a records
, but .NET Core is super-strict on requiring a records
across the board.
Or am I misunderstanding the situation?
It also leaves my systems in a weird position, where most Linux calls that need authentication (e.g. curl
--anyauth -u : hitting a WCF REST API, also using kerberos) seem totally happy with cnames or a records, but .NET Core is super-strict on requiring a records across the board. Or am I misunderstanding the situation?
Are all your SPNs defined only against FQDN CNAMES? Or do you have a mixture of services defined with SPNs using either A or CNAME?
Interesting article talking about changes to MIT Kerberos libraries that will allow for trying different behavior (CNAME first, then A if CNAME fails) when trying to find proper SPN.
This new version of MIT Kerberos (Release 1.18) will have a new option for dns_canonicalize_hostname
option in the krb5.conf file:
dns_canonicalize_hostname = fallback
See: http://k5wiki.kerberos.org/wiki/Projects/Server_Hostname_Canonicalization
It also leaves my systems in a weird position, where most Linux calls that need authentication (e.g. curl --anyauth -u : hitting a WCF REST API, also using kerberos) seem totally happy with cnames or a records, but .NET Core is super-strict on requiring a records across the board. Or am I misunderstanding the situation?
Are all your SPNs defined only against FQDN CNAMES? Or do you have a mixture of services defined with SPNs using either A or CNAME?
There are a mixture of services defined with SPNs using either A or CNAME. Either seem to work OK in Windows, some are tricky from Linux.
Interesting article talking about changes to MIT Kerberos libraries that will allow for trying different behavior (CNAME first, then A if CNAME fails) when trying to find proper SPN.
This new version of MIT Kerberos (Release 1.18) will have a new option for
dns_canonicalize_hostname
option in the krb5.conf file:dns_canonicalize_hostname = fallback
See: http://k5wiki.kerberos.org/wiki/Projects/Server_Hostname_Canonicalization
It looks to me like Centos 7 is currently on 1.15
sudo yum list installed | grep kr
krb5-devel.x86_64 1.15.1-37.el7_6 @lgb-7.4-centos-base-updates-2019-07
krb5-libs.x86_64 1.15.1-37.el7_6 @lgb-7.4-centos-base-updates-2019-07
krb5-workstation.x86_64 1.15.1-37.el7_6 @lgb-7.4-centos-base-updates-2019-07
Yes, definitely 1.15.1:
klist -V
Kerberos 5 version 1.15.1
But this looks interesting - do you think this will work with GSS_C_NT_HOSTBASED_SERVICE
?
do you think this will work with GSS_C_NT_HOSTBASED_SERVICE?
I haven't tried it out yet. But those krb5.conf options do affect GSS_C_NT_HOSTBASED_SERVICE name types. So, I do expect it to work with the new dns_canonicalize_hostname = fallback
setting.
@davidsh, it would seem that the switch to using GSS_C_NT_HOSTBASED_SERVICE is a breaking change. Would it be possible to retroactively add an AppSetting to control this on a per-app basis? Requiring a system wide krb5.conf configuration which affects all Kerberos usage (and could break a different scenario when setting things so .NET Core works) seems a little too coarse to control behavior.
Requiring a system wide krb5.conf configuration which affects all Kerberos usage (and could break a different scenario when setting things so .NET Core works) seems a little too coarse to control behavior.
Apps can use a custom krb5.conf file by setting an environment variable:
http://web.mit.edu/kerberos/krb5-1.13/doc/admin/conf_files/krb5_conf.html
krb5.conf
The krb5.conf file contains Kerberos configuration information, including the locations of KDCs and
admin servers for the Kerberos realms of interest, defaults for the current realm and for Kerberos
applications, and mappings of hostnames onto Kerberos realms. Normally, you should install your
krb5.conf file in the directory /etc. You can override the default location by setting the environment
variable KRB5_CONFIG. Multiple colon-separated filenames may be specified in KRB5_CONFIG; all
files which are present will be read.
@Cronan
There are a mixture of services defined with SPNs using either A or CNAME. Either seem to work OK in Windows, some are tricky from Linux.
Are all your channels/services using the same type of WCF bindings? Are any using HTTP related WCF bindings? I Would like to understand if any of them use .NET Core classes besides System.Net.Security.NegotiateStream.
All the services I’m currently using are WCF TCP - I’ll try the custom Conf file and report back.
I’ll also investigate and see whether there are HTTP services in use that I can test.
On 19 Oct 2019, at 18:59, David Shulman notifications@github.com wrote:
Requiring a system wide krb5.conf configuration which affects all Kerberos usage (and could break a different scenario when setting things so .NET Core works) seems a little too coarse to control behavior.
Apps can use a custom krb5.conf file by setting an environment variable:
http://web.mit.edu/kerberos/krb5-1.13/doc/admin/conf_files/krb5_conf.html
krb5.conf The krb5.conf file contains Kerberos configuration information, including the locations of KDCs and admin servers for the Kerberos realms of interest, defaults for the current realm and for Kerberos applications, and mappings of hostnames onto Kerberos realms. Normally, you should install your krb5.conf file in the directory /etc. You can override the default location by setting the environment variable KRB5_CONFIG. Multiple colon-separated filenames may be specified in KRB5_CONFIG; all files which are present will be read. @Cronan
There are a mixture of services defined with SPNs using either A or CNAME. Either seem to work OK in Windows, some are tricky from Linux.
Are all your channels/services using the same type of WCF bindings? Are any using HTTP related WCF bindings? I Would like to understand if any of them use .NET Core class besides System.Net.Security.NegotiateStream.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.
All the services I’m currently using are WCF TCP - I’ll try the custom Conf file and report back.
Have you been able to test using the custom .conf file?
Also, do you have NTLM installed on the Linux machines (gss-ntlmssp)? And what credential are you using in the .NET layer? Is it CredentialCache.DefaultCredentials
?
The reason for the questions is that when Negotiate (SPNEGO) protocol is used but has errors with Kerberos, NTLM fallback would be possible assuming you are using explicit credentials and have installed the NTLM package on the Linux machine.
All the services I’m currently using are WCF TCP - I’ll try the custom Conf file and report back.
Have you been able to test using the custom .conf file?
Yes, this worked! 🎉 Thank you @davidsh and @mconnew
Also, do you have NTLM installed on the Linux machines (gss-ntlmssp)? Yes, I believe so ...
sudo yum list installed | grep ntlm libntlm.x86_64 1.3-6.el7 @lgb-7.4-centos-base
And what credential are you using in the .NET layer? Is it
CredentialCache.DefaultCredentials
?
I don't see anything like this call in my WCF client code, where would it be?
The reason for the questions is that when Negotiate (SPNEGO) protocol is used but has errors with Kerberos, NTLM fallback would be possible assuming you are using explicit credentials and have installed the NTLM package on the Linux machine.
@karelz What more info do you need from me?
@Cronan it was part of our triage, so I assume @davidsh didn't feel like he has all the info to make it actionable yet. There is lots of back and forth on this issue. What is IMO needed is clear repro environment setup that will allow us to reproduce it locally and dig deeper. Do you think you can help us identify the key environment setup knobs to reproduce it?
@Cronan it was part of our triage, so I assume @davidsh didn't feel like he has all the info to make it actionable yet.
There is lots of back and forth on this issue. What is IMO needed is clear repro environment setup that will allow us to reproduce it locally and dig deeper. Do you think you can help us identify the key environment setup knobs to reproduce it?
Actually, I do understand the root cause here. This issue is caused by a regression in behavior due to a PR fixing related Linux Kerberos issues as described above. See: https://github.com/dotnet/runtime/issues/31110#issuecomment-540230816
I know how to reproduce this issue.
The fix for this issue is complex because it will require some changes to the PAL layer to fix both this issue and preserve the other fixes previously made. I have been thinking about the proper fix. We can also test the fix in the 'runtime-libraries enterprise-linux' DevOps pipline that I previously created.
@davidsh do you think we can address this realistically in 5.0?
It's a non-trivial fix. The fix has to be done in the Linux PAL layer. It has to be able to try two different forms of Kerberos principal names. I.e. try one form and if it doesn't match to an SPN in Kerberos, then try the alternate format.
Based on current priorities and staffing I don't see this as achievable for 5.0.
That’s disappointing but understandable. If someone can point me to the relevant place in the code to start looking I’m prepared to try to take this on.
Describe the bug I use the WCF client from .NET Core to access Windows WCF services from Linux. Everything works correctly using .NET Core 2.1.3
Upgrading to 2.1.13 results in the exception below when calling the WCF client. I also see the same problem with .NET Core 2.2 latest or 3.0 latest.
Expected behavior I use the WCF client from .NET Core to access Windows WCF services from Linux. Everything works correctly using .NET Core 2.1.3
I expected this to continue working in newer versions of .NET Core.
Additional context I use
kinit
before making this call to ensure that I'm authenticated correctly. Am I missing some dependencies, or could this be something else?Linux Version
.NET Core Version
Full Stack trace