Azure / acs-engine

WE HAVE MOVED: Please join us at Azure/aks-engine!
https://github.com/Azure/aks-engine
MIT License

Is Custom Search Domain functionality broken? #3737

Closed grenzr closed 5 years ago

grenzr commented 6 years ago

Is this a request for help?:

YES

Is this an ISSUE or FEATURE REQUEST? (choose one):

ISSUE

What version of acs-engine?:

v0.20.9

Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm)

Kubernetes 1.11.2

What happened:

We have a Kubernetes installation on a custom VNet. The VNet uses a non-default DNS server (corporate Active Directory), so we need the Custom Search Domain (CSD) functionality for the machines to resolve each other by DNS.

I have added the following configuration block to my apimodel.json:

"properties": {
    "linuxProfile": {
      "customSearchDomain": {
        "name": "mydomain.com",
        "realmUser": "SVC_blah",
        "realmPassword": "biglongpassword"
      }
    }
}

On deployment, first thing I noticed was exit code 80.

This is because the realm, sssd, etc. packages failed to install: no apt-get update was run before the install attempt, so apt got 404s when trying to fetch those packages.
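A minimal sketch of the ordering fix described here, assuming a Debian/Ubuntu node: refresh the package index first, then install. The function name `install_domain_join_packages`, the `APT_GET` override, and the example package names are illustrative assumptions, not what acs-engine ships.

```shell
# Sketch only: run "apt-get update" before "apt-get install" so the install
# does not request package URLs from a stale index.
# APT_GET is a hypothetical override, handy for dry runs.
install_domain_join_packages() {
    apt_cmd="${APT_GET:-apt-get}"
    # Refresh the index first; stop if that fails.
    "$apt_cmd" update || return 1
    # Install whatever packages the caller names.
    "$apt_cmd" install -y "$@"
}

# Example call (package names are an assumption; the report only says
# "realm, sssd etc."):
#   install_domain_join_packages realmd sssd
```

Taking the package list as arguments keeps the sketch from guessing which packages the provisioning script actually needs.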

After fixing that, I realised that the realmUser, realmPassword, and domain placeholders were never substituted, which they should be as per https://github.com/Azure/acs-engine/blob/a79c2d7f5464b8a42748de69ca4481ca7dde4a9c/parts/k8s/kubernetesagentcustomdata.yml#L174-L178
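For illustration, the kind of substitution that was expected could look roughly like the sketch below. The placeholder tokens (`<searchDomainName>`, `<searchDomainRealmUser>`, `<searchDomainRealmPassword>`) and the function names are assumptions for this example; the exact tokens in kubernetesagentcustomdata.yml may differ.

```shell
# Escape characters that are special on the replacement side of sed's
# s command when "|" is used as the delimiter.
escape_sed_replacement() {
    printf '%s' "$1" | sed -e 's/[&|\\]/\\&/g'
}

# Read a template on stdin and write it to stdout with the three
# (hypothetical) placeholders filled in.
render_search_domain() {
    sd_domain=$(escape_sed_replacement "$1")
    sd_user=$(escape_sed_replacement "$2")
    sd_pass=$(escape_sed_replacement "$3")
    sed -e "s|<searchDomainName>|$sd_domain|g" \
        -e "s|<searchDomainRealmUser>|$sd_user|g" \
        -e "s|<searchDomainRealmPassword>|$sd_pass|g"
}

# Example:
#   render_search_domain mydomain.com SVC_blah 'biglongpassword' < template.sh > rendered.sh
```

Escaping the values matters mostly for the password, which may contain `&` or `|` and would otherwise corrupt the sed expression.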

So I manually substituted those variables into the script myself just to see if I could get that working, and found that the realm command is missing the --computer-ou switch, which I need to join the machine to a specific OU.

i.e.

echo "biglongpassword" | realm join -U SVC_blah@`echo "mydomain.com" | tr /a-z/ /A-Z/` `echo "mydomain.com" | tr /a-z/ /A-Z/` --computer-ou "OU=bleh,OU=blah,DC=MYDOMAIN,DC=COM"

I appreciate maybe the --computer-ou use case hasn't arisen for anyone yet, and I don't mind helping to produce that feature.

After adding --computer-ou manually, I was able to get the machine to join the domain, and an /etc/krb5.keytab file was generated.
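A rough sketch of how an optional OU could be wired into the join command, so existing setups without an OU keep working. `build_realm_join` is a hypothetical helper that only prints the command line; the password would still be piped in on stdin as in the command above.

```shell
# Print the "realm join" command line for a domain. --computer-ou is added
# only when a third argument (the OU distinguished name) is supplied.
build_realm_join() {
    rj_domain=$1
    rj_user=$2
    rj_ou=${3:-}
    # The realm is the upper-cased domain name, as in the manual command.
    rj_realm=$(printf '%s' "$rj_domain" | tr '[:lower:]' '[:upper:]')
    if [ -n "$rj_ou" ]; then
        printf 'realm join -U %s@%s %s --computer-ou "%s"\n' \
            "$rj_user" "$rj_realm" "$rj_realm" "$rj_ou"
    else
        printf 'realm join -U %s@%s %s\n' "$rj_user" "$rj_realm" "$rj_realm"
    fi
}

# Example:
#   build_realm_join mydomain.com SVC_blah "OU=bleh,OU=blah,DC=MYDOMAIN,DC=COM"
```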

Then I set about trying to use a variant of the register-dns.sh script found at https://github.com/tesharp/acs-engine/blob/register-dns-extension/extensions/register-dns/v1/register-dns.sh

I think I need to add the -g switch to the nsupdate command so that it uses the established Kerberos credentials, but it doesn't seem to find them. It also complains that it can't find a default realm anywhere (there is no /etc/krb5.conf file to read it from).

I believe you can put such a setting in the /etc/sssd/sssd.conf file, so maybe that's enough, but at this point I was hoping for a bit of advice on whether I'm going down the right road before going deeper!
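One possible direction, sketched under assumptions: write a minimal krb5.conf that names the default realm, obtain a ticket from the keytab with `kinit -k`, then feed an update script to `nsupdate -g`. The helper names, the TTL, the IP address, and the machine principal form (upper-cased short hostname followed by `$`) are assumptions for this example and have not been verified against this setup.

```shell
# Write a minimal krb5.conf whose default realm is the upper-cased domain.
# The target path is a parameter so the sketch can be tried on a scratch file.
write_krb5_conf() {
    kc_realm=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    kc_path=${2:-/etc/krb5.conf}
    printf '[libdefaults]\n    default_realm = %s\n' "$kc_realm" > "$kc_path"
}

# Print an nsupdate script that replaces the A record for a host.
dns_update_script() {
    du_fqdn=$1
    du_ip=$2
    du_ttl=${3:-3600}
    printf 'update delete %s A\nupdate add %s %s A %s\nsend\n' \
        "$du_fqdn" "$du_fqdn" "$du_ttl" "$du_ip"
}

# Intended use on a joined machine (not executed here):
#   write_krb5_conf mydomain.com
#   kinit -k "$(hostname -s | tr '[:lower:]' '[:upper:]')\$"   # assumed machine principal
#   dns_update_script "$(hostname -s).mydomain.com" 10.0.0.4 | nsupdate -g
```

Whether the DNS server accepts the machine account for secure dynamic updates depends on the Active Directory zone settings, so this is a starting point rather than a confirmed fix.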

What you expected to happen:

Machine registration, DNS working and general happiness

How to reproduce it (as minimally and precisely as possible):

Try it yourself as per above.

Anything else we need to know:

CecileRobertMichon commented 6 years ago

Hi @grenzr, thanks for reporting. @axier implemented the custom search domains feature in https://github.com/Azure/acs-engine/pull/2590 and may have some input here. Otherwise, feel free to start a PR to fix it and we'll go from there.

grenzr commented 6 years ago

I've started work on the above fixes. A PR will be on the way soon.

CecileRobertMichon commented 6 years ago

/assign @grenzr

grenzr commented 6 years ago

pre-PR work in progress: https://github.com/grenzr/acs-engine/commits/search_domain_improvements

I still have an issue where the kubelet service does not appear to run the sed lines that substitute the placeholders for the search domain params. I can see in the journalctl logs that it's attempting to run realm without the substitutions done:

Sep 03 17:24:04 k8s-agentpri-35085497-vmss000000 audit[12231]: AVC apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/sbin/sssd" pid=12231 comm="apparmor_parser"
Sep 03 17:24:04 k8s-agentpri-35085497-vmss000000 kernel: audit: type=1400 audit(1535995444.001:16): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/sbin/sssd" pid=12231 comm="apparmor_parser"
Sep 03 17:24:04 k8s-agentpri-35085497-vmss000000 systemd[1]: Reloading.
Sep 03 17:24:04 k8s-agentpri-35085497-vmss000000 systemd[1]: Started ACPI event daemon.
Sep 03 17:24:04 k8s-agentpri-35085497-vmss000000 systemd[1]: Starting System Security Services Daemon...
Sep 03 17:24:04 k8s-agentpri-35085497-vmss000000 audit[12287]: AVC apparmor="ALLOWED" operation="open" profile="/usr/sbin/sssd" name="/etc/nscd.conf" pid=12287 comm="sssd" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
Sep 03 17:24:04 k8s-agentpri-35085497-vmss000000 kernel: audit: type=1400 audit(1535995444.217:17): apparmor="ALLOWED" operation="open" profile="/usr/sbin/sssd" name="/etc/nscd.conf" pid=12287 comm="sssd" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
Sep 03 17:24:04 k8s-agentpri-35085497-vmss000000 sssd[12287]: NSCD socket was detected and seems to be configured to cache some of the databases controlled by SSSD [passwd,group,netgroup,services]. It is recommended not to run NSCD in parallel with SSSD,
Sep 03 17:24:04 k8s-agentpri-35085497-vmss000000 sssd[12287]: Configuration file: /etc/sssd/sssd.conf does not exist.
Sep 03 17:24:04 k8s-agentpri-35085497-vmss000000 systemd[1]: sssd.service: Main process exited, code=exited, status=4/NOPERMISSION
Sep 03 17:24:04 k8s-agentpri-35085497-vmss000000 systemd[1]: Failed to start System Security Services Daemon.
Sep 03 17:24:04 k8s-agentpri-35085497-vmss000000 systemd[1]: sssd.service: Unit entered failed state.
Sep 03 17:24:04 k8s-agentpri-35085497-vmss000000 systemd[1]: sssd.service: Failed with result 'exit-code'.
Sep 03 17:24:05 k8s-agentpri-35085497-vmss000000 systemd[1]: Reloading.
Sep 03 17:24:05 k8s-agentpri-35085497-vmss000000 systemd[1]: Started ACPI event daemon.
Sep 03 17:24:05 k8s-agentpri-35085497-vmss000000 dbus[1524]: [system] Reloaded configuration
Sep 03 17:24:06 k8s-agentpri-35085497-vmss000000 dbus[1524]: [system] Activating service name='org.freedesktop.realmd' (using servicehelper)
Sep 03 17:24:06 k8s-agentpri-35085497-vmss000000 realmd[12394]: Loaded settings from: /usr/lib/realmd/realmd-defaults.conf /usr/lib/realmd/realmd-distro.conf
Sep 03 17:24:06 k8s-agentpri-35085497-vmss000000 realmd[12394]: holding daemon: startup
Sep 03 17:24:06 k8s-agentpri-35085497-vmss000000 realmd[12394]: starting service
Sep 03 17:24:06 k8s-agentpri-35085497-vmss000000 realmd[12394]: connected to bus
Sep 03 17:24:06 k8s-agentpri-35085497-vmss000000 realmd[12394]: released daemon: startup
Sep 03 17:24:06 k8s-agentpri-35085497-vmss000000 dbus[1524]: [system] Successfully activated service 'org.freedesktop.realmd'
Sep 03 17:24:06 k8s-agentpri-35085497-vmss000000 realmd[12394]: claimed name on bus: org.freedesktop.realmd
Sep 03 17:24:06 k8s-agentpri-35085497-vmss000000 realmd[12394]: client using service: :1.11
Sep 03 17:24:06 k8s-agentpri-35085497-vmss000000 realmd[12394]: holding daemon: :1.11
Sep 03 17:24:06 k8s-agentpri-35085497-vmss000000 realmd[12394]: Using 'r174.12388' operation for method 'Discover' invocation on 'org.freedesktop.realmd.Provider' interface
Sep 03 17:24:06 k8s-agentpri-35085497-vmss000000 realmd[12394]: Registered cancellable for operation 'r174.12388'
Sep 03 17:24:07 k8s-agentpri-35085497-vmss000000 realmd[12394]:  * Resolving: _ldap._tcp.<searchdomainname>
Sep 03 17:24:07 k8s-agentpri-35085497-vmss000000 realmd[12394]:  * Resolving: _ldap._tcp.<searchdomainname>
Sep 03 17:24:07 k8s-agentpri-35085497-vmss000000 realmd[12394]: No DNS record of the requested type for '_ldap._tcp.<searchdomainname>'
Sep 03 17:24:07 k8s-agentpri-35085497-vmss000000 realmd[12394]:  * Resolving: <searchdomainname>
Sep 03 17:24:07 k8s-agentpri-35085497-vmss000000 realmd[12394]:  * Resolving: <searchdomainname>
Sep 03 17:24:07 k8s-agentpri-35085497-vmss000000 realmd[12394]: Resolving <searchdomainname> failed: No DNS record of the requested type for '_kerberos._udp.<searchdomainname>'
Sep 03 17:24:07 k8s-agentpri-35085497-vmss000000 realmd[12394]: Error resolving '<searchdomainname>': Name or service not known
Sep 03 17:24:07 k8s-agentpri-35085497-vmss000000 realmd[12394]:  * No results: <searchdomainname>
Sep 03 17:24:07 k8s-agentpri-35085497-vmss000000 realmd[12394]:  * No results: <searchdomainname>
Sep 03 17:24:07 k8s-agentpri-35085497-vmss000000 realmd[12394]: client gone away: :1.11
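The realmd lines in the log above show a literal `<searchdomainname>` being resolved, which suggests the placeholder reached the node unreplaced. A small check like the one below could confirm that on the node. The function name and the token pattern are assumptions, and the file to inspect is whichever provisioning script runs realm.

```shell
# List lines in a file that still contain an unreplaced search domain
# placeholder such as <searchDomainName>. The match is case-insensitive
# because the log shows the token lower-cased.
find_unreplaced_placeholders() {
    grep -niE '<searchdomain[a-z]*>' "$1"
}

# Example (the path is hypothetical):
#   find_unreplaced_placeholders /path/to/provision-script.sh
```

The function prints matching lines with their line numbers and exits non-zero when nothing matches, so it can also be used as a guard before running realm.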

Any help to progress this would be appreciated :)

CecileRobertMichon commented 6 years ago

@grenzr could you please open a WIP (work in progress) PR and describe the problem you're having? Then I or someone else in the community can try to help.

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contribution. Note that acs-engine is deprecated--see https://github.com/Azure/aks-engine instead.