389ds / 389-ds-base

The enterprise-class Open Source LDAP server for Linux
https://www.port389.org/

Segfault during replication startup on Arm device #2805

Closed 389-ds-bot closed 4 years ago

389-ds-bot commented 4 years ago

Cloned from Pagure issue: https://pagure.io/389-ds-base/issue/49746


Issue Description

On the Arm platform (specifically a Pi 3 running fc27.armv7hl), 389-ds crashes with a segfault during ipa-replica-install when it tries to start replication with the master.

Package Version and Platform

Steps to reproduce

  1. Run ipa-replica-install
  2. Wait for step [33/41] (enabling S4U2Proxy delegation): 389-ds crashes here, or may already have crashed during an earlier step

Actual results

  [28/41]: setting up initial replication
Starting replication, please wait until this has completed.
Update in progress, 11 seconds elapsed
Update succeeded

  [29/41]: prevent time skew after initial replication
  [30/41]: adding sasl mappings to the directory
  [31/41]: updating schema
  [32/41]: setting Auto Member configuration
  [33/41]: enabling S4U2Proxy delegation
  [error] NetworkError: cannot connect to 'ldapi://%2Fvar%2Frun%2Fslapd-COMPANY-INTERNAL.socket':
Your system may be partly configured.
Run /usr/sbin/ipa-server-install --uninstall to clean up.

ipapython.admintool: ERROR    cannot connect to 'ldapi://%2Fvar%2Frun%2Fslapd-COMPANY-INTERNAL.socket':
ipapython.admintool: ERROR    The ipa-replica-install command failed. See /var/log/ipareplica-install.log for more information

Expected results

ipa-replica-install completes successfully.

389-ds-bot commented 4 years ago

Comment from biosehnsucht at 2018-06-01 23:23:44

While diagnosing this issue on the freeipa-users mailing list, Mark Reynolds supplied a proposed patch which appears to solve the problem (it worked on both 1.3.7 and 1.3.8):

diff --git a/ldap/servers/plugins/replication/repl5_agmt.c b/ldap/servers/plugins/replication/repl5_agmt.c
index d71d3f38b..e0f1f41bd 100644
--- a/ldap/servers/plugins/replication/repl5_agmt.c
+++ b/ldap/servers/plugins/replication/repl5_agmt.c
@@ -3035,7 +3035,7 @@ agmt_update_maxcsn(Replica *r, Slapi_DN *sdn, int op, LDAPMod **mods, CSN *csn)
                 slapi_ch_free_string(&agmt->maxcsn);
                 agmt->maxcsn = slapi_ch_smprintf("%s;%s;%s;%ld;%d;%s", slapi_sdn_get_dn(agmt->replarea),
                                                  slapi_rdn_get_value_by_ref(slapi_rdn_get_rdn(agmt->rdn)), agmt->hostname,
-                                                 agmt->port, agmt->consumerRID, maxcsn);
+                                                 (long)agmt->port, (int)agmt->consumerRID, maxcsn);
             }
             PR_Unlock(agmt->lock);
         }

I'm not sure whether this needs further work before being applied widely, as the problem might be Arm-specific. Mark asked me to open this issue and to supply him with the build output with and without the above patch, which I will do once the build completes.
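The patch only adds casts so that the arguments match the `%ld` and `%d` specifiers. The following is a simplified model of why that can matter on a 32-bit target, not a trace of the real crash. It assumes that `agmt->port` is passed as a 64-bit integer (the diff does not show the field declarations), that `long`, `int` and pointers are all 32 bits wide, and that the variadic arguments sit back to back in memory. All addresses and values are made up.

```python
import struct

# Made-up 32-bit "addresses" standing in for the four string arguments.
DN, RDN, HOST, MAXCSN = 0x1000, 0x2000, 0x3000, 0x4000
# Made-up integer arguments.
PORT, RID = 389, 4


def consume(area):
    """Read six 4-byte slots, as "%s;%s;%s;%ld;%d;%s" would on a target
    where long, int and pointers are all 32 bits wide."""
    return list(struct.unpack_from("<6I", area, 0))


# Unpatched call (assumption: port occupies 8 bytes in the argument area).
unpatched = struct.pack("<3IqII", DN, RDN, HOST, PORT, RID, MAXCSN)

# Patched call: (long)port and (int)consumerRID each occupy 4 bytes.
patched = struct.pack("<3IiiI", DN, RDN, HOST, PORT, RID, MAXCSN)

print([hex(v) for v in consume(unpatched)])
print([hex(v) for v in consume(patched)])
```

In this model the final `%s` of the unpatched call receives the replica ID instead of the `maxcsn` pointer, which is the kind of value that leads to a segfault when it is dereferenced. On a 64-bit target `long` is typically 64 bits wide, which would be consistent with the problem showing up only on 32-bit Arm.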

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2018-06-02 15:08:49

The ARM issues were fixed in 1.4.0 via #2677

This ticket is for backporting those changes...

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2018-06-04 03:19:36

It turns out this does not backport cleanly to 1.3.8. It is best to check the current compiler warnings and write a fresh patch for 1.3.8.
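One way to check the compiler warnings is to filter a saved build log for format-string diagnostics. This is a rough sketch, assuming GCC-style lines of the form `path:line:col: warning: message [-Wformat=]`. The function name `format_warnings` is hypothetical and is not part of the 389-ds-base build system.

```python
import re

# Assumed shape of a GCC diagnostic: path:line[:col]: warning: message [-Wformat...]
WARNING_RE = re.compile(
    r'^(?P<path>[^:\s]+):(?P<line>\d+):(?:\d+:)? warning: '
    r'(?P<msg>.*\[-Wformat[^\]]*\])\s*$'
)


def format_warnings(build_output):
    """Return (path, line, message) for each -Wformat warning in the build output."""
    hits = []
    for raw in build_output.splitlines():
        match = WARNING_RE.match(raw)
        if match:
            hits.append((match.group("path"), int(match.group("line")), match.group("msg")))
    return hits
```

Feeding it the captured output of a build on the Arm target should list the call sites where a format specifier and its argument disagree. These are candidates for the same kind of cast as in the patch above.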

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2018-06-08 21:15:07

d5bc4cf10..63336d203 389-ds-base-1.3.8 -> 389-ds-base-1.3.8

389-ds-bot commented 4 years ago

Comment from biosehnsucht at 2018-06-11 22:56:55

It looks like you backported the fixes, then updated to 1.4.0.10 - should we expect an updated 1.3.8.x package to make its way to Fedora 27 eventually, or a 1.4.x package, or ... ?

389-ds-bot commented 4 years ago

Comment from biosehnsucht at 2018-06-11 23:03:22

Sorry, I just saw it in updates-testing. I didn't think to check there at first. Looking forward to it clearing testing so we can move forward with putting Pis into production, but I am going to go ahead and actually try that package (in place of my compiled version) on our testing Pi ASAP :)

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2018-06-11 23:04:50

:thumbsup:

389-ds-bot commented 4 years ago

Comment from biosehnsucht at 2018-06-12 01:06:38

Well, it got further than the non-testing version, but unlike my compiled version it still died during replica setup ... I'm not sure whether the problem is in 389-ds-base or elsewhere (I uninstalled all the IPA packages and reinstalled them with the testing repo enabled, so some things are from testing and others aren't). slapd doesn't crash; it's just an error loading an LDIF now?

Starting replication, please wait until this has completed.
Update in progress, 6 seconds elapsed
Update succeeded

  [29/41]: prevent time skew after initial replication
  [30/41]: adding sasl mappings to the directory
  [31/41]: updating schema
  [32/41]: setting Auto Member configuration
  [33/41]: enabling S4U2Proxy delegation
  [34/41]: initializing group membership
  [35/41]: adding master entry
ipaserver.install.service: CRITICAL Failed to load master-entry.ldif: Command '/usr/bin/ldapmodify -v -f /tmp/tmp7yy2lljn -H ldapi://%2Fvar%2Frun%2Fslapd-CREATUITY-INTERNAL.socket -Y EXTERNAL' returned non-zero exit status 68.
  [error] CalledProcessError: Command '/usr/bin/ldapmodify -v -f /tmp/tmp7yy2lljn -H ldapi://%2Fvar%2Frun%2Fslapd-CREATUITY-INTERNAL.socket -Y EXTERNAL' returned non-zero exit status 68.
Your system may be partly configured.
Run /usr/sbin/ipa-server-install --uninstall to clean up.

ipapython.admintool: ERROR    Command '/usr/bin/ldapmodify -v -f /tmp/tmp7yy2lljn -H ldapi://%2Fvar%2Frun%2Fslapd-CREATUITY-INTERNAL.socket -Y EXTERNAL' returned non-zero exit status 68.
ipapython.admintool: ERROR    The ipa-replica-install command failed. See /var/log/ipareplica-install.log for more information

The log file:

adding new entry "cn=ipa-11.creatuity.internal,cn=masters,cn=ipa,cn=etc,dc=creatuity,dc=internal"

2018-06-11T22:57:12Z DEBUG stderr=ldap_initialize( ldapi://%2Fvar%2Frun%2Fslapd-CREATUITY-INTERNAL.socket/??base )
SASL/EXTERNAL authentication started
SASL username: gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth
SASL SSF: 0
ldap_add: Already exists (68)

2018-06-11T22:57:12Z CRITICAL Failed to load master-entry.ldif: Command '/usr/bin/ldapmodify -v -f /tmp/tmp7yy2lljn -H ldapi://%2Fvar%2Frun%2Fslapd-CREATUITY-INTERNAL.socket -Y EXTERNAL' returned non-zero exit status 68.
2018-06-11T22:57:12Z DEBUG Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ipaserver/install/service.py", line 506, in start_creation
    run_step(full_msg, method)
  File "/usr/lib/python3.6/site-packages/ipaserver/install/service.py", line 496, in run_step
    method()
  File "/usr/lib/python3.6/site-packages/ipaserver/install/dsinstance.py", line 725, in __add_master_entry
    self._ldap_mod("master-entry.ldif", self.sub_dict)
  File "/usr/lib/python3.6/site-packages/ipaserver/install/service.py", line 309, in _ldap_mod
    ipautil.run(args, nolog=nologlist)
  File "/usr/lib/python3.6/site-packages/ipapython/ipautil.py", line 561, in run
    raise CalledProcessError(p.returncode, arg_string, str(output))
subprocess.CalledProcessError: Command '/usr/bin/ldapmodify -v -f /tmp/tmp7yy2lljn -H ldapi://%2Fvar%2Frun%2Fslapd-CREATUITY-INTERNAL.socket -Y EXTERNAL' returned non-zero exit status 68.

2018-06-11T22:57:12Z DEBUG   [error] CalledProcessError: Command '/usr/bin/ldapmodify -v -f /tmp/tmp7yy2lljn -H ldapi://%2Fvar%2Frun%2Fslapd-CREATUITY-INTERNAL.socket -Y EXTERNAL' returned non-zero exit status 68.
2018-06-11T22:57:12Z DEBUG Destroyed connection context.ldap2_2992086640
2018-06-11T22:57:12Z DEBUG Backing up system configuration file '/etc/ipa/default.conf'
2018-06-11T22:57:12Z DEBUG Saving Index File to '/var/lib/ipa/sysrestore/sysrestore.index'
2018-06-11T22:57:12Z DEBUG   File "/usr/lib/python3.6/site-packages/ipapython/admintool.py", line 174, in execute
    return_value = self.run()
  File "/usr/lib/python3.6/site-packages/ipapython/install/cli.py", line 319, in run
    cfgr.run()
  File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 364, in run
    self.execute()
  File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 388, in execute
    for _nothing in self._executor():
  File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 430, in __runner
    exc_handler(exc_info)
  File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 459, in _handle_execute_exception
    self._handle_exception(exc_info)
  File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 449, in _handle_exception
    six.reraise(*exc_info)
  File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
    raise value
  File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 420, in __runner
    step()
  File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 417, in <lambda>
    step = lambda: next(self.__gen)
  File "/usr/lib/python3.6/site-packages/ipapython/install/util.py", line 81, in run_generator_with_yield_from
    six.reraise(*exc_info)
  File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
    raise value
  File "/usr/lib/python3.6/site-packages/ipapython/install/util.py", line 59, in run_generator_with_yield_from
    value = gen.send(prev_value)
  File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 654, in _configure
    next(executor)
  File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 430, in __runner
    exc_handler(exc_info)
  File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 459, in _handle_execute_exception
    self._handle_exception(exc_info)
  File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 517, in _handle_exception
    self.__parent._handle_exception(exc_info)
  File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 449, in _handle_exception
    six.reraise(*exc_info)
  File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
    raise value
  File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 514, in _handle_exception
    super(ComponentBase, self)._handle_exception(exc_info)
  File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 449, in _handle_exception
    six.reraise(*exc_info)
  File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
    raise value
  File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 420, in __runner
    step()
  File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 417, in <lambda>
    step = lambda: next(self.__gen)
  File "/usr/lib/python3.6/site-packages/ipapython/install/util.py", line 81, in run_generator_with_yield_from
    six.reraise(*exc_info)
  File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
    raise value
  File "/usr/lib/python3.6/site-packages/ipapython/install/util.py", line 59, in run_generator_with_yield_from
    value = gen.send(prev_value)
  File "/usr/lib/python3.6/site-packages/ipapython/install/common.py", line 66, in _install
    for unused in self._installer(self.parent):
  File "/usr/lib/python3.6/site-packages/ipaserver/install/server/__init__.py", line 622, in main
    replica_install(self)
  File "/usr/lib/python3.6/site-packages/ipaserver/install/server/replicainstall.py", line 388, in decorated
    func(installer)
  File "/usr/lib/python3.6/site-packages/ipaserver/install/server/replicainstall.py", line 1407, in install
    pkcs12_info=dirsrv_pkcs12_info)
  File "/usr/lib/python3.6/site-packages/ipaserver/install/server/replicainstall.py", line 110, in install_replica_ds
    setup_pkinit=not options.no_pkinit,
  File "/usr/lib/python3.6/site-packages/ipaserver/install/dsinstance.py", line 419, in create_replica
    self.start_creation(runtime=30)
  File "/usr/lib/python3.6/site-packages/ipaserver/install/service.py", line 506, in start_creation
    run_step(full_msg, method)
  File "/usr/lib/python3.6/site-packages/ipaserver/install/service.py", line 496, in run_step
    method()
  File "/usr/lib/python3.6/site-packages/ipaserver/install/dsinstance.py", line 725, in __add_master_entry
    self._ldap_mod("master-entry.ldif", self.sub_dict)
  File "/usr/lib/python3.6/site-packages/ipaserver/install/service.py", line 309, in _ldap_mod
    ipautil.run(args, nolog=nologlist)
  File "/usr/lib/python3.6/site-packages/ipapython/ipautil.py", line 561, in run
    raise CalledProcessError(p.returncode, arg_string, str(output))

2018-06-11T22:57:12Z DEBUG The ipa-replica-install command failed, exception: CalledProcessError: Command '/usr/bin/ldapmodify -v -f /tmp/tmp7yy2lljn -H ldapi://%2Fvar%2Frun%2Fslapd-CREATUITY-INTERNAL.socket -Y EXTERNAL' returned non-zero exit status 68.
2018-06-11T22:57:12Z ERROR Command '/usr/bin/ldapmodify -v -f /tmp/tmp7yy2lljn -H ldapi://%2Fvar%2Frun%2Fslapd-CREATUITY-INTERNAL.socket -Y EXTERNAL' returned non-zero exit status 68.
2018-06-11T22:57:12Z ERROR The ipa-replica-install command failed. See /var/log/ipareplica-install.log for more information

Perhaps the problem is that the master wasn't fully removed beforehand... though I did run ipa-server-install --uninstall and didn't get any errors during removal.

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2018-06-12 01:17:45

Error 68 means you tried to add an entry that already exists in your database. It's certainly not a bug. I don't see how that could be related to anything in 389-ds-base unless the entry actually does NOT exist. You could verify this by looking in the DS access log (/var/log/dirsrv/slapd-YOUR_INSTANCE/access) to find the DN of the entry that triggered the "err=68", then searching the database for that entry. If it's not there, then we have a serious bug. If it is there, then it looks like an issue in the IPA install (in this case an error 68 can probably be ignored).
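The check described above could be scripted roughly as follows. This is a sketch. It assumes access log lines of the form `conn=N op=N ADD dn="..."` followed by a matching `conn=N op=N RESULT err=N ...` line. The helper names `find_failed_dns` and `base_search_command` are hypothetical.

```python
import re

# Assumed line shapes in the access log:
#   [...] conn=5 op=2 ADD dn="cn=example,dc=example,dc=com"
#   [...] conn=5 op=2 RESULT err=68 tag=105 nentries=0 etime=0
OP_RE = re.compile(r'conn=(\d+) op=(-?\d+) (?:ADD|MOD|MODRDN|DEL) dn="([^"]*)"')
RESULT_RE = re.compile(r'conn=(\d+) op=(-?\d+) RESULT err=(\d+)')


def find_failed_dns(lines, err=68):
    """Return the DNs of operations whose RESULT line carries the given error code."""
    pending = {}  # (conn, op) -> DN of the operation still awaiting its RESULT line
    failed = []
    for line in lines:
        match = OP_RE.search(line)
        if match:
            pending[(match.group(1), match.group(2))] = match.group(3)
            continue
        match = RESULT_RE.search(line)
        if match:
            dn = pending.pop((match.group(1), match.group(2)), None)
            if dn is not None and int(match.group(3)) == err:
                failed.append(dn)
    return failed


def base_search_command(dn, uri):
    """Build an ldapsearch command line that looks up exactly one entry."""
    return ["ldapsearch", "-H", uri, "-Y", "EXTERNAL", "-s", "base", "-b", dn]
```

Passing the open access log file to `find_failed_dns` yields the DNs to check. Running the command built by `base_search_command` against the instance's ldapi socket then shows whether each entry really exists.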

389-ds-bot commented 4 years ago

Comment from biosehnsucht at 2018-06-12 02:27:47

I tried uninstalling and reinstalling all the IPA-related packages (and also ran ipa-replica-manage del --force --cleanup ipa-11.creatuity.internal on the other masters/replicas), using only the stable packages except for 389-ds-base (which needs to come from testing). It's still crashing, apparently at the same point of replica setup and due to replication, even though it shouldn't be.

Tomorrow I'm going to just reinstall Fedora on the Pi again to start from a clean slate. That worked before ...

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2018-06-12 04:14:41

If it is crashing, can you get a core file/stack trace of the crash? Thanks!

389-ds-bot commented 4 years ago

Comment from biosehnsucht at 2018-06-13 23:27:40

I installed a fresh copy of Fedora 27, installed the regular (not testing) FreeIPA packages, then installed just the 389-ds-core updates-testing package. I was able to set up the replica without any issues.

The other ARM issue I'm aware of, https://pagure.io/freeipa/issue/7337 , still applies. The workaround from that issue, adding GssapiDelegCcacheEnvVar KRB5CCNAME to the /ipa section of /etc/httpd/conf.d/ipa.conf, still works, though that's not really a fix.

389-ds-bot commented 4 years ago

Comment from biosehnsucht at 2018-07-13 20:06:25

An FYI update.

Previously I was testing F27 armhfp.

I've just tested on F28 aarch64, and neither this issue nor the other ARM issue (which we previously had to work around with the changes to /etc/httpd/conf.d/ipa.conf) occurs. I don't remember seeing aarch64 available on F27, so I never tried it, but it seems that either the other bug with the KRB5CCNAME setting only affects armhfp or it was fixed between F27 and F28.