389ds / 389-ds-base

The enterprise-class Open Source LDAP server for Linux
https://www.port389.org/

Fedora Server 36 ARM64 Replica Segmentation fault sync_update_persist_op #5363

Closed mattymcmattface closed 2 years ago

mattymcmattface commented 2 years ago

I've never posted a bug report before, so apologies if I've got anything in what follows wrong. I've installed a FreeIPA master and replica server pair on Fedora Server 36, on virtual amd64 and arm64 servers respectively. The dirsrv (ns-slapd) process on the arm64 replica core dumps with signal 11 after running happily for a bit. coredumpctl debug shows

Core was generated by `/usr/sbin/ns-slapd -D /etc/dirsrv/slapd-NET-THE-INSTITUTE-CO-UK -i /run/dirsrv/'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  sync_update_persist_op (pb=pb@entry=0xffff6aeb9b00, e=0xffff6af9d900, eprev=eprev@entry=0x0, op_tag=op_tag@entry=104, 
    label=label@entry=0xffff7f0bb008 "sync_add_persist_post_op") at ldap/servers/plugins/sync/sync_persist.c:247
Downloading 0.01 MB source file /usr/src/debug/389-ds-base-2.1.1-2.fc36.aarch64/ldap/servers/plugins/sync/sync_persist.c
247     for (curr_op = prim_op; curr_op; curr_op = curr_op->next) {

and a subsequent bt yields

#0  sync_update_persist_op
    (pb=pb@entry=0xffff6aeb9b00, e=0xffff6af9d900, eprev=eprev@entry=0x0, op_tag=op_tag@entry=104, label=label@entry=0xffff7f0bb008 "sync_add_persist_post_op") at ldap/servers/plugins/sync/sync_persist.c:247
#1  0x0000ffff7f0b8c50 in sync_add_persist_post_op (pb=0xffff6aeb9b00) at ldap/servers/plugins/sync/sync_persist.c:368
#2  sync_add_persist_post_op (pb=0xffff6aeb9b00) at ldap/servers/plugins/sync/sync_persist.c:360
#3  0x0000ffff84339780 in plugin_call_func
    (list=0xffff7f992a00, operation=operation@entry=550, pb=pb@entry=0xffff6aeb9b00, call_one=call_one@entry=0) at ldap/servers/slapd/plugin.c:2001
#4  0x0000ffff84339acc in plugin_call_list (pb=0xffff6aeb9b00, operation=550, list=<optimized out>) at ldap/servers/slapd/plugin.c:1944
#5  plugin_call_plugins (pb=pb@entry=0xffff6aeb9b00, whichfunction=whichfunction@entry=550) at ldap/servers/slapd/plugin.c:414
#6  0x0000ffff7e9d8990 in ldbm_back_add (pb=0xffff6aeb9b00) at ldap/servers/slapd/back-ldbm/ldbm_add.c:1413
#7  0x0000ffff842e2fa8 in op_shared_add (pb=pb@entry=0xffff6aeb9b00) at ldap/servers/slapd/add.c:758
#8  0x0000ffff842e3d34 in do_add (pb=pb@entry=0xffff6aeb9b00) at ldap/servers/slapd/add.c:236
#9  0x0000aaaab981b9d4 in connection_dispatch_operation (pb=0xffff6aeb9b00, op=<optimized out>, conn=<optimized out>)
    at ldap/servers/slapd/connection.c:633
#10 connection_threadmain (arg=<optimized out>) at ldap/servers/slapd/connection.c:1800
#11 0x0000ffff83d2b46c in _pt_root (arg=0xffff7cfc3ec0) at ../../../../nspr/pr/src/pthreads/ptthread.c:201
#12 0x0000ffff840a09a8 in start_thread (arg=0xffffd27d5d5f) at pthread_create.c:442
#13 0x0000ffff8410bedc in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:79

I have searched for matching bugs and solutions but have been unable to find anything that looks obviously like this issue. I have attached a copy of the systemd journal for the period leading up to and during the coredump, which is at 00:46 (towards the end of the file): journal.txt

Package Version and Platform:

Steps to Reproduce

  1. Reboot server or do an ipactl restart
  2. Wait a while (it was around 50 minutes before I wiped everything and did a fresh install; it seems to be a few hours on the fresh install). The process does start successfully, it just dies a while later.
  3. Notice in logs that it has coredumped.

Expected results

The ns-slapd process starts and stays running, as it does on the master running in a virtual amd64 instance.

Additional context

This is running on a small home network, with a Kea DHCP server issuing IP addresses and doing DDNS updates to the FreeIPA bind instance. The load is relatively low as there are only about 25 devices on my network. The arm64 virtual machine is running on a Raspberry Pi 4 with 8GB RAM, alongside a couple of other small VMs. The VM had 2GB RAM and 2 processors, although having upped it to 3GB the issue persists. The Pi is also an iSCSI target for some other devices which boot via the network. It's possible that there's IO contention on the Pi, although I've found no smoking guns.

Hopefully this is sufficient information, but please let me know if you need anything more.
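
For context on the traces above: the crashing line (sync_persist.c:247) walks a singly linked list of pending operations. The following minimal sketch uses hypothetical struct and function names (the real types in sync_persist.c differ), just to show where a dangling or stale pointer in such a walk produces exactly this SIGSEGV:

/* Illustrative only -- hypothetical names, not the real sync_persist.c types. */
#include <stddef.h>

struct persist_op {
    unsigned long op_tag;     /* 104 == LDAP_REQ_ADD, matching op_tag in the trace */
    struct persist_op *next;  /* singly linked list of pending operations */
};

/* Walk the per-thread list looking for the operation with this tag.
 * If prim_op or any node's next pointer is dangling (freed elsewhere,
 * or a store to it not yet visible to this CPU), the dereference of
 * curr_op faults -- the crash seen at line 247. */
static struct persist_op *
find_op(struct persist_op *prim_op, unsigned long op_tag)
{
    struct persist_op *curr_op;

    for (curr_op = prim_op; curr_op; curr_op = curr_op->next) {
        if (curr_op->op_tag == op_tag) {
            return curr_op;
        }
    }
    return NULL; /* primary operation not found */
}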

progier389 commented 2 years ago

The code at the crashing place seems OK: the for loop is walking a list of operations to duplicate some entries, and everything is anchored within the current thread, so no other thread is supposed to manipulate this list or its elements. So to get a crash here, something must happen behind the scenes. I can list the following possible causes:

Firstyear commented 2 years ago

@progier389 arm64 has a weaker memory model than x86_64, so it could be an assumed-ordering issue in the codebase. Historically, 389-ds has had terrible CPU memory ordering and a lack of atomics and locks, so it could be related.

It could be enlightening to run ASAN/TSAN on aarch64 in dev ....
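
A generic C11 sketch of the class of bug being suggested here (purely illustrative, not actual 389-ds code): a writer publishes a list node, and without release/acquire ordering a reader on a weakly ordered CPU such as aarch64 can observe the new head pointer before the node's fields are visible:

/* Illustrative only -- not 389-ds code. */
#include <stdatomic.h>
#include <stdlib.h>

struct node {
    int tag;
    struct node *next;
};

static _Atomic(struct node *) head;

void
publish(int tag)
{
    struct node *n = malloc(sizeof(*n));
    if (n == NULL) {
        return;
    }
    n->tag = tag;
    n->next = atomic_load_explicit(&head, memory_order_relaxed);
    /* The release store makes the writes to n->tag and n->next visible
     * before the new head is; a plain store gives no such guarantee on
     * weakly ordered CPUs, which is why code can "work" on x86_64 and
     * crash on arm64. */
    atomic_store_explicit(&head, n, memory_order_release);
}

struct node *
find(int tag)
{
    /* Acquire pairs with the release in publish(). */
    struct node *c = atomic_load_explicit(&head, memory_order_acquire);

    for (; c; c = c->next) {
        if (c->tag == tag) {
            return c;
        }
    }
    return NULL;
}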

mattymcmattface commented 2 years ago

I seem to have made the instance running on the x86_64 VM coredump in the same way, so this may not be limited to the aarch64 platform. Details, as before, are:

coredumpctl debug

Core was generated by `/usr/sbin/ns-slapd -D /etc/dirsrv/slapd-NET-THE-INSTITUTE-CO-UK -i /run/dirsrv/'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  sync_update_persist_op (pb=pb@entry=0x7f7a1587e580, e=0x7f79e5177e40, eprev=eprev@entry=0x0, op_tag=op_tag@entry=104, label=label@entry=0x7f7a30bdf45e "sync_add_persist_post_op") at ldap/servers/plugins/sync/sync_persist.c:247
Downloading 0.04 MB source file /usr/src/debug/389-ds-base-2.1.1-2.fc36.x86_64/ldap/servers/plugins/sync/sync_persist.c
247         for (curr_op = prim_op; curr_op; curr_op = curr_op->next) {                                                                                                                                                                      
[Current thread is 1 (Thread 0x7f7a0edf9640 (LWP 1078))]

with a subsequent bt showing

#0  sync_update_persist_op (pb=pb@entry=0x7f7a1587e580, e=0x7f79e5177e40, eprev=eprev@entry=0x0, op_tag=op_tag@entry=104, label=label@entry=0x7f7a30bdf45e "sync_add_persist_post_op") at ldap/servers/plugins/sync/sync_persist.c:247
#1  0x00007f7a30bdccad in sync_add_persist_post_op (pb=0x7f7a1587e580) at ldap/servers/plugins/sync/sync_persist.c:368
#2  sync_add_persist_post_op (pb=0x7f7a1587e580) at ldap/servers/plugins/sync/sync_persist.c:360
#3  0x00007f7a32f34208 in plugin_call_func (list=0x7f7a2ea41a00, operation=operation@entry=550, pb=pb@entry=0x7f7a1587e580, call_one=call_one@entry=0) at ldap/servers/slapd/plugin.c:2001
#4  0x00007f7a32f344b7 in plugin_call_list (pb=0x7f7a1587e580, operation=550, list=<optimized out>) at ldap/servers/slapd/plugin.c:1944
#5  plugin_call_plugins (pb=pb@entry=0x7f7a1587e580, whichfunction=whichfunction@entry=550) at ldap/servers/slapd/plugin.c:414
#6  0x00007f7a2df490a3 in ldbm_back_add (pb=0x7f7a1587e580) at ldap/servers/slapd/back-ldbm/ldbm_add.c:1413
#7  0x00007f7a32edc855 in op_shared_add (pb=pb@entry=0x7f7a1587e580) at ldap/servers/slapd/add.c:758
#8  0x00007f7a32edd739 in do_add (pb=pb@entry=0x7f7a1587e580) at ldap/servers/slapd/add.c:236
#9  0x0000564acf068d83 in connection_dispatch_operation (pb=0x7f7a1587e580, op=<optimized out>, conn=<optimized out>) at ldap/servers/slapd/connection.c:633
#10 connection_threadmain (arg=<optimized out>) at ldap/servers/slapd/connection.c:1800
#11 0x00007f7a331e8413 in _pt_root (arg=0x7f7a2c201640) at ../../../../nspr/pr/src/pthreads/ptthread.c:201
#12 0x00007f7a32a8ce2d in start_thread (arg=<optimized out>) at pthread_create.c:442
#13 0x00007f7a32b12620 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Package details from the x86_64 VM are as follows:

Name         : 389-ds-base
Version      : 2.1.1
Release      : 2.fc36
Architecture : x86_64
Size         : 12 M
Source       : 389-ds-base-2.1.1-2.fc36.src.rpm
Repository   : @System
From repo    : updates

The hypervisor in both instances is the version of KVM that comes with Ubuntu 22.04 LTS, which both the x86_64 and Pi 4 hosts are running. Could it be something it is doing?

Firstyear commented 2 years ago

@tbordaz Do we run the FreeIPA tests with ASAN? It looks like this might be in the sync repl updates you did recently?

mattymcmattface commented 2 years ago

Hi, I've noticed there's an updated version of the package in the repo, so I've updated to it. I'm still seeing the coredumps on the aarch64 platform. Obviously when ns-slapd stops on this replica, the one on the x64 platform doesn't core dump, as no replication is taking place. I'm guessing it's something specific to the way I've set things up, or else others would be reporting it? Perhaps it is triggered by the DDNS updates from the Kea DHCP server via the BIND instance bundled with FreeIPA? Anyway:

Package Information

Installed Packages
Name         : 389-ds-base
Version      : 2.1.3
Release      : 2.fc36
Architecture : aarch64
Size         : 13 M
Source       : 389-ds-base-2.1.3-2.fc36.src.rpm
Repository   : @System
From repo    : updates

Exception

#0  sync_update_persist_op (pb=pb@entry=0xffff6ca06580, e=0xffff26d9ac40, eprev=eprev@entry=0x0, op_tag=op_tag@entry=104, label=label@entry=0xffff842bb008 "sync_add_persist_post_op")
    at ldap/servers/plugins/sync/sync_persist.c:247
247     for (curr_op = prim_op; curr_op; curr_op = curr_op->next) {

bt

#0  sync_update_persist_op (pb=pb@entry=0xffff6ca06580, e=0xffff26d9ac40, eprev=eprev@entry=0x0, op_tag=op_tag@entry=104, label=label@entry=0xffff842bb008 "sync_add_persist_post_op")
    at ldap/servers/plugins/sync/sync_persist.c:247
#1  0x0000ffff842b8c50 in sync_add_persist_post_op (pb=0xffff6ca06580) at ldap/servers/plugins/sync/sync_persist.c:368
#2  sync_add_persist_post_op (pb=0xffff6ca06580) at ldap/servers/plugins/sync/sync_persist.c:360
#3  0x0000ffff88f64640 in plugin_call_func (list=0xffff8471ca00, operation=operation@entry=550, pb=pb@entry=0xffff6ca06580, call_one=call_one@entry=0) at ldap/servers/slapd/plugin.c:2001
#4  0x0000ffff88f6498c in plugin_call_list (pb=0xffff6ca06580, operation=550, list=<optimized out>) at ldap/servers/slapd/plugin.c:1944
#5  plugin_call_plugins (pb=pb@entry=0xffff6ca06580, whichfunction=whichfunction@entry=550) at ldap/servers/slapd/plugin.c:414
#6  0x0000ffff83bd8a20 in ldbm_back_add (pb=0xffff6ca06580) at ldap/servers/slapd/back-ldbm/ldbm_add.c:1413
#7  0x0000ffff88f12ac8 in op_shared_add (pb=pb@entry=0xffff6ca06580) at ldap/servers/slapd/add.c:758
#8  0x0000ffff88f13854 in do_add (pb=pb@entry=0xffff6ca06580) at ldap/servers/slapd/add.c:236
#9  0x0000aaaada38b9d4 in connection_dispatch_operation (pb=0xffff6ca06580, op=<optimized out>, conn=<optimized out>) at ldap/servers/slapd/connection.c:633
#10 connection_threadmain (arg=<optimized out>) at ldap/servers/slapd/connection.c:1800
#11 0x0000ffff8894b46c in _pt_root (arg=0xffff8213bb00) at ../../../../nspr/pr/src/pthreads/ptthread.c:201
#12 0x0000ffff88cc09a8 in start_thread (arg=0xffffe35ed7df) at pthread_create.c:442
#13 0x0000ffff88d2bd1c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:79

Since I opened this issue I've moved the tgtd (iSCSI server) activity off the Raspberry Pi 4 that hosts the virtual machine running this FreeIPA replica, which has reduced the load on the underlying system considerably. I had speculated that IO contention might be contributing to this, but that does not appear to be the case.

I've set systemd to restart the process when it fails, so perhaps a pattern may emerge between it and the X64 replica. I'd offer to assist with the ASAN stuff mentioned above, but that's a bit beyond my skill level, so I'd need an expert to talk me through the steps involved.

Many thanks, Matt

mattymcmattface commented 2 years ago

In my last update I shared that I'd done a "dnf update", which picked up a more recent version of 389-ds-base; set the service to restart via systemd (wait 60 seconds before doing so, and give up if that happened more than 5 times in 500 seconds); and changed the DHCP DDNS update config so that only the X64 server received updates, with FreeIPA syncrepl transferring them to the AARCH64 instance. My reasoning for this last one is that it has to be something reasonably unique to my setup, or more people would be seeing it. Both ipa instances were in the DHCP server's DDNS config section: I had thought that it would try the first and only move on to the second if that update failed, but it occurred to me that it might actually be updating both ipa replicas and causing confusion.

I continue to see coredumps on both the X64 and AARCH64 ipa servers, so that eliminates the DDNS-double-update theory. Details of the coredumps are below; they're happening far more often on the AARCH64 ipa server, which receives the DDNS updates via syncrepl from the X86 server. I will alter the DDNS config in the next couple of days so that it points at the AARCH64 ipa server instead, driving syncrepl in the other direction, to the X64 server. It will be interesting to see if the balance of coredumps shifts the other way. However, I'm going to hold off making that change today because, despite things not working properly this morning (resolved with a lazy reboot of everything), I'm not able to explain why the coredumps appeared to stop on both servers on Friday morning.

Coredumps from the X64 instance

TIME                          PID UID GID SIG     COREFILE EXE                SIZE
Thu 2022-08-04 11:21:49 BST  2596 389 389 SIGSEGV missing  /usr/sbin/ns-slapd  n/a
Thu 2022-08-04 13:26:56 BST  4988 389 389 SIGSEGV missing  /usr/sbin/ns-slapd  n/a
Fri 2022-08-05 01:15:12 BST  6004 389 389 SIGSEGV missing  /usr/sbin/ns-slapd  n/a
Fri 2022-08-05 07:14:48 BST 10390 389 389 SIGSEGV missing  /usr/sbin/ns-slapd  n/a
Fri 2022-08-05 07:17:00 BST 12541 389 389 SIGSEGV missing  /usr/sbin/ns-slapd  n/a

Coredumps from the AARCH64 instance

TIME                          PID UID GID SIG     COREFILE EXE                  SIZE
Thu 2022-08-04 09:59:29 BST 10347 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Thu 2022-08-04 10:38:28 BST 45450 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Thu 2022-08-04 11:12:13 BST 49438 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Thu 2022-08-04 11:25:36 BST 50178 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Thu 2022-08-04 11:50:20 BST 50533 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Thu 2022-08-04 11:56:27 BST 51075 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Thu 2022-08-04 12:10:48 BST 51275 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Thu 2022-08-04 12:15:14 BST 51708 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Thu 2022-08-04 12:44:24 BST 51957 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Thu 2022-08-04 12:56:43 BST 52484 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Thu 2022-08-04 12:59:56 BST 52758 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Thu 2022-08-04 13:46:17 BST 52951 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Thu 2022-08-04 14:02:55 BST 53894 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Thu 2022-08-04 14:27:48 BST 54332 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Thu 2022-08-04 14:29:48 BST 54878 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Thu 2022-08-04 14:51:46 BST 55029 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Thu 2022-08-04 14:57:36 BST 55533 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Thu 2022-08-04 15:16:19 BST 55743 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Thu 2022-08-04 16:16:13 BST 56205 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Thu 2022-08-04 16:44:41 BST 57144 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Thu 2022-08-04 17:10:37 BST 57629 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Thu 2022-08-04 17:14:52 BST 58092 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Thu 2022-08-04 17:48:59 BST 58317 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Thu 2022-08-04 17:57:12 BST 58953 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Thu 2022-08-04 17:59:08 BST 59161 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Thu 2022-08-04 18:01:59 BST 59311 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Thu 2022-08-04 18:30:31 BST 59495 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Thu 2022-08-04 18:45:26 BST 60048 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Thu 2022-08-04 18:57:55 BST 60409 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Thu 2022-08-04 19:17:47 BST 60713 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Thu 2022-08-04 19:25:35 BST 61215 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Thu 2022-08-04 19:30:17 BST 61423 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Thu 2022-08-04 19:58:16 BST 61641 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Thu 2022-08-04 20:00:18 BST 62106 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Thu 2022-08-04 20:02:04 BST 62267 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Thu 2022-08-04 20:37:30 BST 62412 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Thu 2022-08-04 21:36:34 BST 63035 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Thu 2022-08-04 21:44:54 BST 64135 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Thu 2022-08-04 22:27:58 BST 64576 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Fri 2022-08-05 00:05:42 BST 65218 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Fri 2022-08-05 00:14:50 BST 66650 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Fri 2022-08-05 01:00:20 BST 66948 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Fri 2022-08-05 01:33:09 BST 67672 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Fri 2022-08-05 02:14:58 BST 68137 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Fri 2022-08-05 02:29:54 BST 68687 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Fri 2022-08-05 03:00:47 BST 69025 389 389 SIGSEGV missing  /usr/sbin/ns-slapd    n/a
Fri 2022-08-05 03:15:23 BST 69536 389 389 SIGSEGV present  /usr/sbin/ns-slapd  11.1M
Fri 2022-08-05 03:17:29 BST 69969 389 389 SIGSEGV present  /usr/sbin/ns-slapd   6.1M
Fri 2022-08-05 03:28:37 BST 70120 389 389 SIGSEGV present  /usr/sbin/ns-slapd   7.6M
Fri 2022-08-05 03:34:37 BST 70337 389 389 SIGSEGV present  /usr/sbin/ns-slapd   8.8M
Fri 2022-08-05 03:48:26 BST 70575 389 389 SIGSEGV present  /usr/sbin/ns-slapd  52.8M
Fri 2022-08-05 03:58:15 BST 70891 389 389 SIGSEGV present  /usr/sbin/ns-slapd  52.1M
Fri 2022-08-05 04:29:52 BST 71119 389 389 SIGSEGV present  /usr/sbin/ns-slapd  11.4M
Fri 2022-08-05 05:31:19 BST 71583 389 389 SIGSEGV present  /usr/sbin/ns-slapd  55.0M
Fri 2022-08-05 06:07:37 BST 72519 389 389 SIGSEGV present  /usr/sbin/ns-slapd  53.0M
Fri 2022-08-05 06:31:14 BST 73320 389 389 SIGSEGV present  /usr/sbin/ns-slapd  70.2M
Fri 2022-08-05 06:49:45 BST 73809 389 389 SIGSEGV present  /usr/sbin/ns-slapd  11.2M
Fri 2022-08-05 07:32:04 BST 74202 389 389 SIGSEGV present  /usr/sbin/ns-slapd  12.1M
tbordaz commented 2 years ago

The crash occurs because sync_repl fails to retrieve, in its own data structure, the primary operation (here an ADD). I reproduced a crash, but with a MOD, with the same missing primary operation. Both crashes are possibly related. I have not yet opened a ticket for the crash with MOD as I am still investigating it.

tbordaz commented 2 years ago

The crash I reproduced is specific to dynamic plugins being on (nsslapd-dynamic-plugins: on). In your test, do you enable dynamic plugins? If not, then I am afraid the bug I reproduced is different.

mattmcmattface commented 2 years ago

I'm using the out-of-the-box configuration installed by FreeIPA. If you can let me know which file to check for this setting, I'll let you know.

mreynolds389 commented 2 years ago

I'm using the out-of-the-box configuration installed by FreeIPA. If you can let me know which file to check for this setting, I'll let you know.

Well, it's off by default, and IPA does not use it. But to confirm, you can run this command:

# dsconf NET-THE-INSTITUTE-CO-UK config get nsslapd-dynamic-plugins
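
(For reference: nsslapd-dynamic-plugins is an attribute on cn=config, so if dsconf is unavailable it should also be visible in the instance's dse.ldif, e.g. /etc/dirsrv/slapd-NET-THE-INSTITUTE-CO-UK/dse.ldif; if the attribute is absent there, it is at its default, which is off.)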

mattymcmattface commented 2 years ago

It looks like you've managed to reproduce a different bug. On the AARCH64 instance it is off. Running the command on the X64 instance didn't return (it just hung), so I've restarted that, after which it confirmed it is also off. That said, things do look to have settled down mysteriously since Friday. Perhaps I had a misbehaving device on the network generating lots of DDNS updates which is now off? Or perhaps the X64 instance has been in a zombie state since then and this morning's reboot will kick things off again? I'll see what it does over the rest of today, then switch the dynamic DNS updates (via named) to go to the AARCH64 instance and be replicated over to the X64 one, and see if the bulk of the coredumps follow in pursuit. I really do appreciate you looking into this - thank you.

mattymcmattface commented 2 years ago

Hi. I can confirm that changing the Kea DHCP server config to send DDNS updates to just one of my IPA server pair has stopped this problem appearing. I can only assume that Kea was attempting to update both and the replication was then getting confused. I'm happy for this issue to be closed, but am unsure of the etiquette, given that a different issue has been found while you kindly investigated this one. Shall I close?

mreynolds389 commented 2 years ago

Hi. I can confirm that changing the Kea DHCP server config to send DDNS updates to just one of my IPA server pair has stopped this problem appearing. I can only assume that Kea was attempting to update both and the replication was then getting confused. I'm happy for this issue to be closed, but am unsure of the etiquette, given that a different issue has been found while you kindly investigated this one. Shall I close?

Hey, so a new ticket was created for that other issue that we found, so this issue can be closed now (with your consent).