389ds / 389-ds-base

The enterprise-class Open Source LDAP server for Linux
https://www.port389.org/

LDAPS replication fails to init with invalid function argument #6365

Open: dwbotsch opened this issue 5 days ago

dwbotsch commented 5 days ago

Installed 389-ds on two fully updated RHEL 8 servers, initially using the default self-signed certs. I also tried manually generating a new server cert and CA cert; that didn't change a thing.

When I attempt to initialize replication, the last init status shows: error (-1) LDAP error: can't contact LDAP server - no response received

This is after manually verifying that each server can indeed reach the other on port 636 (telnet, etc.), so it's not a firewall issue.

In Monitoring > Error Logs (in the Cockpit UI console), each server shows essentially the following error:

[17/Oct/2024:17:02:36.610202255 -0400] - ERR - slapi_ldap_bind - Could not send bind request for id [cn=replication manager,cn=config] authentication mechanism [SIMPLE]: error -1 (Can't contact LDAP server), system error -5987 (Invalid function argument.), network error 0 (Unknown error, host "XXXXXXXX:636")
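To keep the three error codes in the line above apart when comparing logs from both servers, here is a minimal Python sketch that splits it into fields. The regular expression is inferred from this single sample, not from a documented log format, and `parse_bind_error` is a hypothetical helper. Parsed this way, the line reports an LDAP-level error of -1, a separate system error of -5987 ("Invalid function argument."), and a network error of 0.

```python
import re

# Layout inferred from the single slapi_ldap_bind sample in this issue;
# other 389-ds versions may word the message differently.
BIND_ERR = re.compile(
    r"Could not send bind request for id \[(?P<bind_dn>[^\]]*)\] "
    r"authentication mechanism \[(?P<mech>[^\]]*)\]: "
    r"error (?P<ldap_err>-?\d+) \((?P<ldap_msg>[^)]*)\), "
    r"system error (?P<sys_err>-?\d+) \((?P<sys_msg>[^)]*)\), "
    r'network error (?P<net_err>-?\d+) \((?P<net_msg>[^,]*), host "(?P<host>[^"]*)"\)'
)


def parse_bind_error(line):
    """Return the bind-failure fields as a dict, or None if the line doesn't match."""
    m = BIND_ERR.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    # Convert the three numeric error codes from strings to ints.
    for key in ("ldap_err", "sys_err", "net_err"):
        fields[key] = int(fields[key])
    return fields


SAMPLE = (
    "[17/Oct/2024:17:02:36.610202255 -0400] - ERR - slapi_ldap_bind - "
    "Could not send bind request for id [cn=replication manager,cn=config] "
    "authentication mechanism [SIMPLE]: error -1 (Can't contact LDAP server), "
    "system error -5987 (Invalid function argument.), "
    'network error 0 (Unknown error, host "XXXXXXXX:636")'
)

for key, value in parse_bind_error(SAMPLE).items():
    print(f"{key:9} {value}")
```

The same helper should also accept the "cucumber:636" variant of this line from the debug trace, since only the host differs.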

"Invalid function argument" would seem to be a red flag.

I turned on additional debug logging to trace function calls.

On the server where I clicked Initialize:

Oct 17 17:21:29 pickle ns-slapd[2536377]: [17/Oct/2024:17:21:29.609375635 -0400] - DEBUG - plugin_call_func - Calling plugin 'Multisupplier replication internal postoperation plugin' #1 type 521
Oct 17 17:21:29 pickle ns-slapd[2536377]: [17/Oct/2024:17:21:29.626058940 -0400] - DEBUG - send_ldap_result_ext - => 0::
Oct 17 17:21:29 pickle ns-slapd[2536377]: [17/Oct/2024:17:21:29.642708426 -0400] - DEBUG - slapi_control_present - => (looking for 1.3.6.1.1.13.1)
Oct 17 17:21:29 pickle ns-slapd[2536377]: [17/Oct/2024:17:21:29.659383732 -0400] - DEBUG - slapi_control_present - <= 0 (NO CONTROLS)
Oct 17 17:21:29 pickle ns-slapd[2536377]: [17/Oct/2024:17:21:29.676049177 -0400] - DEBUG - slapi_control_present - => (looking for 1.3.6.1.1.13.2)
Oct 17 17:21:29 pickle ns-slapd[2536377]: [17/Oct/2024:17:21:29.692719943 -0400] - DEBUG - slapi_control_present - <= 0 (NO CONTROLS)
Oct 17 17:21:29 pickle ns-slapd[2536377]: [17/Oct/2024:17:21:29.709391199 -0400] - DEBUG - slapi_control_present - => (looking for 2.16.840.1.113730.3.4.12)
Oct 17 17:21:29 pickle ns-slapd[2536377]: [17/Oct/2024:17:21:29.726070735 -0400] - DEBUG - slapi_control_present - <= 0 (NO CONTROLS)
Oct 17 17:21:29 pickle ns-slapd[2536377]: [17/Oct/2024:17:21:29.742728310 -0400] - DEBUG - slapi_control_present - => (looking for 2.16.840.1.113730.3.4.18)
Oct 17 17:21:29 pickle ns-slapd[2536377]: [17/Oct/2024:17:21:29.759383886 -0400] - DEBUG - slapi_control_present - <= 0 (NO CONTROLS)
Oct 17 17:21:29 pickle ns-slapd[2536377]: [17/Oct/2024:17:21:29.776066852 -0400] - DEBUG - compute_limits - => sizelimit=-1, timelimit=-1
Oct 17 17:21:29 pickle ns-slapd[2536377]: [17/Oct/2024:17:21:29.792774559 -0400] - DEBUG - plugin_call_func - Calling plugin 'Account Usability Plugin' #0 type 403
Oct 17 17:21:29 pickle ns-slapd[2536377]: [17/Oct/2024:17:21:29.809417513 -0400] - DEBUG - account-usability-plugin - --> auc_pre_search
Oct 17 17:21:29 pickle ns-slapd[2536377]: [17/Oct/2024:17:21:29.826080979 -0400] - DEBUG - account-usability-plugin - <-- auc_pre_op
Oct 17 17:21:29 pickle ns-slapd[2536377]: [17/Oct/2024:17:21:29.842742105 -0400] - DEBUG - plugin_call_func - Calling plugin 'ACL preoperation' #1 type 403
Oct 17 17:21:29 pickle ns-slapd[2536377]: [17/Oct/2024:17:21:29.859435022 -0400] - DEBUG - slapi_control_present - => (looking for 2.16.840.1.113730.3.4.12)
Oct 17 17:21:29 pickle ns-slapd[2536377]: [17/Oct/2024:17:21:29.876111017 -0400] - DEBUG - slapi_control_present - <= 0 (NO CONTROLS)
Oct 17 17:21:29 pickle ns-slapd[2536377]: [17/Oct/2024:17:21:29.892745762 -0400] - DEBUG - slapi_control_present - => (looking for 2.16.840.1.113730.3.4.18)
Oct 17 17:21:29 pickle ns-slapd[2536377]: [17/Oct/2024:17:21:29.909439839 -0400] - DEBUG - slapi_control_present - <= 0 (NO CONTROLS)
Oct 17 17:21:29 pickle ns-slapd[2536377]: [17/Oct/2024:17:21:29.926099253 -0400] - DEBUG - plugin_call_func - Calling plugin 'deref' #2 type 403
Oct 17 17:21:29 pickle ns-slapd[2536377]: [17/Oct/2024:17:21:29.942752660 -0400] - DEBUG - deref-plugin - --> deref_pre_search
Oct 17 17:21:29 pickle ns-slapd[2536377]: [17/Oct/2024:17:21:29.959423536 -0400] - DEBUG - deref-plugin - <-- deref_pre_search
Oct 17 17:21:29 pickle ns-slapd[2536377]: [17/Oct/2024:17:21:29.976087910 -0400] - DEBUG - plugin_call_func - Calling plugin 'Multisupplier replication preoperation plugin' #3 type 403
Oct 17 17:21:29 pickle ns-slapd[2536377]: [17/Oct/2024:17:21:29.992774187 -0400] - ERR - slapi_ldap_bind - Could not send bind request for id [cn=replication manager,cn=config] authentication mechanism [SIMPLE]: error -1 (Can't contact LDAP server), system error -5987 (Invalid function argument.), network error 0 (Unknown error, host "cucumber:636")

thanks

tbordaz commented 4 days ago

Are you using client certificate authentication? Did you try nsds5ReplicaBindMethod: SSLCLIENTAUTH?
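For reference, switching an existing agreement to the bind method suggested above is a single-attribute LDAP modify on the supplier. A minimal Python sketch that builds the LDIF change record; the agreement DN is a placeholder (the real agreement entries live under cn=mapping tree,cn=config), and `bind_method_ldif` is a hypothetical helper. As I understand it, SSLCLIENTAUTH also requires the consumer to map the supplier's client certificate to an entry, which this change alone does not configure.

```python
def bind_method_ldif(agreement_dn, method="SSLCLIENTAUTH"):
    """Build an LDIF change record that replaces nsds5ReplicaBindMethod."""
    return (
        f"dn: {agreement_dn}\n"
        "changetype: modify\n"
        "replace: nsds5ReplicaBindMethod\n"
        f"nsds5ReplicaBindMethod: {method}\n"
        "-\n"  # end of this modification (RFC 2849 mod-spec separator)
    )


# Placeholder DN: substitute the real replication agreement entry.
AGREEMENT_DN = "cn=AGREEMENT,cn=replica,cn=SUFFIX,cn=mapping tree,cn=config"

print(bind_method_ldif(AGREEMENT_DN), end="")
```

The printed record could then be piped to something like `ldapmodify -H ldaps://pickle:636 -D "cn=Directory Manager" -W` on the supplier that owns the agreement.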

progier389 commented 4 days ago

According to the error message and the port number, it is attempting simple authentication over LDAPS. IMHO something looks wrong with the SSL connection (a missing certificate, or maybe some trust issue). It may also be that /tmp is not private and security is not properly enabled, but in that case I would expect to see an error message about it in the error log at startup.
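One way to test the certificate/trust theory independently of replication is to attempt a full TLS handshake against the peer's LDAPS port. Unlike telnet, this verifies the certificate chain and checks that the host name used (e.g. "cucumber") matches the certificate. A minimal Python sketch; `check_ldaps` is a hypothetical helper, and Python's `ssl` module is built on OpenSSL rather than NSS, so a pass here does not prove that the server's own NSS database trusts the peer.

```python
import socket
import ssl


def check_ldaps(host, port=636, cafile=None, timeout=5.0):
    """Attempt a verified TLS handshake with host:port.

    Returns (ok, detail). cafile is a PEM file of trusted CA certificates;
    when None, the system default trust store is used.
    """
    ctx = ssl.create_default_context(cafile=cafile)  # verifies chain + hostname
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                cert = tls.getpeercert()
                # 'subject' is a tuple of RDNs, each a tuple of (key, value) pairs.
                subject = dict(rdn[0] for rdn in cert.get("subject", ()))
                return True, f"{tls.version()} ok, CN={subject.get('commonName')}"
    except ssl.SSLCertVerificationError as exc:
        # Untrusted chain, expired certificate, or host name mismatch.
        return False, f"certificate rejected: {exc.verify_message}"
    except ssl.SSLError as exc:
        # Handshake failed for a non-verification reason (protocol, cipher...).
        return False, f"TLS error: {exc}"
    except OSError as exc:
        # DNS failure, refused connection, or timeout before TLS completed.
        return False, f"connect error: {exc}"
```

For example, `check_ldaps("cucumber", cafile="/path/to/ca-chain.pem")` run from pickle, where the CA file path is a placeholder for the CA chain being tested.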

dwbotsch commented 1 day ago

I'm in the process of installing production SSL certs (which I need to do anyway, so I might as well) to rule out the self-signed certs as the issue (though one would think that if that's the out-of-the-box config, it would work...). Then I can test both replication and ldapsearch from the command line between the servers.

dwbotsch commented 23 hours ago

I'm now using production certs issued by a third-party CA and imported into both servers. I verified the certs work via ldapsearch in both directions (though 389ds uses NSS, so...). I've seen a few hints that the "Invalid function argument" error has something to do with a conflict between the NSS database (nssdb) and OpenSSL... thoughts?

Some additional info: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org/thread/IZTF7HJTS23SKOY22L3ZTWCI3ZXDWV46/

[root@pickle ~]# ldd /usr/sbin/ns-slapd |grep -i nss
	libnss3.so => /lib64/libnss3.so (0x00007f94fd890000)
	libnssutil3.so => /lib64/libnssutil3.so (0x00007f94fd65c000)
[root@pickle ~]# ldd /usr/sbin/ns-slapd |grep -i ssl
	libssl3.so => /lib64/libssl3.so (0x00007f2f5954e000)
	libssl.so.1.1 => /lib64/libssl.so.1.1 (0x00007f2f58b19000)
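The `ldd` output above lists libraries from two TLS stacks: `libnss3.so`, `libnssutil3.so`, and `libssl3.so` belong to NSS (despite its name, `libssl3.so` is NSS's SSL library), while `libssl.so.1.1` is OpenSSL. A small Python sketch that classifies `ldd` output this way; the name patterns are an assumption and cover only the common library names. Keep in mind that `ldd` also lists indirect dependencies, so seeing OpenSSL here does not by itself mean ns-slapd uses it for replication connections.

```python
import re

# Name patterns for the two TLS stacks (assumed, not exhaustive).
STACKS = {
    "NSS": re.compile(r"^lib(nss3|nssutil3|ssl3|smime3)\.so"),
    "OpenSSL": re.compile(r"^lib(ssl|crypto)\.so\.\d"),
}


def tls_stacks(ldd_output):
    """Map each TLS stack name to the libraries from it found in ldd output."""
    found = {}
    for line in ldd_output.splitlines():
        parts = line.split()
        if not parts:
            continue
        name = parts[0]  # e.g. "libnss3.so" from "libnss3.so => /lib64/..."
        for stack, pattern in STACKS.items():
            if pattern.match(name):
                found.setdefault(stack, []).append(name)
    return found


# Output copied from this issue (run `ldd /usr/sbin/ns-slapd` for live data).
LDD_OUTPUT = """\
libnss3.so => /lib64/libnss3.so (0x00007f94fd890000)
libnssutil3.so => /lib64/libnssutil3.so (0x00007f94fd65c000)
libssl3.so => /lib64/libssl3.so (0x00007f2f5954e000)
libssl.so.1.1 => /lib64/libssl.so.1.1 (0x00007f2f58b19000)
"""

print(tls_stacks(LDD_OUTPUT))
```

To check a live system, the string can be replaced with `subprocess.run(["ldd", "/usr/sbin/ns-slapd"], capture_output=True, text=True).stdout`.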

dwbotsch commented 23 hours ago

Well... got it working. Documentation improvement notes:

1) Replication won't work "out of the box" using the self-signed certs generated during the install.
2) When using production certs from a real CA, each certificate in the signing chain has to be uploaded individually; a single chained CA file isn't handled.
3) SSL client auth doesn't seem to work... dunno, even though the flags are set per Red Hat's documentation ("u,u,u" on the server cert and "CT," on each of the intermediate and root CA certs). Simple authN over LDAPS does work.
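Regarding note 2, a chained CA file can be split into one PEM file per certificate before uploading them one at a time. A minimal Python sketch; `split_pem_bundle` and `write_certs` are hypothetical helpers, the output file naming is arbitrary, and certificates are written in the order they appear in the bundle.

```python
import re
from pathlib import Path

# One PEM certificate block, header to footer, across line breaks.
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----\s.*?-----END CERTIFICATE-----",
    re.DOTALL,
)


def split_pem_bundle(bundle_text):
    """Return each PEM certificate block in the bundle, in file order."""
    return [m.group(0) + "\n" for m in PEM_CERT.finditer(bundle_text)]


def write_certs(bundle_path, out_dir, prefix="chain"):
    """Write each certificate from bundle_path to out_dir/<prefix>-<n>.pem."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    blocks = split_pem_bundle(Path(bundle_path).read_text())
    for i, block in enumerate(blocks):
        path = out / f"{prefix}-{i}.pem"
        path.write_text(block)
        paths.append(path)
    return paths
```

Each resulting file can then be uploaded through the Cockpit UI, or added to the instance's NSS database with something like `certutil -A -d /etc/dirsrv/slapd-<instance> -n "<nickname>" -t "CT,," -a -i chain-1.pem`, where the instance name and nickname are placeholders.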