kforner closed this issue 6 years ago.
Hi @kforner, have you exported the port 389 when running the container? You can see here the list of other ports required from FreeIPA.
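A quick way to double-check from the host that the ports really are published (a sketch; substitute your container name):
docker port <container-name>      # list host->container port mappings
ss -tln | grep -E ':(389|636)'    # confirm the host is listening on the LDAP ports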
HTH
Hi @felipevolpone Yes, it is started like this:
docker run --name $(REPLICA_NAME) -d \
--restart=always \
-v /sys/fs/cgroup:/sys/fs/cgroup:ro \
-e IPA_SERVER_IP=$(REPLICA_IP) \
--dns $(MASTER_IP) \
-h $(REPLICA_HOST).$(DOMAIN) \
-v $(PWD)/ipa-data:/data \
-v /etc/localtime:/etc/localtime:ro \
-p 53:53 -p 53:53/udp \
-p 80:80 -p 443:443 \
-p 389:389 -p 636:636 -p 88:88 -p 464:464 \
-p 88:88/udp -p 464:464/udp -p 7389:7389 \
-p 9443:9443 -p 9444:9444 -p 9445:9445 $(IMAGE)
@kforner Sorry, I did not formulate my question properly. Have you also run the master with the ports exposed?
Yes:
9b9584c17685 docker.quartzbio.com/freeipa-server:fedora23-freeipa4.2.3-systemd "/usr/sbin/init-data" 2 weeks ago Up 2 weeks 0.0.0.0:53->53/tcp, 0.0.0.0:80->80/tcp, 0.0.0.0:53->53/udp, 0.0.0.0:88->88/tcp, 0.0.0.0:389->389/tcp, 0.0.0.0:88->88/udp, 0.0.0.0:443->443/tcp, 0.0.0.0:464->464/tcp, 0.0.0.0:636->636/tcp, 0.0.0.0:7389->7389/tcp, 0.0.0.0:9443-9445->9443-9445/tcp, 0.0.0.0:464->464/udp, 123/udp freeipa
I have had another replica running for years, and I'm trying to set up a new one.
@kforner could you share ipaupgrade.log?
a better way to include the log: ipareplica-install.txt
@kforner Hi, it seems to be an issue with IPv6. So, try again with IPv6 enabled: https://docs.docker.com/engine/userguide/networking/default_network/ipv6/
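If enabling it manually works, you can make it persistent via /etc/docker/daemon.json instead of dockerd flags (a sketch; the subnet below is just the example from the docs, substitute your own):
{
  "ipv6": true,
  "fixed-cidr-v6": "2001:db8:1::/64"
}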
@felipevolpone I manually tried running docker like this:
sudo dockerd --ipv6 --fixed-cidr-v6="2001:db8:1::/64"
I retried the replica install, with the same result:
ipa.ipapython.ipaldap.SchemaCache: DEBUG flushing ldapi://%2fvar%2frun%2fslapd-QUARTZBIO-COM.socket from SchemaCache
ipa.ipapython.ipaldap.SchemaCache: DEBUG retrieving schema for SchemaCache url=ldapi://%2fvar%2frun%2fslapd-QUARTZBIO-COM.socket conn=<ldap.ldapobject.SimpleLDAPObject instance at 0x7f804f98e518>
ipa : DEBUG Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/ipaserver/install/service.py", line 449, in start_creation
run_step(full_msg, method)
File "/usr/lib/python2.7/site-packages/ipaserver/install/service.py", line 439, in run_step
method()
File "/usr/lib/python2.7/site-packages/ipaserver/install/dsinstance.py", line 431, in __setup_replica
r_bindpw=self.dm_password)
File "/usr/lib/python2.7/site-packages/ipaserver/install/replication.py", line 1068, in setup_replication
raise RuntimeError("Failed to start replication")
RuntimeError: Failed to start replication
Hi, it's not the same. Before, it was failing with a timeout at wait_for_open_ports; now you have a replication failure. We should inspect the directory server logs on both master and replica.
@germanparente oh, thank you.
On the master I found this in /var/log/dirsrv/slapd-QUARTZBIO-COM/errors, at the exact same time as the above error (retrieving schema for SchemaCache):
[03/Aug/2017:18:52:15 +0200] get_dom_sid - [file ipa_sidgen_common.c, line 75]: Internal search failed.
[03/Aug/2017:18:52:15 +0200] get_dom_sid - [file ipa_sidgen_common.c, line 75]: Internal search failed.
[03/Aug/2017:18:52:15 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -1 (Can't contact LDAP server) errno 0 (Success)
[03/Aug/2017:18:52:15 +0200] NSMMReplicationPlugin - agmt="cn=meToiparig.quartzbio.com" (iparig:389): Replication bind with SIMPLE auth failed: LDAP error -1 (Can't contact LDAP server) ()
[03/Aug/2017:18:52:15 +0200] get_dom_sid - [file ipa_sidgen_common.c, line 75]: Internal search failed.
[03/Aug/2017:18:52:15 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -1 (Can't contact LDAP server) errno 0 (Success)
[03/Aug/2017:18:52:15 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -1 (Can't contact LDAP server) errno 0 (Success)
[03/Aug/2017:18:52:15 +0200] NSMMReplicationPlugin - Warning: unable to acquire replica for total update, error: -1, retrying in 1 seconds.
[03/Aug/2017:18:52:16 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -1 (Can't contact LDAP server) errno 0 (Success)
[03/Aug/2017:18:52:16 +0200] NSMMReplicationPlugin - Warning: unable to acquire replica for total update, error: -1, retrying in 2 seconds.
On the replica server (after rerunning the replica install), I found this in the dirsrv logs, about the database generation ID:
Aug 04 10:31:46 iparig.quartzbio.com ns-slapd[480]: [04/Aug/2017:10:31:46.464741513 +0200] ipaenrollment_start - [file ipa_enrollment.c, line 408]: Failed to get default realm?!
Aug 04 10:31:46 iparig.quartzbio.com ns-slapd[480]: [04/Aug/2017:10:31:46.494741725 +0200] slapd started. Listening on All Interfaces port 389 for LDAP requests
Aug 04 10:31:46 iparig.quartzbio.com ns-slapd[480]: [04/Aug/2017:10:31:46.500172010 +0200] Listening on All Interfaces port 636 for LDAPS requests
Aug 04 10:31:46 iparig.quartzbio.com ns-slapd[480]: [04/Aug/2017:10:31:46.503584141 +0200] Listening on /var/run/slapd-QUARTZBIO-COM.socket for LDAPI requests
Aug 04 10:31:46 iparig.quartzbio.com systemd[1]: Started 389 Directory Server QUARTZBIO-COM..
Aug 04 10:31:47 iparig.quartzbio.com ns-slapd[480]: [04/Aug/2017:10:31:47.803910268 +0200] NSMMReplicationPlugin - agmt="cn=meToipa.quartzbio.com" (ipa:389): The remote replica has a different database generation ID than the local database. You may have to reinitialize the remote replica, or the local replica.
Hi,
I have a guess here. You are re-installing the replica several times without completely cleaning it from the master.
If you run
"ipa server-find" on the master, do you see your replica?
If you see it, you need to run on the master:
ipa server-del
Then, let's see also:
"ipa-replica-manage list-ruv"
If you see records showing your replica, you will need to do:
ipa-replica-manage clean-ruv
Unfortunately, these steps have to be done each time your replica install fails: installing a replica also makes changes to the master-side configuration, and those have to be cleaned up.
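Putting it together, the cleanup on the master after each failed attempt looks roughly like this (a sketch; the hostname and id are placeholders, take the real ones from the command output):
ipa server-find                        # does the failed replica still show up?
ipa server-del iparig.quartzbio.com    # only if it does
ipa-replica-manage list-ruv            # note any stale replica ids
ipa-replica-manage clean-ruv 19        # repeat for each stale id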
Regards,
German
I don't see my replica in ipa server-find:
# ipa server-find
---------------------
2 IPA servers matched
---------------------
Server name: ipa.quartzbio.com
Min domain level: 0
Max domain level: 0
Server name: ipasif2.quartzbio.com
Managed suffix: dc=quartzbio,dc=com, o=ipaca
Min domain level: 0
Max domain level: 1
but there are some zombie "RUVs":
# ipa-replica-manage list-ruv
iparig.quartzbio.com:389: 16
ipa.quartzbio.com:389: 4
iparig.quartzbio.com:389: 17
iparig.quartzbio.com:389: 18
iparig.quartzbio.com:389: 12
iparig.quartzbio.com:389: 14
ipasif2.quartzbio.com:389: 11
iparig.quartzbio.com:389: 19
iparig.quartzbio.com:389: 15
I've run ipa-replica-manage clean-ruv xx for each xx in 12..19, but I don't know how long the background tasks take to finish.
Hi,
you could tail -f /var/log/dirsrv/slapd-QUARTZBIO-COM/errors to see if the RUV has been cleaned,
or retry the list-ruv command.
Regards.
@germanparente tail -f /var/log/dirsrv/slapd-QUARTZBIO-COM/errors
[04/Aug/2017:14:58:23 +0200] NSMMReplicationPlugin - CleanAllRUV Task (rid 18): Replica not online (agmt="cn=meToiparig.quartzbio.com" (iparig:389))
[04/Aug/2017:14:58:23 +0200] NSMMReplicationPlugin - CleanAllRUV Task (rid 18): Not all replicas online, retrying in 640 seconds...
[04/Aug/2017:14:58:24 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -1 (Can't contact LDAP server) errno 0 (Success)
[04/Aug/2017:14:58:41 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -1 (Can't contact LDAP server) errno 0 (Success)
[04/Aug/2017:14:58:41 +0200] NSMMReplicationPlugin - agmt="cn=meToiparig.quartzbio.com" (iparig:389): Replication bind with SIMPLE auth failed: LDAP error -1 (Can't contact LDAP server) ()
It seems that the RUV cleanup needs to contact the replica, so it never succeeds...
Do you have both ipa.quartzbio.com and ipasif2.quartzbio.com running? The CLEANALLRUV operation is sent to all the nodes.
@germanparente Yes, both are running. ipa-replica-manage list-ruv still lists the same RUVs, i.e. nothing has been deleted.
The CLEANALLRUV operation is sent to all the nodes.
Yes, but this message: "Not all replicas online, retrying in 640 seconds..." makes me think that it will never happen.
OK. Let's try another method.
First, let's check that there are no clean-ruv tasks running:
ipa-replica-manage list-clean-ruv
If there are, please do:
ipa-replica-manage abort-clean-ruv
Then, let's run the cleanallruv task like this, on the master:
ldapmodify -a -D "cn=directory manager" -W << EOF
dn: cn=clean 19, cn=cleanallruv, cn=tasks, cn=config
objectclass: extensibleObject
replica-base-dn: dc=quartzbio,dc=com
replica-id: 19
replica-force-cleaning: yes
cn: clean 19
EOF
and check after a moment that id 19 is gone from the "list-ruv" output. Then repeat for the other invalid RUVs.
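If several ids turn out to be stale, the same task can be repeated in a loop (a sketch; -W prompts for the Directory Manager password on each pass):
for rid in 12 14 15 16 17 18 19; do
  ldapmodify -a -D "cn=directory manager" -W << EOF
dn: cn=clean $rid, cn=cleanallruv, cn=tasks, cn=config
objectclass: extensibleObject
replica-base-dn: dc=quartzbio,dc=com
replica-id: $rid
replica-force-cleaning: yes
cn: clean $rid
EOF
done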
Yes, there were some clean-ruv tasks:
# ipa-replica-manage list-clean-ruv
CLEANALLRUV tasks
RID 12: Not all replicas online, retrying in 14400 seconds...
RID 14: Not all replicas online, retrying in 1280 seconds...
RID 15: Not all replicas online, retrying in 1280 seconds...
RID 16: Not all replicas online, retrying in 14400 seconds...
RID 17: Not all replicas online, retrying in 1280 seconds...
RID 18: Not all replicas online, retrying in 1280 seconds...
RID 19: Not all replicas online, retrying in 1280 seconds...
No abort CLEANALLRUV tasks running
After running a bunch of ipa-replica-manage abort-clean-ruv xxx and waiting, I finally got:
# ipa-replica-manage list-clean-ruv
No CLEANALLRUV tasks running
Abort CLEANALLRUV tasks
RID 12: Retrying in 320 seconds
RID 14: Retrying in 320 seconds
RID 15: Retrying in 160 seconds
RID 16: Retrying in 160 seconds
RID 17: Retrying in 160 seconds
RID 18: Retrying in 160 seconds
RID 19: Retrying in 160 seconds
I tried the ldapmodify command, and got this in the log:
[04/Aug/2017:15:29:05 +0200] NSMMReplicationPlugin - CleanAllRUV Task (rid 19): Initiating CleanAllRUV Task...
[04/Aug/2017:15:29:05 +0200] NSMMReplicationPlugin - CleanAllRUV Task (rid 19): Retrieving maxcsn...
[04/Aug/2017:15:29:05 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -1 (Can't contact LDAP server) errno 0 (Success)
[04/Aug/2017:15:29:06 +0200] NSMMReplicationPlugin - agmt="cn=meToiparig.quartzbio.com" (iparig:389): Replication bind with SIMPLE auth failed: LDAP error -1 (Can't contact LDAP server) ()
[04/Aug/2017:15:29:06 +0200] NSMMReplicationPlugin - CleanAllRUV Task (rid 19): Found maxcsn (00000000000000000000)
[04/Aug/2017:15:29:06 +0200] get_dom_sid - [file ipa_sidgen_common.c, line 75]: Internal search failed.
[04/Aug/2017:15:29:06 +0200] NSMMReplicationPlugin - CleanAllRUV Task (rid 19): Cleaning rid (19)...
[04/Aug/2017:15:29:06 +0200] NSMMReplicationPlugin - CleanAllRUV Task (rid 19): Waiting to process all the updates from the deleted replica...
[04/Aug/2017:15:29:06 +0200] NSMMReplicationPlugin - CleanAllRUV Task (rid 19): Waiting for all the replicas to be online...
[04/Aug/2017:15:29:06 +0200] NSMMReplicationPlugin - CleanAllRUV Task (rid 19): Task aborted for rid(19).
and RUV 19 is still listed by ipa-replica-manage list-ruv.
Are you sure that both replicas are running and that the replication between them is working? If you add a user on one of them, do you see it on the other?
We could try a CLEANRUV locally on both replicas. The operation will look like this:
ldapmodify -x -D "cn=directory manager" -W <<EOF
dn: cn=replica,cn=dc\3Dquartzbio\2Cdc\3Dcom,cn=mapping tree,cn=config
changetype: modify
replace: nsds5task
nsds5task: CLEANRUV19
EOF
On both replicas.
@germanparente
Are you sure that both replicas are running and that the replication between them is working?
There's only one replica (ipasif2), or do you also count the master?
If you add a user on one of them, do you see it on the other?
Damnation: you're right, it's not working anymore. I even tried ipa-replica-manage force-sync --from ipa.quartzbio.com on the replica, but it won't sync. It's a nightmare...
We could try a CLEANRUV locally on both replicas.
On my replica (that does not replicate anymore),
# ipa-replica-manage list-ruv
ipa.quartzbio.com:389: 4
ipasif2.quartzbio.com:389: 11
What a nightmare... Isn't there a simple way to start clean? I'd like to start with a fresh, up-to-date freeipa-server container, but of course keep my user data. Isn't there a way to export the data and reimport it?
That's why. Let's re-initialize that replica from the master.
ipa-replica-manage re-initialize --from ipa.quartzbio.com
Thanks for your patience.
Yes, of course there is a way to import/export. But your master is working fine; the re-initialize is similar to re-importing the db on your replica (online).
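(For the record, if you ever do need the offline route, FreeIPA ships ipa-backup/ipa-restore — a sketch, run on the master; the path is the default backup location:)
ipa-backup --data --online                             # dump the LDAP data while the server runs
ipa-restore /var/lib/ipa/backup/ipa-data-<timestamp>   # replay it on a rebuilt server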
@germanparente
ipa-replica-manage re-initialize --from ipa.quartzbio.com
It worked! I can now see my new user, and it is working both ways. What a relief!! Thanks!!!
Thanks for your patience.
You're kidding, right? Thanks a lot for your invaluable help!!!
Nice! Thanks to you. Let's retry the CLEANALLRUV. I would use the ldapmodify version, and let's do it one by one.
So now, on the master:
# ipa-replica-manage list-clean-ruv
No CLEANALLRUV tasks running
No abort CLEANALLRUV tasks running
but
# ipa-replica-manage list-ruv
iparig.quartzbio.com:389: 16
iparig.quartzbio.com:389: 17
ipa.quartzbio.com:389: 4
iparig.quartzbio.com:389: 18
iparig.quartzbio.com:389: 12
iparig.quartzbio.com:389: 14
ipasif2.quartzbio.com:389: 11
iparig.quartzbio.com:389: 19
iparig.quartzbio.com:389: 15
and the same on the replica
I would use the ldapmodify version, and let's do it one by one.
OK, I could delete two RUVs, 19 and 18, from the master, but they still appear on the replica in ipa-replica-manage list-ruv.
Should I also run the ldapmodify on the replica?
But have you used this one?
ldapmodify -a -D "cn=directory manager" -W << EOF
dn: cn=clean 19, cn=cleanallruv, cn=tasks, cn=config
objectclass: extensibleObject
replica-base-dn: dc=quartzbio,dc=com
replica-id: 19
replica-force-cleaning: yes
cn: clean 19
EOF
If you used the nsds5task version, then it has to be done on all the nodes. But I advise using the task above, because otherwise the replica could send the RUVs back to the master.
Yes, I used the bad one. Then fixed it on the replica, then used the correct ldapmodify. Now both servers are clean:
# ipa-replica-manage list-ruv
ipa.quartzbio.com:389: 4
ipasif2.quartzbio.com:389: 11
Should I try again to add the new replica?
Yes. But just as a remark, IPv6 should also be enabled on the master. Not for replication, but later in the install it could fail if IPv6 is not enabled.
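A quick way to check on each host whether the daemon's default bridge actually has IPv6 on (a sketch):
docker network inspect bridge --format '{{.EnableIPv6}}'   # should print true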
I tried, and it failed again at the same step:
2017-08-04T15:17:55Z DEBUG retrieving schema for SchemaCache url=ldapi://%2fvar%2frun%2fslapd-QUARTZBIO-COM.socket conn=<ldap.ldapobject.SimpleLDAPObject instance at 0x7f4d79a0c638>
2017-08-04T15:18:12Z DEBUG Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/ipaserver/install/service.py", line 449, in start_creation
run_step(full_msg, method)
File "/usr/lib/python2.7/site-packages/ipaserver/install/service.py", line 439, in run_step
method()
File "/usr/lib/python2.7/site-packages/ipaserver/install/dsinstance.py", line 431, in __setup_replica
r_bindpw=self.dm_password)
File "/usr/lib/python2.7/site-packages/ipaserver/install/replication.py", line 1068, in setup_replication
raise RuntimeError("Failed to start replication")
RuntimeError: Failed to start replication
2017-08-04T15:18:12Z DEBUG [error] RuntimeError: Failed to start replication
2017-08-04T15:18:12Z DEBUG Destroyed connection context.ldap2_139970749192016
2017-08-04T15:18:12Z DEBUG File "/usr/lib/python2.7/site-packages/ipapython/admintool.py", line 172, in execute
return_value = self.run()
File "/usr/lib/python2.7/site-packages/ipapython/install/cli.py", line 318, in run
cfgr.run()
File "/usr/lib/python2.7/site-packages/ipapython/install/core.py", line 310, in run
At the same time, on the master, in /var/log/dirsrv/slapd-QUARTZBIO-COM/errors:
[04/Aug/2017:17:17:55 +0200] get_dom_sid - [file ipa_sidgen_common.c, line 75]: Internal search failed.
[04/Aug/2017:17:17:55 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -1 (Can't contact LDAP server) errno 0 (Success)
[04/Aug/2017:17:17:55 +0200] NSMMReplicationPlugin - agmt="cn=meToiparig.quartzbio.com" (iparig:389): Replication bind with SIMPLE auth failed: LDAP error -1 (Can't contact LDAP server) ()
[04/Aug/2017:17:17:55 +0200] get_dom_sid - [file ipa_sidgen_common.c, line 75]: Internal search failed.
[04/Aug/2017:17:17:55 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -1 (Can't contact LDAP server) errno 0 (Success)
[04/Aug/2017:17:17:55 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -1 (Can't contact LDAP server) errno 0 (Success)
[04/Aug/2017:17:17:55 +0200] NSMMReplicationPlugin - Warning: unable to acquire replica for total update, error: -1, retrying in 1 seconds.
[04/Aug/2017:17:17:56 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -1 (Can't contact LDAP server) errno 0 (Success)
[04/Aug/2017:17:17:56 +0200] NSMMReplicationPlugin - Warning: unable to acquire replica for total update, error: -1, retrying in 2 seconds.
[04/Aug/2017:17:17:58 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -1 (Can't contact LDAP server) errno 0 (Success)
[04/Aug/2017:17:17:58 +0200] NSMMReplicationPlugin - Warning: unable to acquire replica for total update, error: -1, retrying in 3 seconds.
[04/Aug/2017:17:18:01 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -1 (Can't contact LDAP server) errno 0 (Success)
[04/Aug/2017:17:18:01 +0200] NSMMReplicationPlugin - Warning: unable to acquire replica for total update, error: -1, retrying in 4 seconds.
[04/Aug/2017:17:18:05 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -1 (Can't contact LDAP server) errno 0 (Success)
[04/Aug/2017:17:18:05 +0200] NSMMReplicationPlugin - Warning: unable to acquire replica for total update, error: -1, retrying in 5 seconds.
I am sure you have checked that ports 389/636 are open.
If you try on the master:
kinit -kt /etc/dirsrv/ds.keytab ldap/hostname
ldapsearch -h
does this work? The same from the replica to the master.
On the master:
# seems to work
kinit -kt /etc/dirsrv/ds.keytab ldap/ipa.quartzbio.com
# does not
kinit -kt /etc/dirsrv/ds.keytab ldap/ipasif2.quartzbio.com
kinit: Keytab contains no suitable keys for ldap/ipasif2.quartzbio.com@QUARTZBIO.COM while getting initial credentials
On the replica (ipasif2) it is the opposite:
[root@ipasif2 /]# kinit -kt /etc/dirsrv/ds.keytab ldap/ipa.quartzbio.com
kinit: Keytab contains no suitable keys for ldap/ipa.quartzbio.com@QUARTZBIO.COM while getting initial credentials
I was not sure what you meant, so I tried this on the master:
%ldapsearch -h ipa -p 389 -Y GSSAPI -b "" -s base -W
-> lots of output; last lines:
vendorVersion: 389-Directory/1.3.4.5 B2015.323.048
dataversion: 020170731153946020170731153946020170731153946
netscapemdsuffix: cn=ldap://dc=ipa,dc=quartzbio,dc=com:389
lastusn: 3719059
changeLog: cn=changelog
firstchangenumber: 879865
lastchangenumber: 881801
%ldapsearch -h ipasif2 -p 389 -Y GSSAPI -b "" -s base -W
...
vendorVersion: 389-Directory/1.3.4.5 B2015.323.048
dataversion: 020170804135304020170804135304020170804135304
netscapemdsuffix: cn=ldap://dc=ipasif2,dc=quartzbio,dc=com:389
lastusn: 3458032
changeLog: cn=changelog
firstchangenumber: 766007
lastchangenumber: 767549
Same results on the replica.
And from the master to iparig?
-h iparig -p 389
And from the master to iparig?
But iparig is the name of the FreeIPA inside the docker container (running on rig), whose install failed, so how could it work?
ldapsearch -h iparig -p 389 -Y GSSAPI -b "" -s base -W
Enter LDAP Password:
ldap_sasl_interactive_bind_s: Can't contact LDAP server (-1)
From the master:
openssl s_client -connect iparig:389
Hi, I thought iparig was the name of the new node that was going to be installed. I had seen iparig.quartzbio.com
@germanparente
I thought iparig was the name of the new node that was going to be installed.
Yes, it is. It is to be run in a docker container on a computer named rig. The master runs in a docker container: ipa.quartzbio.com.
The existing replica, ipasif2, also runs in a docker container, on a computer named sif.
OK. Not sure I am following completely.
In "ipa-replica-manage list" I see ipasif2 and this is working fine.
So, from the ipa node, we should be able to do:
openssl s_client -connect iparig:389
And from rig, we should be able to do the same against ipa:389
Regards
From inside the freeipa-server container running on rig (which has no services running since the replica failed):
%openssl s_client -connect ipa.quartzbio.com:389
CONNECTED(00000003)
140092481591160:error:140790E5:SSL routines:ssl23_write:ssl handshake failure:s23_lib.c:177:
---
no peer certificate available
---
No client certificate CA names sent
---
SSL handshake has read 0 bytes and written 201 bytes
---
New, (NONE), Cipher is (NONE)
Secure Renegotiation IS NOT supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
SSL-Session:
Protocol : TLSv1.2
Cipher : 0000
Session-ID:
Session-ID-ctx:
Master-Key:
Key-Arg : None
Krb5 Principal: None
PSK identity: None
PSK identity hint: None
Start Time: 1502097998
Timeout : 300 (sec)
Verify return code: 0 (ok)
But from the master (heimdall, running ipa.quartzbio.com in a container):
%openssl s_client -connect iparig.quartzbio.com:389
getaddrinfo: Name or service not known
From what I understand this is expected, since the name "iparig.quartzbio.com" does not exist yet because the replica install failed.
openssl s_client -connect rig:389
socket: Bad file descriptor
connect:errno=9
rig:389 is bound to port 389 of the docker container, which does NOT run the dirsrv instance since the replica install failed.
Hi,
I don't quite understand how the master ipa can have a replication agreement with iparig while iparig is not resolvable from it.
You have two nodes already installed that are working fine:
ipa and ipasif2
so, from ipa you can ping ipasif2 and connect through port 389, for instance, and the opposite is also valid.
This same configuration should be followed between ipa and rig (or iparig).
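Concretely, something like this should succeed from ipa once the replica's dirsrv is up (a sketch; swap the hostnames for the opposite direction):
ping -c1 iparig.quartzbio.com     # name resolution and basic reachability
nc -zv iparig.quartzbio.com 389   # TCP connect to the LDAP port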
Thanks.
German.
Fixed by merging issue #151 into issue #150.
@germanparente @felipevolpone I just realized that I mixed up my replies with another issue: https://github.com/freeipa/freeipa-container/issues/151 . Let me clarify: I tried two different things,
- upgrading my existing replica to a newer version by running it with the latest version of the freeipa-server docker container: issue https://github.com/freeipa/freeipa-container/issues/150
- adding a new replica (also using the latest version of the freeipa-server container): issue #151 .
So all the posts, command outputs, and log excerpts that I posted in this issue (issue #150) actually belong to issue #151. I am very sorry for the confusion, but I also hope it explains why this has been so difficult to fix.
Any insights on how to move on? Would it be easier to just create a new master by exporting/importing the data? What would be the most reliable deployment: using Fedora and FreeIPA from RPMs?
@kforner the situation is a little bit confusing now: is the log you provided about this error or about issue #151?
@felipevolpone
the situation is a little bit confusing now: is the log you provided about this error or about issue #151?
Yes, I know, and I'm sorry for that. All replies in this issue (#150) actually belong to issue #151. That's why there are three FreeIPA servers: the master, the existing replica (ipasif2), and the new replica (iparig) that I cannot manage to set up. Maybe I could close the existing issues and start with a new one?
So, just change the title and the description of this issue. I'll close #151, ok?
Now, back to the problem. Have you tried to run docker run again after all the tests you and @germanparente have done? Could you share the output/log?
So, just change the title and the description of this issue. I'll close #151, ok?
done
Have you tried to run docker run again after all the tests you and @germanparente have done?
Yes.
It still fails at the same step:
2017-08-04T15:17:55Z DEBUG flushing ldapi://%2fvar%2frun%2fslapd-QUARTZBIO-COM.socket from SchemaCache
2017-08-04T15:17:55Z DEBUG retrieving schema for SchemaCache url=ldapi://%2fvar%2frun%2fslapd-QUARTZBIO-COM.socket conn=<ldap.ldapobject.SimpleLDAPObject instance at 0x7f4d79a0c638>
2017-08-04T15:18:12Z DEBUG Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/ipaserver/install/service.py", line 449, in start_creation
run_step(full_msg, method)
File "/usr/lib/python2.7/site-packages/ipaserver/install/service.py", line 439, in run_step
method()
File "/usr/lib/python2.7/site-packages/ipaserver/install/dsinstance.py", line 431, in __setup_replica
r_bindpw=self.dm_password)
File "/usr/lib/python2.7/site-packages/ipaserver/install/replication.py", line 1068, in setup_replication
raise RuntimeError("Failed to start replication")
RuntimeError: Failed to start replication
2017-08-04T15:18:12Z DEBUG [error] RuntimeError: Failed to start replication
Correct me if I am wrong, but master ipa and replica ipasif2 are working fine.
Let's go on installing the third node. The problem here is that we don't manage to connect to port 389 from ipa to iparig. Could we check that again?
Correct me if I am wrong, but master ipa and replica ipasif2 are working fine.
Yes
The problem here is that we don't manage to connect to port 389 from ipa to iparig. Could we check that again?
This is the part I'm not sure I understand: since the replica install failed, there is no service running in the freeipa docker container on rig, hence nothing listening on port 389, right?
I'm still stuck with my master and its replica running an old version of the freeipa-server docker image (adelton/freeipa-server:latest-systemd). I was trying to set up a new replica using the latest freeipa/freeipa-server.
I had to add the --skip-conncheck option to ipa-data/ipa-replica-install-options because there is no ssh installed in the docker image. But it still fails at some point. Here are the IMHO relevant lines from /var/log/ipareplica-install.log, out of ~2000 lines: