Closed kinjelom closed 3 years ago
I'm a little confused about step 6.
You mention that you get back two IPs, A/0 and B/0... however you also mention:
a) Deleting the A/0 VM
b) Sometimes one of the B instances uses A/0's IP address.
Some questions:
1) dig in step 6 returns two A records, yes?
2) If it does return two A records, are the IPs different?
3) If two A records are returned, how do you know that one corresponds to A/0 and the other to B/0?
@klakin-pivotal sorry, my mistake - now I have corrected the description of steps 5 and 6.
It really happened: one Postgres HAProxy was redirecting to nodes from a different cluster/deployment. Unfortunately, despite many attempts, I was unable to recreate this situation.
Sounds like the instance from which the dig was performed was not yet updated with the latest DNS entries. These are synced by the sync-dns process (logs: /var/vcap/sys/log/director/sync_dns.stdout.log). Maybe that process was having problems at the time of your test. Normally it should sync DNS entries every 10 seconds. So with old state, bosh-dns would just perform a local health check against B/0 and think A/0 is back.
Point 4 could be explained by bosh-dns removing A/0 from its response because of a failed local health check.
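To make that explanation concrete, here is a minimal Python sketch of the suspected failure mode. It is a simplified model, not bosh-dns code: the `Record` type, the `resolve` function and the IP addresses are all hypothetical, and the health check is assumed to ask only whether something healthy answers at the cached IP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One locally cached DNS record (hypothetical shape, for illustration)."""
    instance: str  # e.g. "A/0"
    ip: str


def resolve(local_records, reachable_ips):
    """Return the IPs of cached records whose IP passes a health check.

    Assumption of this sketch: the check only looks at the IP, so it cannot
    tell which instance is actually answering there.
    """
    return [r.ip for r in local_records if r.ip in reachable_ips]


# Stale local state: the A/0 record was never removed on this instance.
stale = [Record("A/0", "10.0.0.10"), Record("A/1", "10.0.0.11")]

# Step 4: A/0 has been deleted, so nothing answers at 10.0.0.10.
after_delete = resolve(stale, reachable_ips={"10.0.0.11"})

# Step 6: B/0 has been created and happens to reuse 10.0.0.10.
after_reuse = resolve(stale, reachable_ips={"10.0.0.10", "10.0.0.11"})

print(after_delete)  # ['10.0.0.11']
print(after_reuse)   # ['10.0.0.10', '10.0.0.11']
```

Under these assumptions the deleted instance drops out of the answer in step 4 and reappears in step 6, even though the machine answering at that IP belongs to deployment B.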
Since the issue is not reproducible I'm gonna close this issue.
Steps to reproduce (works only in some cases):
1) deploy A with 2 nodes: A/0, A/1
2) dig q-s0.mg.default.a.bosh - two IPs
3) delete VM A/0
4) dig q-s0.mg.default.a.bosh - one IP
5) deploy B with 2 nodes: B/0, B/1 - in some cases deployment B uses the IP of the deleted VM A/0 (A/0 = B/0)
6) dig q-s0.mg.default.a.bosh - two IPs: A/0 = B/0, A/1
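One way to tell which instance each returned IP belongs to in step 6 is to compare the dig answer with a known IP-to-instance mapping. The Python sketch below assumes the answer was captured with `dig +short` (one IP per line) and that the mapping was collected separately, for example from the director's VM listing. The function name, the mapping shape and the addresses are hypothetical.

```python
def classify_answers(dig_output, queried_deployment, instances_by_ip):
    """Map each IP from `dig +short` output to its owner and flag foreign ones.

    instances_by_ip: {ip: (deployment, instance)}; this shape is an
    assumption of the sketch. IPs missing from the mapping are also
    flagged as foreign, because their owner is unknown.
    """
    results = []
    for line in dig_output.splitlines():
        ip = line.strip()
        if not ip:
            continue  # skip blank lines in the captured output
        deployment, instance = instances_by_ip.get(ip, (None, None))
        results.append({
            "ip": ip,
            "instance": instance,
            "foreign": deployment != queried_deployment,
        })
    return results


# Example: 10.0.0.10 used to be A/0 and has been reused by B/0.
answers = classify_answers(
    "10.0.0.10\n10.0.0.11\n",
    "a",
    {"10.0.0.10": ("b", "B/0"), "10.0.0.11": ("a", "A/1")},
)
for answer in answers:
    print(answer)
```

An entry with `foreign` set to true for a query against deployment a would be evidence of the situation described in step 6.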
I don't know where the bug is: bosh-dns, bosh-director, or maybe bosh-cpi for OpenStack.
I think bosh-dns should not consider a machine healthy when the response comes from a deployment other than the one being queried.
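A rough Python sketch of the kind of check I have in mind is below. It is only an illustration of the idea, not how bosh-dns works today: `probe` is a hypothetical callable, and the `deployment` and `instance_id` fields in its reply are assumptions, since the real health endpoint may not expose them.

```python
def is_healthy_for(record, probe):
    """Treat an instance as healthy only if the responder matches the record.

    probe(ip) is a hypothetical callable returning a dict such as
    {"state": "running", "deployment": "a", "instance_id": "..."},
    or None when nothing answers at that IP.
    """
    reply = probe(record["ip"])
    if reply is None:
        return False
    return (
        reply.get("state") == "running"
        and reply.get("deployment") == record["deployment"]
        and reply.get("instance_id") == record["instance_id"]
    )


# Cached record for A/0, whose IP has since been reused by B/0.
record_a0 = {"ip": "10.0.0.10", "deployment": "a", "instance_id": "a-0"}


def fake_probe(ip):
    # Stand-in for a real health request: B/0 now answers at this IP.
    return {"state": "running", "deployment": "b", "instance_id": "b-0"}


print(is_healthy_for(record_a0, fake_probe))  # False
```

With a check like this, a stale record for A/0 would stay out of the answer even when another deployment's VM responds at the same IP.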