Yeah, if you dig into your logs there's a DNS lookup for consul-mdb.svc.{{ACCOUNT_ID}}.us-sw-1.cns.joyent.com happening, which looks to be mismatched with this line of your Compose file.
This means none of your mongo instances are finding Consul, so they can't do service discovery.
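A quick way to confirm is to check that the name actually resolves from inside the container. A minimal sketch using the Python 2.7 already in the image (the fallback value for CONSUL is my assumption):

```python
import os
import socket

# whatever the Compose file sets CONSUL to
consul_host = os.environ.get('CONSUL', 'consul')
try:
    name, aliases, addrs = socket.gethostbyname_ex(consul_host)
    print('%s resolves to %s' % (name, addrs))
except socket.gaierror as e:
    print('cannot resolve %s: %s' % (consul_host, e))
```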
I just masked out the account id. When I actually ran the Compose file I used the FQDN. If you look at the logs you can see it actually picked up and registered with the Consul server. It is creating the mongodb-primary key and lock session just fine.
It seems to be having some trouble reaching Consul here: https://gist.github.com/sberryman/0c61b319e57de014fc43e8af414a14ee#file-mongodb_1-log-L158, but it looks like it recovers later on. We keep trying to create the session and getting errors further down the road, though. That doesn't seem like the correct behavior, but I'll admit it's been a while since I've looked at this code.
It also looks like there's a stack trace being dropped in the log for the on_change
handler:
2017/04/04 17:42:48 2017-04-04 17:42:48,315 INFO manage.py Function failed on_change
2017/04/04 17:42:48 msg = self.format(record)
2017/04/04 17:42:48 File "/usr/lib/python2.7/logging/__init__.py", line 732, in format
2017/04/04 17:42:48 return fmt.format(record)
2017/04/04 17:42:48 File "/usr/lib/python2.7/logging/__init__.py", line 471, in format
2017/04/04 17:42:48 record.message = record.getMessage()
2017/04/04 17:42:48 File "/usr/lib/python2.7/logging/__init__.py", line 335, in getMessage
2017/04/04 17:42:48 msg = msg % self.args
2017/04/04 17:42:48 TypeError: not all arguments converted during string formatting
2017/04/04 17:42:48 Logged from file manage.py, line 222
Which is here https://github.com/autopilotpattern/mongodb/blob/master/bin/manage.py#L222
try:
repl_status = local_mongo.admin.command('replSetGetStatus')
is_mongo_primary = repl_status['myState'] == 1
# ref https://docs.mongodb.com/manual/reference/replica-states/
except Exception as e:
log.error(e, 'unable to get primary status')
return False
That's not valid Python for the log call, so we need to fix that, but I don't think it's the problem either.
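In context, the fix is probably just reordering the arguments so the message is the format string; a sketch (untested):

```python
try:
    repl_status = local_mongo.admin.command('replSetGetStatus')
    is_mongo_primary = repl_status['myState'] == 1
    # ref https://docs.mongodb.com/manual/reference/replica-states/
except Exception as e:
    # logging applies `msg % args`, so the format string must come first;
    # passing the exception as the message is what produces the
    # "not all arguments converted during string formatting" TypeError
    log.error('unable to get primary status: %s', e)
    return False
```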
I'm not sure where this error message is coming from, as that log string doesn't appear in the code as far as I can tell. What version are you using?
Haha I noticed that error as well and couldn't find it in the project. "DEBUG manage.py no replset config has been received"
I tried using autopilotpattern/mongodb:latest as well as pulling down the repo and building from master. Then I tested upgrading ContainerPilot to 2.7.2, since I saw you pushed that today; the ContainerPilot version didn't seem to make any difference. I'm using the global _/consul:0.7.5, but I've also tried with autopilotpattern/consul:0.7.2-r0.8.
So I just tried modifying manage.py to change the location of Consul. I noticed that other projects have the concept of CONSUL_AGENT=1, which doesn't exist in this example. Then I realized the agent coprocess has been added here, but manage.py doesn't point at the local agent on localhost:8500; it points at whatever is specified in the CONSUL env var. Changing it to localhost:8500 doesn't fix it either.
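To clarify what I mean, the other projects' convention could translate to something like this; a sketch, not the repo's actual code (the fallback value is my assumption):

```python
import os

from consul import Consul

if os.environ.get('CONSUL_AGENT'):
    # with the agent coprocess running, talk to the local agent and let
    # it forward requests to the Consul servers
    consul = Consul(host='localhost', port=8500)
else:
    # otherwise talk directly to whatever CONSUL points at
    consul = Consul(host=os.environ.get('CONSUL', 'consul'), port=8500)
```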
Everything seems to be syncing to Consul just fine, though.
After some Google searching it turns out the "no replset config has been received" error is coming from MongoDB itself. Most of the errors look like they're related to hostname/port combinations.
I just tried upgrading the Python modules, with no luck either:
PyMongo: 3.2.2 -> 3.4.0
python-consul: 0.4.7 -> 0.7.0
It seems to be failing at repl_status = local_mongo.admin.command('replSetGetStatus'):
https://github.com/autopilotpattern/mongodb/blob/master/bin/manage.py#L110
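To isolate that call from the rest of manage.py, it can be reproduced directly inside the container; a minimal sketch (localhost:27017 assumes the image's default mongod):

```python
from pymongo import MongoClient

# connect to the local mongod and ask for replica set status directly,
# bypassing all of the Consul machinery
client = MongoClient('localhost', 27017)
try:
    status = client.admin.command('replSetGetStatus')
    print('myState: %s' % status['myState'])  # 1 == PRIMARY
except Exception as e:
    # on a node with no replset config this raises, e.g.
    # "no replset config has been received"
    print('replSetGetStatus failed: %s' % e)
```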
Update: I've updated the gist to add two new files.
At this point I believe you are correct and the logging error isn't causing any issues.
It seems like GET /v1/agent/services isn't returning the correct services.
https://github.com/autopilotpattern/mongodb/blob/master/bin/manage.py#L281-L283
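In python-consul terms (my assumption about what those lines boil down to), that's the agent-scoped query:

```python
from consul import Consul

consul = Consul(host='localhost', port=8500)
# agent-scoped: only the services registered with THIS local agent
print(consul.agent.services())
```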
docker exec -it mdbtest_mongodb_1 bash
curl http://localhost:8500/v1/agent/services
{
"mongodb-replicaset-8921a0c3da47": {
"ID": "mongodb-replicaset-8921a0c3da47",
"Service": "mongodb-replicaset",
"Tags": null,
"Address": "192.168.128.68",
"Port": 27017,
"EnableTagOverride": false,
"CreateIndex": 0,
"ModifyIndex": 0
}
}
You mention using consul.agent.health() instead; when testing against the local Consul agent on the primary node I get:
curl http://localhost:8500/v1/health/service/mongodb-replicaset?passing=true
[{
"Node": {
"ID": "",
"Node": "588d353877e6",
"Address": "192.168.128.69",
"TaggedAddresses": {
"lan": "192.168.128.69",
"wan": "192.168.128.69"
},
"Meta": null,
"CreateIndex": 119,
"ModifyIndex": 218
},
"Service": {
"ID": "mongodb-replicaset-588d353877e6",
"Service": "mongodb-replicaset",
"Tags": null,
"Address": "192.168.128.69",
"Port": 27017,
"EnableTagOverride": false,
"CreateIndex": 124,
"ModifyIndex": 127
},
"Checks": [{
"Node": "588d353877e6",
"CheckID": "mongodb-replicaset-588d353877e6",
"Name": "mongodb-replicaset-588d353877e6",
"Status": "passing",
"Notes": "TTL for mongodb-replicaset set by containerpilot",
"Output": "ok",
"ServiceID": "mongodb-replicaset-588d353877e6",
"ServiceName": "mongodb-replicaset",
"CreateIndex": 126,
"ModifyIndex": 127
}, {
"Node": "588d353877e6",
"CheckID": "serfHealth",
"Name": "Serf Health Status",
"Status": "passing",
"Notes": "",
"Output": "Agent alive and reachable",
"ServiceID": "",
"ServiceName": "",
"CreateIndex": 119,
"ModifyIndex": 119
}]
}, {
"Node": {
"ID": "",
"Node": "8921a0c3da47",
"Address": "192.168.128.68",
"TaggedAddresses": {
"lan": "192.168.128.68",
"wan": "192.168.128.68"
},
"Meta": null,
"CreateIndex": 73,
"ModifyIndex": 222
},
"Service": {
"ID": "mongodb-replicaset-8921a0c3da47",
"Service": "mongodb-replicaset",
"Tags": null,
"Address": "192.168.128.68",
"Port": 27017,
"EnableTagOverride": false,
"CreateIndex": 77,
"ModifyIndex": 80
},
"Checks": [{
"Node": "8921a0c3da47",
"CheckID": "mongodb-replicaset-8921a0c3da47",
"Name": "mongodb-replicaset-8921a0c3da47",
"Status": "passing",
"Notes": "TTL for mongodb-replicaset set by containerpilot",
"Output": "ok",
"ServiceID": "mongodb-replicaset-8921a0c3da47",
"ServiceName": "mongodb-replicaset",
"CreateIndex": 79,
"ModifyIndex": 80
}, {
"Node": "8921a0c3da47",
"CheckID": "serfHealth",
"Name": "Serf Health Status",
"Status": "passing",
"Notes": "",
"Output": "Agent alive and reachable",
"ServiceID": "",
"ServiceName": "",
"CreateIndex": 73,
"ModifyIndex": 73
}]
}]
Which clearly gives me BOTH nodes. This has been like searching for a needle in a haystack, but it's been a good opportunity to see how this all works. I'm going to take a break and then try to tackle the change over to using health.
Not sure how anyone is using this on Joyent as-is right now, though...
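For reference, my plan for the change is roughly this; a sketch against python-consul 0.7 (the exact shape manage.py wants is my assumption):

```python
from consul import Consul

consul = Consul(host='localhost', port=8500)

# /v1/agent/services only knows about services registered with THIS
# agent; /v1/health/service/<name> asks the whole cluster
_, nodes = consul.health.service('mongodb-replicaset', passing=True)
addresses = ['%s:%s' % (n['Service']['Address'], n['Service']['Port'])
             for n in nodes]
print(addresses)  # both nodes: 192.168.128.68:27017 and 192.168.128.69:27017
```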
@tgross, need me to make any changes to PR #13?
So I have MongoDB replica sets running locally on Docker for Mac without using network: bridge; however, when running on Joyent I can't seem to get the replica to connect and sync with the primary.

1. docker-compose -p mdb_test -f docker-compose.yml up -d
2. docker logs mdbtest_mongodb_1 (see gist)
3. mongodb-primary key and lock session show up in consul
4. rs.status() on the only running mongodb node (see gist)
5. docker-compose -p mdb_test -f docker-compose.yml scale mongodb=2
6. rs.status() on the primary again (see gist)
7. docker ps

I've tried with and without exposing any ports publicly, and with and without using the consul agent. The only thing I can think of at this point is that it's possibly something around DNS. I see the Redis autopilot pattern is using a very different method to get the IP address of the container.
MongoDB: https://github.com/autopilotpattern/mongodb/blob/master/bin/manage.py#L420-L430
Redis: https://github.com/autopilotpattern/redis/blob/master/bin/manage.sh#L311-L313
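For illustration, the interface-based lookup I mean is something like the common Python 2 idiom below; a rough sketch, not the actual code at either link:

```python
import fcntl
import socket
import struct

def get_ip(iface='eth0'):
    # ask the kernel for the primary IPv4 address bound to `iface`
    # via the SIOCGIFADDR ioctl (0x8915)
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    packed = fcntl.ioctl(s.fileno(), 0x8915,
                         struct.pack('256s', iface[:15]))
    return socket.inet_ntoa(packed[20:24])

print(get_ip())  # e.g. 192.168.128.68 on the private network
```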
I figure I have to be doing something wrong, as nobody else has mentioned a problem...