Tokutek / mongo

TokuMX is a high-performance, concurrent, compressing, drop-in replacement engine for MongoDB | Issue tracker: https://tokutek.atlassian.net/browse/MX/ |
http://www.tokutek.com/products/tokumx-for-mongodb/

Replica set with 2 masters? #1206

Closed by vishnevskiy 9 years ago

vishnevskiy commented 9 years ago

I was trying to debug why a cron job wasn't updating the database, only to find out that the replica set has 2 masters.

This is the output from the most up-to-date master.

{
    "date" : ISODate("2014-10-22T18:33:35Z"),
    "myState" : 1,
    "members" : [
        {
            "_id" : 0,
            "name" : "linode-prod-mongodb-0002:27017",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 1206877,
            "optimeDate" : ISODate("2014-10-22T18:33:18.310Z"),
            "lastGTID" : "GTID(1, 2025861)",
            "lastUnappliedGTID" : "GTID(1, 2025861)",
            "minLiveGTID" : "GTID(1, 2025862)",
            "minUnappliedGTID" : "GTID(1, 2025862)",
            "oplogVersion" : 4,
            "highestKnownPrimaryInReplSet" : 2,
            "self" : true
        },
        {
            "_id" : 1,
            "name" : "linode-prod-mongodb-0001:27017",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 531,
            "optimeDate" : ISODate("2014-10-22T18:33:33.854Z"),
            "lastGTID" : "GTID(2, 1652770)",
            "lastUnappliedGTID" : "GTID(2, 1652770)",
            "minLiveGTID" : "GTID(2, 1652771)",
            "minUnappliedGTID" : "GTID(2, 1652771)",
            "oplogVersion" : 4,
            "highestKnownPrimaryInReplSet" : 2,
            "lastHeartbeat" : ISODate("2014-10-22T18:33:34Z"),
            "lastHeartbeatRecv" : ISODate("2014-10-22T18:33:34Z"),
            "pingMs" : 0,
            "syncingTo" : "linode-prod-mongodb-0003:27017"
        },
        {
            "_id" : 2,
            "name" : "linode-prod-mongodb-0003:27017",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 1206859,
            "optimeDate" : ISODate("2014-10-22T18:33:35.117Z"),
            "lastGTID" : "GTID(2, 1652773)",
            "lastUnappliedGTID" : "GTID(2, 1652773)",
            "minLiveGTID" : "GTID(2, 1652774)",
            "minUnappliedGTID" : "GTID(2, 1652774)",
            "oplogVersion" : 4,
            "highestKnownPrimaryInReplSet" : 2,
            "lastHeartbeat" : ISODate("2014-10-22T18:33:35Z"),
            "lastHeartbeatRecv" : ISODate("2014-10-22T18:33:34Z"),
            "pingMs" : 0,
            "syncingTo" : "linode-prod-mongodb-0002:27017"
        }
    ],
    "ok" : 1
}

And this is from the other master.

{
    "date" : ISODate("2014-10-22T18:33:35Z"),
    "myState" : 1,
    "members" : [
        {
            "_id" : 0,
            "name" : "linode-prod-mongodb-0002:27017",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 1206877,
            "optimeDate" : ISODate("2014-10-22T18:33:18.310Z"),
            "lastGTID" : "GTID(1, 2025861)",
            "lastUnappliedGTID" : "GTID(1, 2025861)",
            "minLiveGTID" : "GTID(1, 2025862)",
            "minUnappliedGTID" : "GTID(1, 2025862)",
            "oplogVersion" : 4,
            "highestKnownPrimaryInReplSet" : 2,
            "self" : true
        },
        {
            "_id" : 1,
            "name" : "linode-prod-mongodb-0001:27017",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 531,
            "optimeDate" : ISODate("2014-10-22T18:33:33.854Z"),
            "lastGTID" : "GTID(2, 1652770)",
            "lastUnappliedGTID" : "GTID(2, 1652770)",
            "minLiveGTID" : "GTID(2, 1652771)",
            "minUnappliedGTID" : "GTID(2, 1652771)",
            "oplogVersion" : 4,
            "highestKnownPrimaryInReplSet" : 2,
            "lastHeartbeat" : ISODate("2014-10-22T18:33:34Z"),
            "lastHeartbeatRecv" : ISODate("2014-10-22T18:33:34Z"),
            "pingMs" : 0,
            "syncingTo" : "linode-prod-mongodb-0003:27017"
        },
        {
            "_id" : 2,
            "name" : "linode-prod-mongodb-0003:27017",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 1206859,
            "optimeDate" : ISODate("2014-10-22T18:33:35.117Z"),
            "lastGTID" : "GTID(2, 1652773)",
            "lastUnappliedGTID" : "GTID(2, 1652773)",
            "minLiveGTID" : "GTID(2, 1652774)",
            "minUnappliedGTID" : "GTID(2, 1652774)",
            "oplogVersion" : 4,
            "highestKnownPrimaryInReplSet" : 2,
            "lastHeartbeat" : ISODate("2014-10-22T18:33:35Z"),
            "lastHeartbeatRecv" : ISODate("2014-10-22T18:33:34Z"),
            "pingMs" : 0,
            "syncingTo" : "linode-prod-mongodb-0002:27017"
        }
    ],
    "ok" : 1
}
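For anyone who wants to check for this condition automatically, here is a minimal sketch in Python. It assumes the status document has already been fetched (for example via the `replSetGetStatus` command through a driver, which is not shown) and converted to a plain dict. The helper name `find_primaries` is hypothetical, and the sample data is a trimmed copy of the output above.

```python
def find_primaries(status):
    """Return the names of all members whose state is 1 (PRIMARY)."""
    return [
        member["name"]
        for member in status.get("members", [])
        if member.get("state") == 1
    ]


# Trimmed copy of the status output above; only the fields the check needs.
status = {
    "myState": 1,
    "members": [
        {"_id": 0, "name": "linode-prod-mongodb-0002:27017", "state": 1},
        {"_id": 1, "name": "linode-prod-mongodb-0001:27017", "state": 2},
        {"_id": 2, "name": "linode-prod-mongodb-0003:27017", "state": 1},
    ],
}

primaries = find_primaries(status)
if len(primaries) > 1:
    # More than one member claims PRIMARY: the situation reported here.
    print("multiple primaries reported:", ", ".join(primaries))
```

A check like this only reflects one node's view of the set, so it would need to be run against each member to be useful as a monitor.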

This is the version installed on all boxes:

{
    "version" : "2.4.10",
    "tokumxVersion" : "2.0.0",
    "gitVersion" : "c7f2e017eb71d93ca51d5073eb1570f6c9ce0ba1",
    "tokukvVersion" : "668f1118593ba0976b6ec68768062f64d418ec83",
    "sysInfo" : "Linux 25f03aee86ad 3.11.0-26-generic #45-Ubuntu SMP Tue Jul 15 04:02:06 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux BOOST_LIB_VERSION=1_48",
    "loaderFlags" : " -Wl,-Bsymbolic-functions -Wl,-z,relro   -Wl,-Bsymbolic-functions -Wl,-z,relro   ",
    "compilerFlags" : "-fPIC -fno-strict-aliasing -ggdb -Wall -Wsign-compare -Wno-unknown-pragmas -Winvalid-pch -pipe -Wnon-virtual-dtor -Woverloaded-virtual -Wno-unused-local-typedefs -fno-builtin-memcmp -O3",
    "versionArray" : [
        2,
        4,
        10,
        0
    ],
    "javascriptEngine" : "V8",
    "bits" : 64,
    "debug" : false,
    "maxBsonObjectSize" : 16777216,
    "ok" : 1
}

zkasheff commented 9 years ago

First off, we are now tracking issues through https://tokutek.atlassian.net/browse/MX. This looks like https://tokutek.atlassian.net/browse/MX-1308, and I'm not sure what is happening there. Can you attach gdb to the old primary, linode-prod-mongodb-0002:27017, and email the output of `gdb --batch -ex "set pagination 0" -ex "thread apply all bt" $MONGOD_PID` to the tokumx google group?

Before doing so, you'll need to install debug info. Assuming you are running 2.0 community from a tarball, please download and install https://s3.amazonaws.com/tokumx-2.0.0/tokumx-2.0.0-linux-x86_64-debuginfo.tar.gz in the same directory you have tokumx installed.
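If it helps to have this ready before the problem recurs, the stack-trace capture can be wrapped in a small script. This is only a sketch under assumptions: the function names and the output path are hypothetical, and it passes the PID with gdb's `-p` option to make the attach explicit, which differs slightly from the command as quoted above.

```python
import subprocess


def build_gdb_command(pid):
    """Build the gdb argument list that dumps backtraces for every thread."""
    return [
        "gdb", "--batch",
        "-ex", "set pagination 0",
        "-ex", "thread apply all bt",
        "-p", str(pid),  # assumption: attach explicitly by PID
    ]


def capture_backtraces(pid, out_path="mongod-backtraces.txt"):
    """Run gdb against the given PID and save its output (hypothetical path)."""
    result = subprocess.run(
        build_gdb_command(pid),
        capture_output=True,
        text=True,
    )
    with open(out_path, "w") as out:
        out.write(result.stdout)
        out.write(result.stderr)
    return out_path
```

Calling `capture_backtraces(pid)` with the mongod PID on the stalled primary would leave a text file that can be attached to an email. Attaching gdb pauses the process briefly, and it usually requires the same user as mongod or root.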

vishnevskiy commented 9 years ago

Sorry about that; I will post in JIRA next time.

That said, I already did an initial resync =(

zkasheff commented 9 years ago

Thanks. That said, you are the second person to hit this, so there is likely something wrong. I believe the original primary is stalling under the covers. If you hit this again and can get gdb stack traces, that will help greatly. Thanks.