alibaba / MongoShake

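MongoShake is a universal data replication platform based on MongoDB's oplog. Redundant replication and active-active replication are its two most important functions. It is an oplog-based cluster replication tool for MongoDB that covers migration and synchronization needs and, building on those, supports disaster recovery and active-active deployments.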
GNU General Public License v3.0
1.72k stars 441 forks

Failed to target upsert by query :: could not extract exact shard key #380

Closed. dishytianxiang closed this issue 3 years ago

dishytianxiang commented 4 years ago

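Background: syncing between two sharded clusters. The source runs version 3.4.2 and the destination runs version 4.2.0. Problem: during incremental (incr) sync, the error "Failed to target upsert by query :: could not extract exact shard key" is reported continuously.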

vinllen commented 4 years ago

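You have upsert enabled, right? With sharded clusters, the destination is written through mongos, but the operations synced from the source do not carry the shard key, so mongos reports this error. I recommend disabling upsert.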
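To illustrate why the upsert cannot be routed, here is a minimal sketch of the rule in Go. It is a simplified model, not the mongos implementation: `canTargetUpsert` is a hypothetical helper, the shard key `x` is an assumption, and any operator document (a value whose keys start with `$`) is treated as non-exact.

```go
package main

import (
	"fmt"
	"strings"
)

// canTargetUpsert models the routing rule in simplified form: an upsert can be
// sent to a single shard only if the filter pins every shard key field to an
// exact value. Operator documents such as {"$gt": 1} are treated as non-exact.
func canTargetUpsert(filter map[string]interface{}, shardKey []string) bool {
	for _, field := range shardKey {
		value, ok := filter[field]
		if !ok {
			return false // shard key field missing from the filter
		}
		if doc, isDoc := value.(map[string]interface{}); isDoc {
			for key := range doc {
				if strings.HasPrefix(key, "$") {
					return false // range or operator match, not an exact value
				}
			}
		}
	}
	return true
}

func main() {
	shardKey := []string{"x"} // hypothetical shard key

	// Filter built from a source entry that carries only _id.
	fromSource := map[string]interface{}{"_id": 1}
	// Filter that also carries the shard key.
	withKey := map[string]interface{}{"_id": 1, "x": 10}

	fmt.Println(canTargetUpsert(fromSource, shardKey))
	fmt.Println(canTargetUpsert(withKey, shardKey))
}
```

The first filter fails the check because `x` is absent, which corresponds to the situation described above; the second passes.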

vinllen commented 4 years ago

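Optimization: later on we will consider switching to change streams throughout, where inserts carry the shard key, which avoids this problem. This will not be done in the short term, though.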

vinllen commented 4 years ago

https://github.com/alibaba/MongoShake/wiki/%E5%B8%B8%E8%A7%81%E9%97%AE%E9%A2%98&FAQ#q-%E6%8A%A5%E9%94%99failed-to-target-upsert-by-query--could-not-extract-exact-shard-key

vinllen commented 4 years ago

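If an upsert happens to update the shard key, retryWrites must be enabled, but the mgo driver does not support it yet. On the other hand, when an upsert updates the shard key, the resulting oplog may be an ordinary update, or it may be a delete + insert:

Updating the shard key (which requires retryWrites support in the driver, something mgo does not have yet) produces two oplog entries, one deleting the old document and one inserting the new one. They are shown below as change stream events: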
{ "txnNumber" : NumberLong(0), "lsid" : { "id" : UUID("f9acab90-fa2b-4862-8fb2-7786b622ecb5"), "uid" : BinData(0,"Y5mrDaxi8gv8RmdTsQ+1j7fmkr7JUsabhNmXAheU0fg=") }, "_id" : { "_data" : "825F93B25A000000042B022C0100296E5A10048748C70ABEF5443D9D1A2B5768F7D259461E6A002D03C47E645F696400645F93B221AA9C0A0710E5DAB30004", "_typeBits" : BinData(0,"QA==") }, "operationType" : "delete", "clusterTime" : Timestamp(1603514970, 4), "ns" : { "db" : "test", "coll" : "c30" }, "documentKey" : { "j" : 123455, "_id" : ObjectId("5f93b221aa9c0a0710e5dab3") } }
{ "txnNumber" : NumberLong(0), "lsid" : { "id" : UUID("f9acab90-fa2b-4862-8fb2-7786b622ecb5"), "uid" : BinData(0,"Y5mrDaxi8gv8RmdTsQ+1j7fmkr7JUsabhNmXAheU0fg=") }, "_id" : { "_data" : "825F93B25A000000042B022C0100296E5A10048748C70ABEF5443D9D1A2B5768F7D259461E6A002D03C480645F696400645F93B221AA9C0A0710E5DAB30004", "_typeBits" : BinData(0,"QA==") }, "operationType" : "insert", "clusterTime" : Timestamp(1603514970, 4), "fullDocument" : { "_id" : ObjectId("5f93b221aa9c0a0710e5dab3"), "j" : 123456, "m" : 5 }, "ns" : { "db" : "test", "coll" : "c30" }, "documentKey" : { "j" : 123456, "_id" : ObjectId("5f93b221aa9c0a0710e5dab3") } }
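The two events above can be replayed in order without any special handling. The following Go sketch shows this against an in-memory stand-in for the destination collection. The `event` type and `apply` function are hypothetical, only the fields needed here are kept, and the document that exists before the replay is an assumption reconstructed from the delete event's `documentKey`.

```go
package main

import "fmt"

// event is a pared-down change stream event with only the fields used here.
type event struct {
	OperationType string
	DocumentKey   map[string]interface{}
	FullDocument  map[string]interface{}
}

// apply replays one event against an in-memory stand-in for the destination
// collection, keyed by the string form of _id.
func apply(dest map[string]map[string]interface{}, ev event) {
	id := fmt.Sprint(ev.DocumentKey["_id"])
	switch ev.OperationType {
	case "delete":
		delete(dest, id)
	case "insert":
		dest[id] = ev.FullDocument
	}
}

func main() {
	id := "5f93b221aa9c0a0710e5dab3"

	// Assumed prior state: the document with the old shard key value j=123455.
	dest := map[string]map[string]interface{}{
		id: {"_id": id, "j": 123455},
	}

	events := []event{
		{
			OperationType: "delete",
			DocumentKey:   map[string]interface{}{"j": 123455, "_id": id},
		},
		{
			OperationType: "insert",
			DocumentKey:   map[string]interface{}{"j": 123456, "_id": id},
			FullDocument:  map[string]interface{}{"_id": id, "j": 123456, "m": 5},
		},
	}
	for _, ev := range events {
		apply(dest, ev)
	}

	fmt.Println(dest[id]["j"]) // 123456
}
```

Because the insert event carries the full document, including the new shard key value, the destination ends up with the moved document and no shard key update is needed.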

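This case can be handled naturally. However, the change may also arrive as a single replace, which cannot be handled at the moment. Solving it requires moving from the mgo driver to mongo-go-driver: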

{ "_id" : { "_data" : "825F93CED2000000012B022C0100296E5A10048E79C981E1C84460A188D0758E376225461E78002B141E5F6964002B040004", "_typeBits" : BinData(0,"gkAB") }, "operationType" : "replace", "clusterTime" : Timestamp(1603522258, 1), "fullDocument" : { "_id" : 2, "x" : 100 }, "ns" : { "db" : "writer_test", "coll" : "a" }, "documentKey" : { "x" : 10, "_id" : 2 } }
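In the replace event above, `documentKey` holds `x: 10` while `fullDocument` holds `x: 100`, so the shard key value has changed. A minimal Go sketch of detecting this case follows. `changedKeyFields` is a hypothetical helper, not part of MongoShake, and it assumes that every `documentKey` field other than `_id` belongs to the shard key.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// changedKeyFields returns the documentKey fields, other than _id, whose values
// differ from the values in fullDocument. The result is sorted so that it does
// not depend on map iteration order.
func changedKeyFields(documentKey, fullDocument map[string]interface{}) []string {
	var changed []string
	for field, old := range documentKey {
		if field == "_id" {
			continue
		}
		if !reflect.DeepEqual(old, fullDocument[field]) {
			changed = append(changed, field)
		}
	}
	sort.Strings(changed)
	return changed
}

func main() {
	// Values taken from the replace event above.
	documentKey := map[string]interface{}{"x": 10, "_id": 2}
	fullDocument := map[string]interface{}{"_id": 2, "x": 100}

	fmt.Println(changedKeyFields(documentKey, fullDocument)) // [x]
}
```

A non-empty result marks an event that would have to be written as a shard key update on the destination, which is the case that needs retryWrites or a transaction.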

vinllen commented 4 years ago

Version 2.4.17 is planned for release. The issues described above can only be fully resolved once the mgo driver has been updated. Known error:

[2020/10/24 14:45:51 CST] [CRIT] Replayer-1, executor-1, oplog for namespace[writer_test.a] op[u] failed. error type[*mgo.BulkError] error[index[0], msg[Must run update to shard key field in a multi-statement transaction or with retryWrites: true.], dup[false]], logs number[1], firstLog: {"ts":6887074316488278017,"op":"u","ns":"writer_test.a","o":[{"Name":"_id","Value":1},{"Name":"x","Value":4}],"o2":{"_id":1,"x":1},"documentKey":{"_id":1,"x":1}}