alibaba / MongoShake

MongoShake is a universal data replication platform based on MongoDB's oplog. Redundant replication and active-active replication are its two most important functions. It is a cluster replication tool built on the mongodb oplog that covers migration and synchronization needs and further enables disaster recovery and active-active deployments.
GNU General Public License v3.0

shard to shard full sync (document mode) problem #207

Closed jackyu86 closed 5 years ago

jackyu86 commented 5 years ago

After starting, MongoShake created the databases and collections on the target and then stopped responding. (screenshot)

The configuration is as follows:

All shard nodes of the source cluster:

mongo_urls = mongodb://192.168.1.107:29010,192.168.1.108:29010,192.168.1.109:29010;mongodb://192.168.1.107:29020,192.168.1.108:29020,192.168.1.109:29020;mongodb://192.168.1.107:29030,192.168.1.108:29030,192.168.1.109:29030;
collector.id = mongoshake
checkpoint.interval = 5000
mongo_connect_mode = secondaryPreferred
log_level = debug
log_file = collector.log
log_buffer = true
filter.namespace.black =
filter.namespace.white =
sync_mode = document
oplog.gids =
syncer.reader.buffer_time = 1
http_profile = 9100
system_profile = 9200
shard_key = collection
worker = 3
worker.batch_queue_size = 64
adaptive.batching_max_size = 16384
fetcher.buffer_capacity = 256
worker.oplog_compressor = none
# tunnel = direct

Target mongos:

tunnel.address = mongodb://192.168.1.115:29050,192.168.1.116:29050,192.168.1.117:29050;
context.address = ckpt_default
context.storage = database

Config servers of the source cluster:

context.storage.url = mongodb://192.168.1.107:29040,192.168.1.108:29040,192.168.1.109:29040
context.start_position = 2000-01-01T00:00:01Z
master_quorum = false
replayer.dml_only = true
replayer.executor = 1
replayer.executor.upsert = false
replayer.executor.insert_on_dup_update = false
replayer.conflict_write_to = none
replayer.durable = true

jackyu86 commented 5 years ago

[root@node1 mongo-shake-2.0.5]# ./collector version improve-2.0.5,6ea4b1077730d3d08d69484f7174c258a070b9b9,release,go1.10.3,2019-07-29_21:17:51

vinllen commented 5 years ago

Check whether any data has been written on the destination.
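
For reference, a minimal way to check this from the mongo shell against the target mongos (the address comes from tunnel.address above; the stormtest.test namespace is only an example taken from later in this thread, so substitute whichever collection you expect MongoShake to populate):

// connect with: mongo mongodb://192.168.1.115:29050
show dbs                          // databases MongoShake has created on the target so far
use stormtest                     // example namespace from this thread
db.test.count()                   // should keep growing while the full sync is running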

jackyu86 commented 5 years ago

No data has been written.

jackyu86 commented 5 years ago

With sync_mode set to all or oplog, data does get written.

vinllen commented 5 years ago

document means full sync only; once the full sync finishes it does not enter incremental sync and exits directly.

jackyu86 commented 5 years ago

What I observed in my test is that the databases and collections were created on the target and then the process stopped; the document full sync never ran. Sync log: (screenshot). Source data volume: (screenshot). Target data volume: (screenshot).

MaxLinyun commented 5 years ago

@jackyu86 Do some databases in the source have too many collections? If you run show tables in each db, does it take a long time?
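
For reference, a rough way to time this per database in the mongo shell (mydb is only a placeholder name):

use mydb                          // placeholder database name
var t0 = new Date();
db.getCollectionNames();          // roughly what "show tables" does
print("listing collections took " + (new Date() - t0) + " ms");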

MaxLinyun commented 5 years ago

You don't need to create the collections on the destination; running the full sync against an empty instance is enough.

jackyu86 commented 5 years ago

@lydarkforest The collections on the destination were created by mongoshake itself. The source db has only one collection; I'm just testing mongoshake.

jackyu86 commented 5 years ago

I just deliberately configured the cluster oplog to 1MB and set sync_mode to all, and the sync also shut down right after starting. The oplog was previously configured at 10GB, so this should be the same problem that makes full sync unusable, right?

MaxLinyun commented 5 years ago

During the full sync, check whether the destination mongodb's server log contains any error messages.

vinllen commented 5 years ago

A 1MB oplog is probably too small; the pressure of evicting entries from the oplog collection may block writes. I recommend checking the destination mongodb's server log.
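
For reference, a hedged sketch of how to inspect and restore the oplog size in the mongo shell; replSetResizeOplog requires MongoDB 3.6+ and is applied per member, and the 10240 MB target below is only an example matching the 10GB mentioned earlier:

// run against each oplog-bearing member (e.g. each shard's primary)
rs.printReplicationInfo()                                 // configured oplog size and the time window it covers
db.adminCommand({ replSetResizeOplog: 1, size: 10240 })   // grow the oplog back; size is given in MB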

jackyu86 commented 5 years ago

OK, OK, calling it a day; I'll continue another day. Happy weekend!

jackyu86 commented 5 years ago

Everyone, I tested again today. The mongos logged:

mongos 2019-08-05T14:40:45.525+0800 I SH_REFR [ConfigServerCatalogCacheLoader-12] Refresh for database stormtest took 2 ms and failed :: caused by :: NamespaceNotFound: database stormtest not found

What could cause this exception during the sync?

shard: Refresh for collection test_yu.jack1 took 2 ms and found the collection is not sharded

jackyu86 commented 5 years ago

This is how sharding and the index were set up on the source:

sh.enableSharding("stormtest")
db.test.ensureIndex({"ss_id":1})
db.runCommand({"shardCollection":"stormtest.test","key":{"ss_id":"hashed"}})
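
As a sanity check (standard mongo shell commands, not specific to MongoShake), sh.status() on a mongos of the source cluster should list stormtest.test as a sharded collection with the hashed key:

// on a mongos of the source cluster
sh.status()                                                             // stormtest.test should show { "ss_id" : "hashed" }
db.getSiblingDB("config").collections.find({ _id: "stormtest.test" })   // raw routing metadata for the namespace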

vinllen commented 5 years ago

Your stormtest db probably doesn't exist. Refresh the routing and check; was it dropped again after being written?
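
A sketch of that routing refresh in the mongo shell, run on the mongos that logged the NamespaceNotFound error (flushRouterConfig is a standard mongos command; whether it resolves this particular case is only an assumption):

use admin
db.runCommand({ flushRouterConfig: 1 })   // force the mongos to reload routing metadata from the config servers
show dbs                                  // then verify whether stormtest actually exists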