apolloconfig / apollo

Apollo is a reliable configuration management system suitable for microservice configuration management scenarios.
https://www.apolloconfig.com
Apache License 2.0
28.94k stars 10.19k forks source link

OceanBase 兼容问题 #5087

Closed yangbnh closed 3 months ago

yangbnh commented 5 months ago

OceanBase数据库,

表主键id的自增策略如下:

租户切主时,插入的下一个值就是之前自增列缓存的最大值,auto_increment_cache_size 是用户设置自增的缓存个数,这个变量默认是1000000;建表插入数据后,查看表结构时当前AUTO_INCREMENT = 1000001(这跟分布式自增列的实现有关系),未指定id插入数据时,auto_increment_increment 自增步长默认为1,id列自动填充;当切主时,自增列缓存会变成2000001

目前,通过项目中原始ddl插入表中,业务运行一段时间后生成的Id如下(抱歉,模糊了部分业务编码):

图片

然后数据库监控 ocp监控,发现long sql,cpu耗尽,具体sql参见

select releasemes0_.Id as id115, releasemes0_.DataChange_LastTime as datachan215, releasemes0_.Message as message315 from ReleaseMessage releasemes0 where releasemes0.Id in (2097184 , 2097185 , 2097186 , 2097187 , 2097188 , 2097189 , 2097190 , 2097191 , 2097192 , 2097193 , 2097194 , 2097195 , 2097196 , 2097197 , 2097198 , 2097199 , 2097200 , 2097201 , 2097202 , 2097203 , 2097204 , 2097205 , 2097206 , 2097207 , 2097208 , 2097209 , 2097210 , 2097211 , 2097212 , 2097213 , 2097214 , 2097215 , 2097152 , 2097153 , 2097154 , 2097155 , 2097156 , 2097157 , 2097158 , 2097159 , 2097160 , 2097161 , 2097162 , 2097163 , 2097164 , 2097165 , 2097166 , 2097167 , 2097168 , 2097169 , 2097170 , 2097171 , 2097172 , 2097173 , 2097174 , 2097175 , 2097176 , 2097177 , 2097178 , 2097179 , 2097180 , 2097181 , 2097182 , 2097183 , 2097248 , 2097249 , 2097250 , 2097251 , 2097252 , 2097253 , 2097254 , 2097255 , 2097256 , 2097257 , 2097258 , 2097259 , 2097260 , 2097261 , 2097262 , 2097263 , 2097264 , 2097265 , 2097266 , 2097267 , 2097268 , 2097269 , 2097270 , 2097271 , 2097272 , 2097273 , 2097274 , 2097275 , 2097276 , 2097277 , 2097278 , 2097279 , 2097216 , 2097217 , 2097218 , 2097219 , 2097220 , 2097221 , 2097222 , 2097223 , 2097224 , 2097225 , 2097226 , 2097227 , 2097228 , 2097229 , 2097230 , 2097231 , 2097232 , 2097233 , 2097234 , 2097235 , 2097236 , 2097237 , 2097238 , 2097239 , 2097240 , 2097241 , 2097242 , 2097243 , 2097244 , 2097245 , 2097246 , 2097247 , 2097312 , 2097313 , 2097314 , 2097315 , 2097316 , 2097317 , 2097318 , 2097319 , 2097320 , 2097321 , 2097322 , 2097323 , 2097324 , 2097325 , 2097326 , 2097327 , 2097328 , 2097329 , 2097330 , 2097331 , 2097332 , 2097333 , 2097334 , 2097335 , 2097336 , 2097337 , 2097338 , 2097339 , 2097340 , 2097341 , 2097342 , 2097343 , 2097280 , 2097281 , 2097282 , 2097283 , 2097284 , 2097285 , 2097286 , 2097287 , 2097288 , 2097289 , 2097290 , 2097291 , 2097292 , 2097293 , 2097294 , 2097295 , 2097296 , 2097297 , 2097298 , 2097299 , 2097300 , 2097301 , 2097302 , 2097303 , 2097304 , 2097305 , 2097306 , 2097307 , 2097308 , 2097309 , 2097310 , 2097311 , 2097376 , 2097377 , 2097378 , 2097379 , 2097380 , 2097381 , 2097382 , 2097383 , 2097384 , 2097385 , 2097386 , 2097387 , 2097388 , 2097389 , 2097390 , 2097391 , 2097392 , 2097393 , 2097394 , 2097395 , 2097396 , 2097397 , 2097398 , 2097399 , 2097400 , 2097401 , 2097402 , 2097403 , 2097404 , 2097405 , 2097406 , 2097407 , 2097344 , 2097345 , 2097346 , 2097347 , 2097348 , 2097349 , 2097350 , 2097351 , 2097352 , 2097353 , 2097354 , 2097355 , 2097356 , 2097357 , 2097358 , 2097359 , 2097360 , 2097361 , 2097362 , 2097363 , 2097364 , 2097365 , 2097366 , 2097367 , 2097368 , 2097369 , 2097370 , 2097371 , 2097372 , 2097373 , 2097374 , 2097375 , 2097440 , 2097441 , 2097442 , 2097443 , 2097444 , 2097445 , 2097446 , 2097447 , 2097448 , 2097449 , 2097450 , 2097451 , 2097452 , 2097453 , 2097454 , 2097455 , 2097456 , 2097457 , 2097458 , 2097459 , 2097460 , 2097461 , 2097462 , 2097463 , 2097464 , 2097465 , 2097466 , 2097467 , 2097468 , 2097469 , 2097470 , 2097471 , 2097408 , 2097409 , 2097410 , 2097411 , 2097412 , 2097413 , 2097414 , 2097415 , 2097416 , 2097417 , 2097418 , 2097419 , 2097420 , 2097421 , 2097422 , 2097423 , 2097424 , 2097425 , 2097426 , 2097427 , 2097428 , 2097429 , 2097430 , 2097431 , 2097432 , 2097433 , 2097434 , 2097435 , 2097436 , 2097437 , 2097438 , 2097439 , 2097504 , 2097505 , 2097506 , 2097507 , 2097508 , 2097509 , 2097510 , 2097511 , 2097512 , 2097513 , 2097514 , 2097515 , 2097516 , 2097517 , 2097518 , 2097519 , 2097520 , 2097521 , 2097522 , 2097523 , 2097524 , 2097525 , 2097526 , 2097527 , 2097528 , 2097529 , 2097530 , 2097531 , 2097532 , 2097533 , 2097534 , 2097535 , 2097472 , 2097473 , 2097474 , 2097475 , 2097476 , 2097477 , 2097478 , 2097479 , 2097480 , 2097481 , 2097482 , 2097483 , 2097484 , 2097485 , 2097486 , 2097487 , 2097488 , 2097489 , 2097490 , 2097491 , 2097492 , 2097493 , 2097494 , 2097495 , 2097496 , 2097497 , 2097498 , 2097499 , 2097500 , 2097501 , 2097502 , 2097503 , 2097568 , 2097569 , 2097570 , 2097571 , 2097572 , 2097573 , 2097574 , 2097575 , 2097576 , 2097577 , 2097578 , 2097579 , 2097580 , 2097581 , 2097582 , 2097583 , 2097584 , 2097585 , 2097586 , 2097587 , 2097588 , 2097589 , 2097590 , 2097591 , 2097592 , 2097593 , 2097594 , 2097595 , 2097596 , 2097597 , 2097598 , 2097599 , 2097536 , 2097537 , 2097538 , 2097539 , 2097540 , 2097541 , 2097542 , 2097543 , 2097544 , 2097545 , 2097546 , 2097547 , 2097548 , 2097549 , 2097550 , 2097551 , 2097552 , 2097553 , 2097554 , 2097555 , 2097556 , 2097557 , 2097558 , 2097559 , 2097560 , 2097561 , 2097562 , 2097563 , 2097564 , 2097565 , 2097566 , 2097567 , ...

in的id值太多,不能一一列举.

此问题可复现,在docker部署和集群部署都可复现,麻烦解决之道.不胜感激

dosubot[bot] commented 5 months ago

🤖

你好 @yangbnh!我是 Dosu,一个可以帮助你解决问题、回答问题并成为贡献者的机器人。我在这里帮助你,在等待人类维护者的同时,请放心,我会立即查看你的问题,并尽快给你一个完整的回答。

yangbnh commented 5 months ago

考虑限制ob auto_increment_cache_size的大小,来约束ReleaseMessage的id大小,从而减小此sql的id值范围

想问下是否还有其他的解决方式?

dosubot[bot] commented 5 months ago

🤖

您的问题主要集中在OceanBase数据库的自增ID策略和Apollo的ReleaseMessage表的交互方式上。在Apollo中,ReleaseMessage表的Id字段被定义为AUTO_INCREMENT,这意味着数据库会自动为每个新记录生成一个唯一值,该值是最后使用的值的增量。这是在Apollo的SQL脚本中定义的。

然而,您提到OceanBase的自增ID策略可能会导致ID值的大幅度增长,这可能会影响到某些SQL查询的性能,特别是当查询涉及到大量的ID值时。您提出的解决方案是限制OceanBase的auto_increment_cache_size的大小,以减小ReleaseMessage的ID大小,从而减小此SQL的ID值范围。

这个解决方案看起来是可行的,因为它直接解决了问题的根源,即ID值的大幅度增长。然而,这可能需要您直接在OceanBase数据库上进行配置更改,这可能会影响到其他使用该数据库的应用。

另一种可能的解决方案是优化Apollo中处理ReleaseMessage表的代码,特别是那些涉及到大量ID值的查询。例如,可以考虑使用分页查询来替代一次性查询大量ID值,或者使用其他的数据结构或算法来优化查询性能。

然而,这些解决方案可能需要对Apollo的代码进行一些修改,这可能需要一些时间和资源。在进行这些修改之前,建议先进行一些性能测试,以确定最优的解决方案。

希望这些信息对您有所帮助。如果您有任何其他问题或需要更多的信息,请随时提问。

Sources


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

nobodyiam commented 5 months ago
  1. 没太明白为啥这条 sql 会导致 ob 的 cpu 耗尽,这个是用主键查询的。
  2. 另外 in 的 id 值有多少个?看代码没有找到直接查询的地方,唯一有相关性的可能是下面这个 findFirst500ByIdGreaterThanOrderByIdAsc,一次性加载了 500 条记录,压力也不大的。

https://github.com/apolloconfig/apollo/blob/921af84678cdbea1715d8f7e2244a5db1288ac4e/apollo-biz/src/main/java/com/ctrip/framework/apollo/biz/repository/ReleaseMessageRepository.java#L31-L32

yangbnh commented 4 months ago
  1. 没太明白为啥这条 sql 会导致 ob 的 cpu 耗尽,这个是用主键查询的。

    1. 另外 in 的 id 值有多少个?看代码没有找到直接查询的地方,唯一有相关性的可能是下面这个 findFirst500ByIdGreaterThanOrderByIdAsc,一次性加载了 500 条记录,压力也不大的。

https://github.com/apolloconfig/apollo/blob/921af84678cdbea1715d8f7e2244a5db1288ac4e/apollo-biz/src/main/java/com/ctrip/framework/apollo/biz/repository/ReleaseMessageRepository.java#L31-L32

感谢回复

其实这个问题目前已经出现多次了,我是从事银行解决方案的,目前在两家城商行都被行里的DBA监控,拉小组解决问题,您说的这个方法我也看过,但是并不是,而且查询的数据量超过6537+行,如图中sql,in的分割逗号6535+,

图片

这个sql是ob的监控,直接复制出来的,而且复制不全. 而且这些数据都带有一定的规律性

yangbnh commented 4 months ago
  1. 没太明白为啥这条 sql 会导致 ob 的 cpu 耗尽,这个是用主键查询的。

    1. 另外 in 的 id 值有多少个?看代码没有找到直接查询的地方,唯一有相关性的可能是下面这个 findFirst500ByIdGreaterThanOrderByIdAsc,一次性加载了 500 条记录,压力也不大的。

https://github.com/apolloconfig/apollo/blob/921af84678cdbea1715d8f7e2244a5db1288ac4e/apollo-biz/src/main/java/com/ctrip/framework/apollo/biz/repository/ReleaseMessageRepository.java#L31-L32

`com.ctrip.framework.apollo.biz.message.ReleaseMessageScanner.recordMissingReleaseMessageIds()方法中,

图片

会出现,id以这种规律出现,并put到missingReleaseMessages的map中.

在com.ctrip.framework.apollo.biz.message.ReleaseMessageScanner.scanMissingMessages()方法中,会取出missingReleaseMessages的keyset,索引全库数据.

图片

如图所示

yangbnh commented 4 months ago

https://github.com/apolloconfig/apollo/blob/921af84678cdbea1715d8f7e2244a5db1288ac4e/apollo-biz/src/main/java/com/ctrip/framework/apollo/biz/message/ReleaseMessageScanner.java#L129-L138

https://github.com/apolloconfig/apollo/blob/921af84678cdbea1715d8f7e2244a5db1288ac4e/apollo-biz/src/main/java/com/ctrip/framework/apollo/biz/message/ReleaseMessageScanner.java#L153-L163

问题应该就出现在 if (currentId - startId > 1) { for (long i = startId + 1; i < currentId; i++) { missingReleaseMessages.putIfAbsent(i, 1); } } 如果想mysql或者mariadb,在auto_increment并且step为1下, 没什么问题,但是ob下如果切主,会导致id以上万的step增长,这块逻辑就会产生问题

所以这块逻辑是否只需要查询startId和currentId之间,在ReleaseMessage表中真实存在的id即可,而不是通过startId++方式将所有的值都put到map中来解决问题呢?

nobodyiam commented 4 months ago

所以这块逻辑是否只需要查询startId和currentId之间,在ReleaseMessage表中真实存在的id即可

代码本意是如此,在实现上是用了一般数据库自增的特性来实现的。看来对 ob 不适用,可以看下是否有更好地方式来实现。

stale[bot] commented 3 months ago

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.

stale[bot] commented 3 months ago

This issue has been automatically closed because it has not had activity in the last 7 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted". Thank you for your contributions.