dragonflydb / dragonfly

A modern replacement for Redis and Memcached
https://www.dragonflydb.io/

Running REPLICAOF causes Segmentation fault (core dumped) #1290

Closed kissingtiger closed 1 year ago

kissingtiger commented 1 year ago

REPLICAOF 172.15.12.100 6379
I20230524 17:05:22.074071 8869 server_family.cc:2000] Replicating 172.31.128.100:6379
I20230524 17:05:22.298261 8869 replica.cc:616] Started full sync with 172.15.12.100:6379
SIGSEGV received at time=1684919123 on cpu 119
PC: @ 0x561f6b56e390 (unknown) dfly::detail::Segment<>::Bucket::FindByFp<>()
Segmentation fault (core dumped)
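To line the crash up with the surrounding log entries, the `time=` value in the SIGSEGV line can be converted to a wall-clock time. This is a minimal sketch that assumes the value is Unix epoch seconds and that the host logs in UTC+8; neither is stated in the thread, and `crash_time` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone


def crash_time(epoch_seconds: int, utc_offset_hours: int = 0) -> datetime:
    """Convert epoch seconds to an aware datetime at the given UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(epoch_seconds, tz=tz)


# Value copied from the "SIGSEGV received at time=..." line in the report.
sigsegv_epoch = 1684919123

print(crash_time(sigsegv_epoch).isoformat())
# UTC+8 is an assumption about the reporter's host, not a fact from the thread.
print(crash_time(sigsegv_epoch, utc_offset_hours=8).isoformat())
```

Under the UTC+8 assumption the crash lands at 17:05:23, roughly one second after the "Started full sync" entry.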

romange commented 1 year ago

@kissingtiger does it reproduce consistently?
Do you have the core file or the rest of the stack trace? Can you please attach the whole log?

kissingtiger commented 1 year ago

@romange On the slave I ran:
replicaof 172.15.116.185 6379
Dragonfly log output:
I20230528 21:25:47.965194 22771 replica.cc:616] Started full sync with 172.15.116.185:6379
E20230528 21:25:47.974120 23053 rdb_load.cc:895] Zset ziplist integrity check failed.
E20230528 21:25:48.019320 23127 rdb_load.cc:895] Zset ziplist integrity check failed.
W20230528 21:25:48.046562 22771 replica.cc:267] Error syncing with 172.15.116.185:6379 dragonfly.rdbload:5 Internal error when loadB file 5
W20230528 21:25:48.547762 22771 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.685994 22678 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686005 22684 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686019 22665 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686034 22752 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686043 22711 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686035 22729 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686064 22669 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686062 22696 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686071 22714 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686061 22694 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686125 22667 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686133 22723 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686095 22689 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686242 22846 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686197 22699 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686179 22825 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686180 22837 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686115 22719 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686121 22726 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686185 22741 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686221 22906 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686223 22787 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686106 22737 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686192 22744 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686234 22828 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686144 22806 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686131 22875 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686189 22792 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686233 22856 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686172 22796 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686192 22832 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686173 22703 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686249 22910 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686260 22903 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686261 22782 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686194 22706 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686208 22887 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686100 22759 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686228 22914 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686213 22882 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686172 22762 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686241 22748 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686172 22673 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686178 22732 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686185 22686 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686235 22898 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686154 22755 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686199 22769 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.686133 22765 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.688176 22771 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.688931 22867 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.689144 22778 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.689170 22775 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.689241 22811 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.689366 22871 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.689401 22858 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.689478 22891 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.689492 22844 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.689535 22878 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.689553 22893 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.689575 22862 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.689580 22850 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.689637 22802 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 21:25:48.689664 22819 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
I20230528 21:25:48.689685 23358 replica.cc:616] Started full sync with 172.15.116.185:6379
SIGSEGV received at time=1685280565 on cpu 95
PC: @ 0x55dd09aa8d12 (unknown) dfly::StringMap::ObjEqual()
Segmentation fault (core dumped)
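Logs pasted into an issue often lose their line breaks, which makes runs of repeated warnings hard to scan. The sketch below splits glog-style text back into entries and counts repeated messages. The header pattern (severity letter, date, time, thread ID, `file:line]`) is inferred from the log lines in this thread, and `split_entries` is a hypothetical helper, not part of Dragonfly.

```python
import re
from collections import Counter

# Header shape inferred from this thread, e.g.
# "W20230528 21:25:48.547762 22771 uring_proactor.cc:203] "
HEADER = re.compile(
    r"(?<!\S)([IWEF])(\d{8}) (\d{2}:\d{2}:\d{2}\.\d{6})\s+(\d+)\s+(\S+?):(\d+)\]\s*"
)


def split_entries(text: str) -> list:
    """Split flattened glog-style text into one dict per log entry."""
    matches = list(HEADER.finditer(text))
    entries = []
    for i, m in enumerate(matches):
        # The message runs until the next header (or the end of the text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        entries.append({
            "severity": m.group(1),
            "date": m.group(2),
            "time": m.group(3),
            "thread": int(m.group(4)),
            "source": f"{m.group(5)}:{m.group(6)}",
            "message": " ".join(text[m.end():end].split()),
        })
    return entries


sample = (
    "I20230528 21:25:47.965194 22771 replica.cc:616] Started full sync with 172.15.116.185:6379 "
    "E20230528 21:25:47.974120 23053 rdb_load.cc:895] Zset ziplist integrity check failed. "
    "W20230528 21:25:48.547762 22771 uring_proactor.cc:203] CQE error: 125 cqe.flags=0 "
    "W20230528 21:25:48.685994 22678 uring_proactor.cc:203] CQE error: 125 cqe.flags=0"
)

entries = split_entries(sample)
counts = Counter(e["message"] for e in entries)
for message, n in counts.most_common():
    print(n, message)
```

Text without a glog header, such as the trailing SIGSEGV lines, stays attached to the message of the last entry, so it is not lost.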

romange commented 1 year ago

@kissingtiger, would it be possible to save a snapshot on the master using the SAVE command and share it with us?

romange commented 1 year ago

Also, where are you syncing from? Is it redis or dragonfly?

kissingtiger commented 1 year ago

@romange The master and the slave are both Dragonfly.
os: Alibaba Cloud Linux release 3 (kernel 5.10.134-13.1.al8.x86_64)
dragonfly version: v1.3.0-f80afca9c23e2f30373437520a162c591eaa2005
start command:
dragonfly-x86_64 --logtostderr --cache_mode=true --dbnum 16 --bind 0.0.0.0 --port 6379 --save_schedule *:30 --maxmemory=230g --keys_output_limit=12288 --dir /data/app/ --requirepass xxx --masterauth xxx

I have two observations:
1. If a Dragonfly instance holds more than 10 million keys, restarting it produces the same error.
2. If the master holds relatively few keys, running the replication command on the slave works and master-slave replication is normal. If the master already holds 10 million keys when the slave is added, replication fails.

romange commented 1 year ago

What dragonfly version do you run and how many threads are on your instance?

Are you saying that you get a segfault when restarting the master without replication? Can you also attach the log file for this?

kissingtiger commented 1 year ago

dragonfly version: v1.3.0-f80afca9c23e2f30373437520a162c591eaa2005
os: 128 CPU cores, 256g memory
Running 128 io threads

kissingtiger commented 1 year ago

My machine has 128 cores. Do I need to reserve some CPUs? On the master, running SAVE works fine.

kissingtiger commented 1 year ago

@romange Could you please help me check whether there is a problem with the startup command?
os: Alibaba Cloud Linux release 3 (kernel 5.10.134-13.1.al8.x86_64), 128 cores, 256g memory
dragonfly version: v1.3.0-f80afca9c23e2f30373437520a162c591eaa2005
start command:
dragonfly-x86_64 --logtostderr --cache_mode=true --dbnum 16 --bind 0.0.0.0 --port 6379 --save_schedule *:30 --maxmemory=230g --keys_output_limit=12288 --dir /data/app/ --requirepass xxx --masterauth xxx

romange commented 1 year ago

Your start command is good. It seems to be a problem on our side. Will you be able to hop on https://discord.gg/HsPjXGVH85 and ping me there? I am romange#0778

kissingtiger commented 1 year ago

@romange I added a slave. Log output:
I20230528 22:50:17.978691 7297 init.cc:57] /data/app/dragonfly-x86_64 running in opt mode.
I20230528 22:50:17.978737 7297 dfly_main.cc:584] Starting dragonfly df-v1.3.0-f80afca9c23e2f30373437520a162c591eaa2005
I20230528 22:50:17.978791 7297 dfly_main.cc:637] Max memory limit is: 80.00GiB
I20230528 22:50:17.989140 7477 uring_proactor.cc:156] IORing with 1024 entries, allocated 102720 bytes, cq_entries is 2048
I20230528 22:50:18.001204 7297 proactor_pool.cc:73] Running 64 io threads
I20230528 22:50:18.032270 7297 server_family.cc:166] Data directory is "/data/app"
I20230528 22:50:18.032779 7309 listener_interface.cc:96] sock[131] AcceptServer - listening on port 6379
W20230528 22:50:18.089366 7312 dfly_main.cc:169] Remote version - HTTP GET error [version.dragonflydb.io:443/v1], error: 337047686
W20230528 22:50:18.089381 7312 dfly_main.cc:171] ssl error: tls_process_server_certificate/certificate verify failed

I20230528 23:04:40.528254 7302 server_family.cc:2000] Replicating 172.15.128.100:6379
W20230528 23:04:40.544265 7302 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.615521 7350 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.615535 7400 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.615546 7356 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.615552 7441 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.615617 7431 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.615592 7365 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.615567 7339 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.615597 7342 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.615542 7439 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.615607 7330 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.615594 7413 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.615613 7380 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.615612 7336 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.615587 7368 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.615631 7422 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.615636 7304 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.615640 7472 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.615648 7445 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.615664 7447 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.615665 7489 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.615667 7435 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.615573 7396 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.615546 7453 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.615587 7468 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.615633 7333 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.615643 7416 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.615592 7377 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.615589 7419 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.615605 7410 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.615639 7347 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.615665 7355 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.615586 7394 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.615599 7459 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.615656 7486 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.616403 7346 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.616451 7371 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.616465 7477 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.616489 7374 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.616518 7462 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.616536 7360 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.616567 7306 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.616544 7312 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.616588 7405 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.616531 7428 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.616536 7302 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.616621 7474 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.616640 7322 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.616598 7409 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.616565 7390 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.616621 7386 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.616533 7481 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.616701 7383 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.616552 7363 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.616607 7482 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.616534 7316 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.616688 7466 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.616582 7451 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.616547 7402 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.616556 7309 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.616559 7325 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.616585 7425 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.616580 7319 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.616652 7327 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.616576 7457 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.619932 7350 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.620172 7356 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.620312 7394 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.621527 7459 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.621608 7342 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.621897 7336 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.622020 7439 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.622185 7431 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.622289 7413 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.622359 7441 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.622568 7365 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.622879 7422 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.622928 7453 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.623158 7330 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.623294 7304 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.623591 7489 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.623663 7419 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.623929 7468 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.625402 7445 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.625907 7472 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.626195 7396 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.626550 7400 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.626703 7368 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.627152 7435 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.627200 7333 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.627406 7339 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.627575 7447 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.628088 7416 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.628310 7410 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.628528 7486 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.628593 7377 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.628600 7346 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.628767 7477 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.629379 7360 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.629590 7374 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.630344 7371 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.630743 7405 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.631263 7428 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.631301 7462 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.631330 7312 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.631883 7474 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.631999 7402 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.632429 7316 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.632704 7302 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.632861 7386 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.632921 7425 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.633473 7363 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.633515 7482 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.633566 7319 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.633587 7451 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.633965 7383 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.634253 7327 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.634348 7309 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.634431 7390 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.634511 7325 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.639786 7355 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.640525 7347 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.641117 7306 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.647192 7457 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.647243 7481 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.647243 7466 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.647259 7409 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.647264 7322 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230528 23:04:40.649437 7380 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
I20230528 23:04:40.699565 7302 replica.cc:616] Started full sync with 172.15.128.100:6379
SIGSEGV received at time=1685286280 on cpu 24
PC: @ 0x5614b5449390 (unknown) dfly::detail::Segment<>::Bucket::FindByFp<>()

romange commented 1 year ago

Can you please try running the slave under gdb like this:

gdb --args dragonfly-x86_64 --logtostderr --cache_mode=true --dbnum 16 --bind 0.0.0.0 --port 6379 --save_schedule *:30 --maxmemory=230g --keys_output_limit=12288 --dir /data/app/ --requirepass xxx --masterauth xxx

and then try syncing it as usual until it crashes in gdb and then type bt to see the call stack. Please paste it here.

kissingtiger commented 1 year ago

@romange
gdb --args dragonfly-x86_64 --logtostderr --cache_mode=true --dbnum 16 --bind 0.0.0.0 --port 6379 --save_schedule *:30 --maxmemory=80g --keys_output_limit=12288 --dir /data/app/ --requirepass xxx --masterauth xxx
GNU gdb (GDB) Red Hat Enterprise Linux 9.2-7.1.al8
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from dragonfly-x86_64...
(gdb) bt
No stack.
(gdb) bt
No stack.
(gdb)
No stack.
(gdb)
No stack.
(gdb)
No stack.
(gdb) bt
No stack.
(gdb) bt
No stack.

But the slave log shows:
I20230529 13:16:09.411885 17313 server_family.cc:2000] Replicating 172.15.128.104:6379
W20230529 13:16:09.438244 17313 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510196 17454 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510195 17345 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510234 17456 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510188 17386 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510263 17377 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510246 17365 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510260 17425 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510224 17475 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510332 17389 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510243 17497 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510306 17374 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510267 17434 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510327 17499 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510241 17328 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510278 17450 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510346 17353 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510305 17439 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510324 17350 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510279 17347 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510267 17380 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510304 17313 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510277 17372 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510584 17468 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510578 17448 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510571 17493 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510586 17407 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510654 17369 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510682 17323 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510689 17441 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510751 17477 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510812 17357 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510841 17383 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510869 17484 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510881 17460 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510890 17341 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510905 17393 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510910 17329 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510919 17419 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510980 17337 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510993 17479 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510952 17398 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510921 17481 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510924 17489 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.511044 17404 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.511068 17410 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510952 17360 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510962 17487 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510983 17428 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.511041 17333 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510933 17431 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510988 17470 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.511055 17413 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.511085 17362 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510994 17315 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.511101 17401 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.511040 17320 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510985 17395 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510941 17319 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.511087 17462 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.511041 17422 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510980 17338 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.511003 17466 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.510952 17443 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.511126 17416 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.514577 17345 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.514583 17456 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.514647 17454 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.514679 17386 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.514803 17475 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.514807 17425 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.514818 17365 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.514829 17377 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.514883 17497 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.514979 17313 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.515033 17389 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.515060 17499 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.515094 17328 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.515164 17434 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.515198 17374 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.515205 17439 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.515295 17468 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.515327 17450 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.515400 17448 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.515435 17372 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.515460 17380 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.515506 17323 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.515511 17487 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.515630 17493 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.515672 17353 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.515673 17441 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.515704 17357 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.515741 17350 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.515749 17383 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.515797 17347 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.515830 17329 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.515836 17422 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.515858 17484 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.515893 17477 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.515908 17419 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.515992 17341 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.516108 17398 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.516120 17460 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.516132 17393 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.516165 17337 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.516266 17481 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.516266 17479 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.516326 17489 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.516337 17404 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.516366 17407 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.516389 17410 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.516419 17431 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.516482 17428 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.516566 17362 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.516569 17413 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.516700 17315 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.516712 17466 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.516759 17333 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.516796 17320 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.516803 17443 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.516842 17395 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.516845 17360 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.516862 17470 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.522485 17462 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.522486 17369 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.522496 17416 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.522505 17338 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.522504 17319 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
W20230529 13:16:09.522544 17401 uring_proactor.cc:203] CQE error: 125 cqe.flags=0
I20230529 13:16:09.558283 17313 replica.cc:616] Started full sync with 172.15.128.104:6379
SIGSEGV received at time=1685337370 on cpu 18
PC: @ 0x55ff0b9ce390 (unknown) dfly::detail::Segment<>::Bucket::FindByFp<>()

The master log shows:
E20230529 13:15:11.631170 4285 rdb_save.cc:1158] io error system:103
E20230529 13:15:11.631166 4517 rdb_save.cc:1158] io error system:104
E20230529 13:15:11.631207 4539 rdb_save.cc:1158] io error system:103
E20230529 13:15:11.630605 4186 rdb_save.cc:1158] io error system:103
E20230529 13:15:11.631234 4355 rdb_save.cc:1158] io error system:103
E20230529 13:15:11.631234 4531 rdb_save.cc:1158] io error system:104
E20230529 13:15:11.630937 4643 rdb_save.cc:1158] io error system:104
E20230529 13:15:11.631263 4640 rdb_save.cc:1158] io error system:104
E20230529 13:15:11.631299 4662 rdb_save.cc:1158] io error system:104
E20230529 13:15:11.631314 4612 rdb_save.cc:1158] io error system:104
E20230529 13:15:11.630940 4646 rdb_save.cc:1158] io error system:104
E20230529 13:15:11.630980 4425 rdb_save.cc:1158] io error system:103
E20230529 13:15:11.630836 4483 rdb_save.cc:1158] io error system:104
E20230529 13:15:11.630833 4266 rdb_save.cc:1158] io error system:103
E20230529 13:15:11.630656 4189 rdb_save.cc:1158] io error system:103
E20230529 13:15:11.631182 4597 rdb_save.cc:1158] io error system:104
E20230529 13:15:11.632668 4589 rdb_save.cc:1158] io error system:104
E20230529 13:15:11.631203 4368 rdb_save.cc:1158] io error system:103
E20230529 13:15:11.631261 4576 rdb_save.cc:1158] io error system:104
E20230529 13:15:11.630757 4238 rdb_save.cc:1158] io error system:103
E20230529 13:15:11.630846 4245 rdb_save.cc:1158] io error system:103
E20230529 13:15:11.630849 4475 rdb_save.cc:1158] io error system:103
E20230529 13:15:11.630820 4527 rdb_save.cc:1158] io error system:104
E20230529 13:15:11.631203 4547 rdb_save.cc:1158] io error system:104
E20230529 13:15:11.631280 4628 rdb_save.cc:1158] io error system:104
E20230529 13:15:11.631165 4593 rdb_save.cc:1158] io error system:104
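The numeric codes in these logs (125 in the `CQE error` warnings, 103 and 104 in the `rdb_save.cc` errors) look like Linux errno values. That reading is an assumption; the thread does not confirm how Dragonfly reports them. A small sketch for decoding them, using a hand-written table of the standard Linux numbers plus a cross-check against the host's own `errno` module:

```python
import errno
import os

# Standard Linux errno numbers for the codes seen in this thread.
# Other operating systems number these errors differently.
LINUX_ERRNO_NAMES = {
    103: "ECONNABORTED",
    104: "ECONNRESET",
    125: "ECANCELED",
}


def describe(code: int) -> str:
    """Return 'code = NAME' using the Linux table above."""
    return f"{code} = {LINUX_ERRNO_NAMES.get(code, 'unknown')}"


def host_agrees(code: int) -> bool:
    """True if the machine running this script uses the same numbering."""
    return errno.errorcode.get(code) == LINUX_ERRNO_NAMES.get(code)


for code in (125, 103, 104):
    print(describe(code), "| host table agrees:", host_agrees(code), "|", os.strerror(code))
```

If the errno reading is right, the replica's warnings are cancelled operations, and the master's errors are aborted and reset connections.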

kissingtiger commented 1 year ago

@romange I sent you a friend request on Discord (romange#0778).

chakaz commented 1 year ago

It looks like, in gdb, you need to type in run for the process to start, then wait for it to crash, and then type bt to get the stack

romange commented 1 year ago

Thank you @chakaz. I just instructed @kissingtiger to use gdb -batch -ex "run" -ex "bt" --args ..., which does the same.
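The batch invocation can be assembled programmatically, which sidesteps shell quoting problems such as the unquoted `*:30` in the start command. This is a sketch only: `gdb_batch_command` is a hypothetical helper, the server flags are copied from the start command posted earlier in the thread, and the script prints the command rather than launching gdb.

```python
import shlex


def gdb_batch_command(binary: str, server_args: list) -> list:
    """Build argv for: gdb -batch -ex run -ex bt --args <binary> <args...>"""
    return ["gdb", "-batch", "-ex", "run", "-ex", "bt", "--args", binary, *server_args]


# Flags copied from the start command in this thread (passwords are placeholders).
server_args = [
    "--logtostderr", "--cache_mode=true", "--dbnum", "16",
    "--bind", "0.0.0.0", "--port", "6379",
    "--save_schedule", "*:30", "--maxmemory=230g",
    "--keys_output_limit=12288", "--dir", "/data/app/",
    "--requirepass", "xxx", "--masterauth", "xxx",
]

cmd = gdb_batch_command("dragonfly-x86_64", server_args)

# shlex.join quotes arguments such as *:30 so the shell cannot glob-expand them.
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run` would start the server under gdb. With `-batch`, gdb runs the `run` and `bt` commands in order and exits, so the backtrace is printed once the process stops on the crash.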