Closed: andriytk closed this issue 2 years ago.
For the convenience of the Seagate development team, this issue has been mirrored in a private Seagate Jira Server: https://jts.seagate.com/browse/CORTX-31906. Note that community members will not be able to access that Jira server but that is not a problem since all activity in that Jira mirror will be copied into this GitHub issue.
For the convenience of the Seagate development team, this issue has been mirrored in a private Seagate Jira Server: https://jts.seagate.com/browse/CORTX-31907. Note that community members will not be able to access that Jira server but that is not a problem since all activity in that Jira mirror will be copied into this GitHub issue.
Last m0trace records:
2132746 2865676.886.587786 7ff1d67fb8d0 CALL fop/fom_generic.c:858:m0_fom_tick_generic > fom=0x7ff18c079920 phase=tx_open
2132747 2865676.886.589734 7ff1d67fb840 CALL be/tx.c:273:m0_be_tx_payload_prep > tx=0x7ff18c0799f0 size=262144
2132748 2865676.886.592750 7ff1d67fb7f0 CALL lib/memory.c:130:m0_alloc > size=48
2132749 2865676.886.595670 7ff1d67fb7f0 CALL lib/memory.c:141:m0_alloc < ptr=0x559735a67fd0 size=48
2132750 2865676.886.595816 7ff1d67fb7f0 CALL lib/memory.c:130:m0_alloc > size=24
2132751 2865676.886.597830 7ff1d67fb7f0 CALL lib/memory.c:141:m0_alloc < ptr=0x7ff1d81f30a0 size=24
2132752 2865676.886.602684 7ff1d67fb820 CALL be/tx.c:263:m0_be_tx_prep > tx=0x7ff18c0799f0 credit=(31022,8301076)
2132753 2865676.886.604154 7ff1d67fb810 CALL be/tx.c:284:m0_be_tx_open > tx=0x7ff18c0799f0 t_prepared=(31022,8301076) t_payload_prepared=262144
2132754 2865676.886.606134 7ff1d67fb760 CALL be/tx.c:489:be_tx_state_move > tx=0x7ff18c0799f0 rc=0 prepare -> opening
2132755 2865676.886.612314 7ff1d67fb700 DEBUG be/engine.c:646:m0_be_engine__tx_state_set tx 0x7ff18c0799f0: => opening
2132756 2865676.886.641030 7ff1d67fb600 CALL be/log.c:854:m0_be_log_reserve > log=(lg_current=156909568 lg_discarded=156827648 lg_free=134119424) size=9314304
2132757 2865676.886.642834 7ff1d67fb600 CALL be/log.c:865:m0_be_log_reserve < rc=0 log=(lg_current=156909568 lg_discarded=156827648 lg_free=124805120)
2132758 2865676.886.646894 7ff1d67fb630 DEBUG be/tx.c:550:m0_be_tx__state_post tx=0x7ff18c0799f0 state=grouping
2132759 2865676.886.648864 7ff1d67fb760 CALL be/tx.c:521:be_tx_state_move <
2132760 2865676.886.649400 7ff1d67fb810 CALL be/tx.c:296:m0_be_tx_open <
2132761 2865676.886.650772 7ff1d67fb880 DEBUG fop/fom.c:1694:m0_fom_phase_set fom=0x7ff18c079920, item 0x7ff18c077bb0[42] phase set: tx_open -> tx_wait
2132762 2865676.886.654332 7ff1d67fb8d0 CALL fop/fom_generic.c:874:m0_fom_tick_generic < fom=0x7ff18c079920 phase=tx_wait rc=2
2132763 2865676.886.654646 7ff1d67fb8d0 CALL fop/fom_generic.c:875:m0_fom_tick_generic < rc=2
2132764 2865676.886.654922 7ff1d67fb940 DEBUG ioservice/io_foms.c:2324:m0_io_fom_cob_rw_tick fom = 0x7ff18c079920, stob = 0x7ff1d80877c0, rc = 0
2132765 2865676.886.655016 7ff1d67fb940 CALL ioservice/io_foms.c:2326:m0_io_fom_cob_rw_tick < rc=2
2132766 2865676.886.656194 7ff1d67fb940 CALL ioservice/io_foms.c:2258:m0_io_fom_cob_rw_tick > fom 0x7ff18c079920, fop 0x7ff18c077b90, item 0x7ff18c077bb0[42] tx_wait<4300000a04b51096:f02400000900000a>
2132767 2865676.886.656580 7ff1d67fb8d0 CALL fop/fom_generic.c:858:m0_fom_tick_generic > fom=0x7ff18c079920 phase=tx_wait
2132768 2865676.886.657166 7ff1d67fb870 CALL fop/fom_generic.c:294:fom_tx_wait > fom=0x7ff18c079920, tx_state 3
2132769 2865676.886.660114 7ff1d67fb870 CALL fop/fom_generic.c:305:fom_tx_wait <
2132770 2865676.886.660388 7ff1d67fb8d0 CALL fop/fom_generic.c:874:m0_fom_tick_generic < fom=0x7ff18c079920 phase=tx_wait rc=1
2132771 2865676.886.660530 7ff1d67fb8d0 CALL fop/fom_generic.c:875:m0_fom_tick_generic < rc=1
2132772 2865676.886.660876 7ff1d67fb940 CALL ioservice/io_foms.c:2301:m0_io_fom_cob_rw_tick < rc=1
2132773 2865676.886.670314 7ff1d67fba10 CALL be/tx.c:427:be_tx_state_move_ast > tx=0x7ff18c0799f0 opening -> grouping
2132774 2865676.886.671314 7ff1d67fb960 CALL be/tx.c:489:be_tx_state_move > tx=0x7ff18c0799f0 rc=0 opening -> grouping
2132775 2865676.886.680802 7ff1d67fb900 DEBUG be/engine.c:646:m0_be_engine__tx_state_set tx 0x7ff18c0799f0: => grouping
2132776 2865676.886.687224 7ff1d67fb7a0 CALL be/tx_group.c:297:m0_be_tx_group_tx_add > gr=0x559735aff570 tx=0x7ff18c0799f0 t_prepared=(31022,8301076) t_payload_prepared=262144 tg_used=(0,0) tg_payload_prepared=0 group_tx_nr=0
2132777 2865676.886.688698 7ff1d67fb760 DEBUG be/tx_group.c:279:be_tx_group_tx_add tx=0x7ff18c0799f0 group=0x559735aff570
2132778 2865676.886.692538 7ff1d67fb7a0 CALL be/tx_group.c:322:m0_be_tx_group_tx_add < rc=0
2132779 2865676.886.694420 7ff1d67fb860 CALL be/engine.c:430:be_engine_group_timeout_arm > en=0x7ffe12ca1f98 gr=0x559735aff570 sm_grp=0x5597354d8ce0
2132780 2865676.886.719360 7ff1d67fb860 CALL be/engine.c:443:be_engine_group_timeout_arm < grouping_q_length=1 delay=1382812
2132781 2865676.886.721078 7ff1d67fb820 DEBUG be/tx.c:550:m0_be_tx__state_post tx=0x7ff18c0799f0 state=active
2132782 2865676.886.721534 7ff1d67fb860 CALL be/engine.c:496:be_engine_tx_trygroup < rc=0
2132783 2865676.886.722276 7ff1d67fb960 CALL be/tx.c:521:be_tx_state_move <
2132784 2865676.886.725698 7ff1d67fba10 CALL be/tx.c:427:be_tx_state_move_ast > tx=0x7ff18c0799f0 grouping -> active
2132785 2865676.886.726212 7ff1d67fb960 CALL be/tx.c:489:be_tx_state_move > tx=0x7ff18c0799f0 rc=0 grouping -> active
2132786 2865676.886.727398 7ff1d67fb910 CALL lib/memory.c:119:m0_alloc_nz > size=262144
2132787 2865676.886.793174 7ff1d67fb910 CALL lib/memory.c:122:m0_alloc_nz < ptr=0x7ff1d8268410 size=262144
2132788 2865676.886.794394 7ff1d67fb840 CALL lib/memory.c:119:m0_alloc_nz > size=2481720
2132789 2865676.886.841146 7ff1d67fb840 CALL lib/memory.c:122:m0_alloc_nz < ptr=0x7ff1d82a8420 size=2481720
2132790 2865676.886.842684 7ff1d67fb8c0 CALL be/tx_regmap.c:392:m0_be_regmap_init < rc=0
2132791 2865676.886.842950 7ff1d67fb8c0 CALL lib/memory.c:119:m0_alloc_nz > size=8301076
2132792 2865676.886.874600 7ff1d67fb8c0 CALL lib/memory.c:122:m0_alloc_nz < ptr=0x7ff1d8506260 size=8301076
2132793 2865676.886.875332 7ff1d67fb910 CALL be/tx_regmap.c:597:m0_be_reg_area_init < rc=0
2132794 2865676.886.875746 7ff1d67fb960 CALL be/tx.c:462:be_tx_memory_allocate < rc=0
2132795 2865676.886.882872 7ff1d67fb900 DEBUG be/engine.c:646:m0_be_engine__tx_state_set tx 0x7ff18c0799f0: => active
2132796 2865676.886.883876 7ff1d67fb960 CALL be/tx.c:521:be_tx_state_move <
2132797 2865676.886.886660 7ff1d67fba10 CALL be/engine.c:360:be_engine_group_timer_arm > en=0x7ffe12ca1f98 gr=0x559735aff570 sm_grp=0x5597354d8ce0
2132798 2865676.886.888262 7ff1d67fb9d0 CALL lib/timer.c:67:m0_timer_fini > 0x559735b02360
2132799 2865676.886.902822 7ff1d67fb9d0 CALL lib/timer.c:72:m0_timer_fini < 0x559735b02360
2132800 2865676.886.917540 7ff1d67fb8f0 CALL lib/user_space/timer.c:198:timer_posix_init < rc=0
2132801 2865676.886.918064 7ff1d67fb8f0 CALL lib/user_space/timer.c:304:timer_hard_init < rc=0
2132802 2865676.886.918432 7ff1d67fb980 CALL lib/timer.c:61:m0_timer_init < 0x559735b02360, rc=0
2132803 2865676.886.924754 7ff1d67fba10 CALL be/engine.c:366:be_engine_group_timer_arm <
2132804 2865676.886.944460 7ff1d67fb940 CALL ioservice/io_foms.c:2258:m0_io_fom_cob_rw_tick > fom 0x7ff18c079920, fop 0x7ff18c077b90, item 0x7ff18c077bb0[42] tx_wait<4300000a04b51096:f02400000900000a>
2132805 2865676.886.945308 7ff1d67fb8d0 CALL fop/fom_generic.c:858:m0_fom_tick_generic > fom=0x7ff18c079920 phase=tx_wait
2132806 2865676.886.946038 7ff1d67fb870 CALL fop/fom_generic.c:294:fom_tx_wait > fom=0x7ff18c079920, tx_state 5
2132807 2865676.886.947242 7ff1d67fb870 CALL fop/fom_generic.c:310:fom_tx_wait <
2132808 2865676.886.948028 7ff1d67fb880 DEBUG fop/fom.c:1694:m0_fom_phase_set fom=0x7ff18c079920, item 0x7ff18c077bb0[42] phase set: tx_wait -> io-prepare
2132809 2865676.886.952738 7ff1d67fb8d0 CALL fop/fom_generic.c:874:m0_fom_tick_generic < fom=0x7ff18c079920 phase=io-prepare rc=2
2132810 2865676.886.953094 7ff1d67fb8d0 CALL fop/fom_generic.c:875:m0_fom_tick_generic < rc=2
2132811 2865676.886.953638 7ff1d67fb8f0 DEBUG fop/fom.c:1694:m0_fom_phase_set fom=0x7ff18c079920, item 0x7ff18c077bb0[42] phase set: io-prepare -> stobio-launch
2132812 2865676.886.956906 7ff1d67fb940 CALL ioservice/io_foms.c:2301:m0_io_fom_cob_rw_tick < rc=2
2132813 2865676.886.957934 7ff1d67fb940 CALL ioservice/io_foms.c:2258:m0_io_fom_cob_rw_tick > fom 0x7ff18c079920, fop 0x7ff18c077b90, item 0x7ff18c077bb0[42] stobio-launch<4300000a04b51096:f02400000900000a>
2132814 2865676.886.960362 7ff1d67fb7b0 CALL ioservice/io_foms.c:1710:io_launch > fom=0x7ff18c079920
2132815 2865676.886.964728 7ff1d67fb6a0 CALL ioservice/io_foms.c:965:m0_io_cob_stob_create > COB:<4300000a04b51096:f02400000900000a>pver:<7600000000000001:b8>
2132816 2865676.886.966750 7ff1d67fb610 CALL cob/cob.c:1417:m0_cob_locate > dom=0x400000131490 oikey=(<4300000a04b51096:f02400000900000a>, 0)
2132817 2865676.886.967378 7ff1d67fb580 CALL lib/memory.c:130:m0_alloc > size=416
2132818 2865676.886.968240 7ff1d67fb580 CALL lib/memory.c:141:m0_alloc < ptr=0x7ff1d815b0c0 size=416
2132819 2865676.886.969462 7ff1d67fb560 DEBUG file/file.c:485:m0_file_init <0:0>
2132820 2865676.887.005808 7ff1d67fb250 CALL be/btree.c:2338:m0_be_btree_cursor_get_sync < rc=0
2132821 2865676.887.008052 7ff1d67fb290 DEBUG cob/cob.c:1264:cob_oi_lookup found: fid=<4300000a06ce1096:7b2e000008000001> lno=0 pfid=<4700000006ce1096:7b2e000008000001> name='10'
2132822 2865676.887.008586 7ff1d67fb290 DEBUG cob/cob.c:1272:cob_oi_lookup old fid=<4300000a04b51096:f02400000900000a> fid=<4300000a06ce1096:7b2e000008000001>
2132823 2865676.887.009026 7ff1d67fb290 CALL cob/cob.c:1283:cob_oi_lookup < rc=-2
2132824 2865676.887.009594 7ff1d67fb610 DEBUG cob/cob.c:1434:m0_cob_locate cob_oi_lookup() failed with -2
2132825 2865676.887.011542 7ff1d67fb5e0 DEBUG lib/memory.c:151:m0_free 0x7ff1d815b0c0
2132826 2865676.887.012316 7ff1d67fb610 CALL cob/cob.c:1436:m0_cob_locate < rc=-2
2132827 2865676.887.018554 7ff1d67fb650 CALL ioservice/cob_foms.c:828:m0_cc_stob_create > stob create fid=<200000a04b51096:f02400000900000a>
2132828 2865676.887.021230 7ff1d67fb5e0 CALL ioservice/storage_dev.c:817:m0_storage_dev_stob_create > stob_id={<200000000000000:a>,<200000a04b51096:f02400000900000a>}
2132829 2865676.887.035558 7ff1d67fb560 CALL stob/stob.c:89:m0_stob_find_by_key < rc=0
2132830 2865676.887.036726 7ff1d67fb540 CALL stob/stob.c:164:m0_stob_create > stob=0x7ff1d80877c0 so_id={<200000000000000:a>,<200000a04b51096:f02400000900000a>} so_ref=5
2132831 2865676.887.038322 7ff1d67fb2d0 DEBUG stob/ad.c:732:stob_ad_create 200000a04b51096:f02400000900000a
2132832 2865676.887.046062 7ff1d67fb050 DEBUG be/extmap.c:763:m0_be_emap_obj_insert Nob: key = 56 val = 64
2132833 2865676.887.049924 7ff1d67fafc0 CALL be/btree.c:1900:be_btree_insert > tree=0x400000108468
2132834 2865676.887.051234 7ff1d67fabb0 CALL be/btree.c:1380:btree_save > tree=0x400000108468
2132835 2865676.887.051788 7ff1d67fabb0 DEBUG be/btree.c:1390:btree_save DEC cr=0x7ff18c079a98 user=0 balance=24
2132836 2865676.887.079056 7ff1d67faa20 DEBUG be/fl.c:213:m0_be_fl_pick chunk=0x4000001a5310 size=120 bac_size=26842868896 index=128 iterations=1
2132837 2865676.887.133034 7ff1d67faa90 DEBUG be/alloc.c:1068:m0_be_alloc_aligned allocator=0x55973595e708 size=120 shift=3 c=0x4000001a5310 c->bac_size=120 ptr=0x4000001a5360
2132838 2865676.887.138256 7ff1d67fab20 CALL format/format.c:146:m0_format_footer_verify_generic < rc=0
2132839 2865676.887.164508 7ff1d67faad0 CALL format/format.c:146:m0_format_footer_verify_generic < rc=0
2132840 2865676.889.210860 7ff1d67fab20 CALL format/format.c:146:m0_format_footer_verify_generic < rc=0
2132841 2865676.889.240734 7ff1d67faad0 CALL format/format.c:146:m0_format_footer_verify_generic < rc=0
2132842 2865676.889.252824 7ff1d67fabb0 CALL be/btree.c:1479:btree_save < tree=0x400000108468
2132843 2865676.889.254682 7ff1d67fafc0 CALL be/btree.c:1909:be_btree_insert < tree=0x400000108468
2132844 2865676.889.271688 7ff1d67fb540 CALL stob/stob.c:184:m0_stob_create < rc=0
2132845 2865676.889.274590 7ff1d67fb550 CALL stob/stob.c:299:m0_stob_put > stob=0x7ff1d80877c0 so_id={<200000000000000:a>,<200000a04b51096:f02400000900000a>} so_ref=5
2132846 2865676.889.275256 7ff1d67fb550 DEBUG stob/stob.c:307:m0_stob_put stob 0x7ff1d80877c0, fid=<200000a04b51096:f02400000900000a> so_ref 4, released ref, chan_waiters 0
2132847 2865676.889.276750 7ff1d67fb5e0 CALL ioservice/storage_dev.c:833:m0_storage_dev_stob_create < rc=0
2132848 2865676.889.278148 7ff1d67fb650 CALL ioservice/cob_foms.c:831:m0_cc_stob_create < rc=0
2132849 2865676.889.279684 7ff1d67fb6a0 INFO ioservice/io_foms.c:979:m0_io_cob_stob_create Create on write if cob doesn't exists
2132850 2865676.889.281034 7ff1d67fb560 CALL ioservice/io_foms.c:915:m0_io_cob_create > COB fid:<4300000a04b51096:f02400000900000a>pver:<7600000000000001:b8>
2132851 2865676.889.282426 7ff1d67fb4d0 CALL lib/memory.c:130:m0_alloc > size=416
2132852 2865676.889.287726 7ff1d67fb4d0 CALL lib/memory.c:141:m0_alloc < ptr=0x7ff1d815b0c0 size=416
2132853 2865676.889.289840 7ff1d67fb4b0 DEBUG file/file.c:485:m0_file_init <0:0>
2132854 2865676.889.308944 7ff1d67fb440 CALL lib/memory.c:130:m0_alloc > size=26
2132855 2865676.889.311152 7ff1d67fb440 CALL lib/memory.c:141:m0_alloc < ptr=0x7ff1d8266c90 size=26
2132856 2865676.889.316514 7ff1d67fb480 CALL cob/cob.c:1702:m0_cob_create > nskey=(<4700000004b51096:f02400000900000a>, '10') nsrec=(<4300000a04b51096:f02400000900000a>, 0)
2132857 2865676.889.326090 7ff1d67fb0a0 CALL be/btree.c:1900:be_btree_insert > tree=0x4000001314b8
2132858 2865676.889.328726 7ff1d67fac90 CALL be/btree.c:1380:btree_save > tree=0x4000001314b8
2132859 2865676.889.329092 7ff1d67fac90 DEBUG be/btree.c:1390:btree_save DEC cr=0x7ff18c079a98 user=0 balance=23
2132860 2865676.889.346514 7ff1d67fab00 DEBUG be/fl.c:213:m0_be_fl_pick chunk=0x4000001a53d8 size=56 bac_size=26842868696 index=128 iterations=1
2132861 2865676.889.634178 7ff1d67fab70 DEBUG be/alloc.c:1068:m0_be_alloc_aligned allocator=0x55973595e708 size=50 shift=3 c=0x4000001a53d8 c->bac_size=56 ptr=0x4000001a5428
2132862 2865676.889.678732 7ff1d67fac00 CALL format/format.c:146:m0_format_footer_verify_generic < rc=0
2132863 2865676.889.896452 7ff1d67fabb0 CALL format/format.c:146:m0_format_footer_verify_generic < rc=0
2132864 2865676.890.026326 7ff1d67fac00 CALL format/format.c:146:m0_format_footer_verify_generic < rc=0
2132865 2865676.890.097606 7ff1d67fabb0 CALL format/format.c:146:m0_format_footer_verify_generic < rc=0
2132866 2865676.890.110522 7ff1d67fac90 CALL be/btree.c:1479:btree_save < tree=0x4000001314b8
2132867 2865676.890.111416 7ff1d67fb0a0 CALL be/btree.c:1909:be_btree_insert < tree=0x4000001314b8
2132868 2865676.890.163468 7ff1d67fb0a0 CALL be/btree.c:1900:be_btree_insert > tree=0x4000001315b8
2132869 2865676.890.164676 7ff1d67fac90 CALL be/btree.c:1380:btree_save > tree=0x4000001315b8
2132870 2865676.890.165082 7ff1d67fac90 DEBUG be/btree.c:1390:btree_save DEC cr=0x7ff18c079a98 user=0 balance=22
2132871 2865676.890.255014 7ff1d67fab00 DEBUG be/fl.c:213:m0_be_fl_pick chunk=0x4000001a5460 size=184 bac_size=26842868560 index=128 iterations=1
2132872 2865676.890.356460 7ff1d67fab70 DEBUG be/alloc.c:1068:m0_be_alloc_aligned allocator=0x55973595e708 size=184 shift=3 c=0x4000001a5460 c->bac_size=184 ptr=0x4000001a54b0
2132873 2865676.890.364958 7ff1d67fac00 CALL format/format.c:146:m0_format_footer_verify_generic < rc=0
2132874 2865676.890.391780 7ff1d67fabb0 CALL format/format.c:146:m0_format_footer_verify_generic < rc=0
2132875 2865676.890.473704 7ff1d67fac00 CALL format/format.c:146:m0_format_footer_verify_generic < rc=0
2132876 2865676.890.496294 7ff1d67fabb0 CALL format/format.c:146:m0_format_footer_verify_generic < rc=0
2132877 2865676.890.508352 7ff1d67fac90 CALL be/btree.c:1479:btree_save < tree=0x4000001315b8
2132878 2865676.890.508732 7ff1d67fb0a0 CALL be/btree.c:1909:be_btree_insert < tree=0x4000001315b8
2132879 2865676.890.651216 7ff1d67fb480 CALL cob/cob.c:1786:m0_cob_create < rc=0
2132880 2865676.890.654646 7ff1d67fb510 DEBUG lib/memory.c:151:m0_free 0x7ff1d8266c90
2132881 2865676.890.656730 7ff1d67fb530 DEBUG lib/memory.c:151:m0_free 0x7ff1d815b0c0
2132882 2865676.890.658008 7ff1d67fb560 CALL ioservice/io_foms.c:938:m0_io_cob_create < rc=0
2132883 2865676.890.660786 7ff1d67fb610 CALL cob/cob.c:1417:m0_cob_locate > dom=0x400000131490 oikey=(<4300000a04b51096:f02400000900000a>, 0)
2132884 2865676.890.661334 7ff1d67fb580 CALL lib/memory.c:130:m0_alloc > size=416
2132885 2865676.890.662332 7ff1d67fb580 CALL lib/memory.c:141:m0_alloc < ptr=0x7ff1d815b0c0 size=416
2132886 2865676.890.663352 7ff1d67fb560 DEBUG file/file.c:485:m0_file_init <0:0>
2132887 2865676.890.680364 7ff1d67fb250 CALL be/btree.c:2338:m0_be_btree_cursor_get_sync < rc=0
2132888 2865676.890.683018 7ff1d67fb290 DEBUG cob/cob.c:1264:cob_oi_lookup found: fid=<4300000a04b51096:f02400000900000a> lno=0 pfid=<4700000004b51096:f02400000900000a> name='10'
2132889 2865676.890.683678 7ff1d67fb1f0 CALL lib/memory.c:130:m0_alloc > size=26
2132890 2865676.890.683974 7ff1d67fb1f0 CALL lib/memory.c:141:m0_alloc < ptr=0x7ff1d8266c90 size=26
2132891 2865676.890.684760 7ff1d67fb290 CALL cob/cob.c:1283:cob_oi_lookup < rc=0
2132892 2865676.890.688306 7ff1d67fb050 CALL be/btree.c:1998:be_btree_lookup > tree=0x4000001315b8 key_in=0x7ff1d67fb620 key_out=(nil) value=0x7ff1d67fb630
2132893 2865676.890.695882 7ff1d67fb050 CALL be/btree.c:2030:be_btree_lookup < rc=0
2132894 2865676.890.698640 7ff1d67fb570 CALL cob/cob.c:1368:cob_get_fabomg < rc=0
2132895 2865676.890.698992 7ff1d67fb610 CALL cob/cob.c:1454:m0_cob_locate < rc=0
2132896 2865676.890.699842 7ff1d67fb6a0 CALL ioservice/io_foms.c:992:m0_io_cob_stob_create < rc=0
2132897 2865676.890.701224 7ff1d67fb7b0 CALL ioservice/io_foms.c:1016:io_fom_cob2file < rc=0
2132898 2865676.890.764816 7ff1d67fb6f0 CALL lib/memory.c:130:m0_alloc > size=2048
2132899 2865676.890.769062 7ff1d67fb6f0 CALL lib/memory.c:141:m0_alloc < ptr=0x7ff1d81851c0 size=2048
2132900 2865676.890.769424 7ff1d67fb6f0 CALL lib/memory.c:130:m0_alloc > size=2048
2132901 2865676.890.770734 7ff1d67fb6f0 CALL lib/memory.c:141:m0_alloc < ptr=0x7ff1d815a180 size=2048
2132902 2865676.890.782858 7ff1d67fb7b0 DEBUG ioservice/io_foms.c:1816:io_launch launch fom: 0x7ff18c079920, start_time 1654014028237796989, req_count: 0, count: 0, submitted: 100, expect: 100
2132903 2865676.890.784764 7ff1d67fb700 CALL lib/memory.c:130:m0_alloc > size=424
2132904 2865676.890.787134 7ff1d67fb700 CALL lib/memory.c:141:m0_alloc < ptr=0x7ff1d82192a0 size=424
2132905 2865676.890.788224 7ff1d67fb750 CALL stob/ad.c:1039:stob_ad_io_init < rc=0
2132906 2865676.890.795026 7ff1d67fb6a0 CALL stob/io.c:164:m0_stob_io_prepare > stob=0x7ff1d80877c0 so_id={<200000000000000:a>,<200000a04b51096:f02400000900000a>} si_opcode=2 io=0x7ff1d81c80c8 tx=0x7ff18c0799e8
2132907 2865676.890.796486 7ff1d67fa760 CALL stob/ad.c:2021:stob_ad_io_launch_prepare > op=2, stob 0x7ff1d80877c0, stob_id={<200000000000000:a>,<200000a04b51096:f02400000900000a>}
2132908 2865676.890.800550 7ff1d67fa690 DEBUG stob/ad.c:1111:stob_ad_cursor <200000a04b51096:f02400000900000a>
2132909 2865676.890.869388 7ff1d67fa510 CALL lib/memory.c:130:m0_alloc > size=64
2132910 2865676.890.870284 7ff1d67fa510 CALL lib/memory.c:141:m0_alloc < ptr=0x7ff18c01a1b0 size=64
2132911 2865676.890.874160 7ff1d67fa600 CALL be/extmap.c:1149:be_emap_lookup < rc=0
2132912 2865676.890.876684 7ff1d67fa690 CALL stob/ad.c:1117:stob_ad_cursor < rc=0
2132913 2865676.890.878202 7ff1d67fa700 CALL stob/ad.c:1229:stob_ad_cursors_init < rc=0
2132914 2865676.890.881774 7ff1d67fa760 CALL stob/ad.c:1924:stob_ad_write_prepare > op=2 sz=256
2132915 2865676.890.884364 7ff1d67fa760 DEBUG stob/ad.c:1072:stob_ad_balloc count=256
2132916 2865676.890.887830 7ff1d67fa3d0 CALL balloc/balloc.c:2877:balloc_alloc > bal=0x400000103e38 goal=0x0 count=256
2132917 2865676.890.888156 7ff1d67fa3d0 CALL balloc/balloc.c:2607:balloc_allocate_internal >
2132918 2865676.890.890090 7ff1d67fa380 CALL balloc/balloc.c:1077:balloc_got_freespace >
2132919 2865676.890.890384 7ff1d67fa380 DEBUG balloc/balloc.c:1081:balloc_got_freespace bsb_freeblocks=6385152 blocks=256
2132920 2865676.890.890700 7ff1d67fa380 CALL balloc/balloc.c:1090:balloc_got_freespace <
2132921 2865676.890.890996 7ff1d67fa380 CALL balloc/balloc.c:1091:balloc_got_freespace < rc=1
2132922 2865676.890.892424 7ff1d67fa3d0 CALL balloc/balloc.c:1012:balloc_init_ac >
2132923 2865676.890.893794 7ff1d67fa3d0 CALL balloc/balloc.c:1040:balloc_init_ac < rc=0
2132924 2865676.890.894250 7ff1d67fa3d0 CALL balloc/balloc.c:1110:balloc_normalize_request >
2132925 2865676.890.894570 7ff1d67fa3d0 CALL balloc/balloc.c:1173:balloc_normalize_request <
2132926 2865676.890.896316 7ff1d67fa3d0 CALL balloc/balloc.c:2411:balloc_regular_allocator > goal=0x2af00 len=256
2132927 2865676.890.897722 7ff1d67fa3d0 DEBUG balloc/balloc.c:2439:balloc_regular_allocator cr=0
2132928 2865676.890.967040 7ff1d67fa070 CALL balloc/balloc.c:1290:m0_balloc_load_extents > grp=2 non-spare-frags=2 spare-frags=0
2132929 2865676.890.967436 7ff1d67fa020 CALL lib/memory.c:130:m0_alloc > size=216
2132930 2865676.890.968106 7ff1d67fa020 CALL lib/memory.c:141:m0_alloc < ptr=0x7ff1d8087690 size=216
2132931 2865676.890.986634 7ff1d67f9d60 CALL be/btree.c:1998:be_btree_lookup > tree=0x400000103f78 key_in=0x7ff1d67fa150 key_out=0x7ff1d67fa150 value=0x7ff1d67fa160
2132932 2865676.891.066758 7ff1d67f9d60 CALL be/btree.c:2030:be_btree_lookup < rc=0
2132933 2865676.891.070996 7ff1d67f9d60 CALL be/btree.c:1998:be_btree_lookup > tree=0x400000103f78 key_in=0x7ff1d67fa150 key_out=0x7ff1d67fa150 value=0x7ff1d67fa160
2132934 2865676.891.072528 7ff1d67f9d60 CALL be/btree.c:2030:be_btree_lookup < rc=0
2132935 2865676.891.074092 7ff1d67fa070 ERROR balloc/balloc.c:1346:m0_balloc_load_extents Invalid extent
2132936 2865676.891.349106 7ff1d67f9f40 FATAL lib/assert.c:50:m0_panic panic: (0) at m0_balloc_load_extents() (balloc/balloc.c:1347) [git: 2.0.0-790-9-g662e7a18] /etc/cortx/log/motr/def3fa765c954db3812b45500924c243/trace/m0d-0x7200000000000001:0x69/m0trace.35.2022-05-31-16:09:28
from gdb:
(gdb) bt
#0 0x00007ff1e032b38f in raise () from /lib64/libc.so.6
#1 0x00007ff1e0315dc5 in abort () from /lib64/libc.so.6
#2 0x00007ff1e241d383 in m0_arch_panic (c=c@entry=0x7ff1e281f420 <__pctx.12199>, ap=ap@entry=0x7ff1d67f9ff0)
at lib/user_space/uassert.c:131
#3 0x00007ff1e240be2d in m0_panic (ctx=ctx@entry=0x7ff1e281f420 <__pctx.12199>) at lib/assert.c:52
#4 0x00007ff1e23344ba in m0_balloc_load_extents (cb=<optimized out>, grp=grp@entry=0x559735b07310) at balloc/balloc.c:1347
#5 0x00007ff1e233527f in balloc_regular_allocator (bac=0x7ff1d67fa690) at balloc/balloc.c:2471
#6 balloc_allocate_internal (req=0x7ff1d67fa630, tx=<optimized out>, ctx=0x400000103e38) at balloc/balloc.c:2626
#7 balloc_alloc (ballroom=0x400000103f60, tx=<optimized out>, count=<optimized out>, out=0x7ff1d67faa40,
alloc_zone=<optimized out>) at balloc/balloc.c:2891
#8 0x00007ff1e24e078e in stob_ad_balloc (adom=0x4000001073d0, adom=0x4000001073d0, alloc_type=<optimized out>,
out=0x7ff1d67faa40, count=256, tx=<optimized out>) at stob/ad.c:1074
#9 stob_ad_write_prepare (map=0x7ff1d67fa8c0, src=0x7ff1d67fa8f0, adom=0x4000001073d0, io=0x7ff1d81c80c8) at stob/ad.c:1935
#10 stob_ad_io_launch_prepare (io=0x7ff1d81c80c8) at stob/ad.c:2039
#11 0x00007ff1e24e3a7f in m0_stob_io_prepare (io=io@entry=0x7ff1d81c80c8, obj=obj@entry=0x7ff1d80877c0,
tx=tx@entry=0x7ff18c0799e8, scope=scope@entry=0x0) at stob/io.c:178
#12 0x00007ff1e24e3f25 in m0_stob_io_prepare_and_launch (io=io@entry=0x7ff1d81c80c8, obj=0x7ff1d80877c0,
tx=tx@entry=0x7ff18c0799e8, scope=scope@entry=0x0) at stob/io.c:226
#13 0x00007ff1e23f6209 in io_launch (fom=0x7ff18c079920) at ioservice/io_foms.c:1837
#14 0x00007ff1e23f38f3 in m0_io_fom_cob_rw_tick (fom=0x7ff18c079920) at ioservice/io_foms.c:2333
#15 0x00007ff1e23dc814 in fom_exec (fom=0x7ff18c079920) at fop/fom.c:791
#16 loc_handler_thread (th=0x5597354d5b20) at fop/fom.c:931
#17 0x00007ff1e24129be in m0_thread_trampoline (arg=arg@entry=0x5597354d5b28) at lib/thread.c:117
#18 0x00007ff1e241e061 in uthread_trampoline (arg=0x5597354d5b28) at lib/user_space/uthread.c:98
#19 0x00007ff1e1aeb15a in start_thread () from /lib64/libpthread.so.0
#20 0x00007ff1e03f0dd3 in clone () from /lib64/libc.so.6
(gdb) f 4
#4 0x00007ff1e23344ba in m0_balloc_load_extents (cb=<optimized out>, grp=grp@entry=0x559735b07310) at balloc/balloc.c:1347
1347 M0_ASSERT(false);
(gdb) l
1342 ++spare_frags;
1343 zone_params_update(grp, &ex->le_ext,
1344 M0_BALLOC_SPARE_ZONE);
1345 } else {
1346 M0_LOG(M0_ERROR, "Invalid extent");
1347 M0_ASSERT(false);
1348 }
1349 next_key = ex->le_ext.e_end + 1;
1350 balloc_debug_dump_extent("loading...", &ex->le_ext);
1351 }
(gdb) p /x *ex
$3 = {le_is_alloc = 0x0, le_link = {ll_next = 0x0, ll_prev = 0x0}, le_ext = {e_header = {hd_magic = 0x33011ca5e511de77,
hd_bits = 0x1001100000020}, e_start = 0x30c00, e_end = 0x31000, e_footer = {ft_magic = 0x33f007e7f007e777,
ft_checksum = 0xc8ac140dfdbf97a3}}}
(gdb) p /x normal_range
$5 = {e_header = {hd_magic = 0x33011ca5e511de77, hd_bits = 0x1001100000020}, e_start = 0x20000, e_end = 0x30000, e_footer = {
ft_magic = 0x33f007e7f007e777, ft_checksum = 0x701361c50db696ac}}
(gdb) p /x spare_range
$6 = {e_header = {hd_magic = 0x33011ca5e511de77, hd_bits = 0x1001100000020}, e_start = 0x30000, e_end = 0x30000, e_footer = {
ft_magic = 0x33f007e7f007e777, ft_checksum = 0xf97fcd9fbbeb16af}}
Looks like there are no more free extents in the balloc group?
Mukund Bala Kanekar commented in Jira Server:
Found a series of crash dumps on ssc-vm-g4-rhev4-1491 in the timeframe May 31 09:16 to May 31 10:20 (MDT time zone):
{noformat}
[root@ssc-vm-g4-rhev4-1491 ~]# ls -lrt /mnt/fs-local-volume/local-path-provisioner/pvc-c9282ee5-c9ff-4469-8d2f-f959c38559a5_default_cortx-data-fs-local-pvc-ssc-vm-g4-rhev4-1491/motr/m0d-0x7200000000000001:0x69
total 6886792
drwx------ 3 root root 4096 May 30 05:02 db
drwx------ 3 root root 4096 May 30 05:02 db-log
drwx------ 3 root root 4096 May 30 05:02 stobs
-rw------- 1 root root 1014378496 May 30 10:10 core.1653927029.35
-rw------- 1 root root 952832000 May 30 10:24 core.1653927860.35
-rw------- 1 root root 925487104 May 30 10:24 core.1653927873.35
-rw------- 1 root root 925384704 May 30 10:29 core.1653928191.35
-rw------- 1 root root 925483008 May 30 10:30 core.1653928246.35
-rw------- 1 root root 936599552 May 30 10:32 core.1653928353.35
-rw------- 1 root root 943222784 May 30 10:58 core.1653929926.35
-rw------- 1 root root 925409280 May 30 10:59 core.1653929967.35
-rw------- 1 root root 947154944 May 30 10:59 core.1653929993.34
-rw------- 1 root root 926494720 May 30 11:00 core.1653930038.35
-rw------- 1 root root 937992192 May 31 09:16 core.1654010193.35
-rw------- 1 root root 936472576 May 31 09:18 core.1654010284.35
-rw------- 1 root root 936955904 May 31 09:18 core.1654010334.35
-rw------- 1 root root 933593088 May 31 09:47 core.1654012026.35
-rw------- 1 root root 924852224 May 31 09:47 core.1654012037.35
-rw------- 1 root root 926212096 May 31 09:49 core.1654012156.35
-rw------- 1 root root 936136704 May 31 09:50 core.1654012203.35
-rw------- 1 root root 925474816 May 31 09:54 core.1654012449.34
-rw------- 1 root root 937316352 May 31 09:58 core.1654012684.35
-rw------- 1 root root 926699520 May 31 10:04 core.1654013065.35
-rw------- 1 root root 942231552 May 31 10:20 core.1654014028.35
[root@ssc-vm-g4-rhev4-1491 ~]# date
Thu Jun 9 02:31:09 MDT 2022
[root@ssc-vm-g4-rhev4-1491 ~]#{noformat}
The backtrace for core.1654010193.35 (first) and core.1654014028.35 (last) core dumps on May 31 is the same, i.e. the assert fires in m0_balloc_load_extents() (frame #4 in the backtrace above).
Further analysis of core.1654010193.35: the extent under process at the time of the crash has e_start = 0x30c00, e_end = 0x31000, but group 2 (the group under process) has the normal range e_start = 0x20000, e_end = 0x30000 and 0 blocks in the spare range e_start = 0x30000, e_end = 0x30000.
{noformat}
(gdb) f 4
1347                            M0_ASSERT(false);
(gdb) l
1342                                    ++spare_frags;
1343                                    zone_params_update(grp, &ex->le_ext,
1344                                                       M0_BALLOC_SPARE_ZONE);
1345                            } else {
1346                                    M0_LOG(M0_ERROR, "Invalid extent");
1347                                    M0_ASSERT(false);
1348                            }
1349                            next_key = ex->le_ext.e_end + 1;
1350                            balloc_debug_dump_extent("loading...", &ex->le_ext);
1351                    }
(gdb) p/x ex
$28 = {le_is_alloc = 0x0, le_link = {ll_next = 0x0, ll_prev = 0x0}, le_ext = {e_header = {hd_magic = 0x33011ca5e511de77, hd_bits = 0x1001100000020}, e_start = 0x30c00, e_end = 0x31000, e_footer = {ft_magic = 0x33f007e7f007e777, ft_checksum = 0xc8ac140dfdbf97a3}}}
(gdb) p/x grp
$29 = {bgi_state = 0x1, bgi_groupno = 0x2, bgi_normal = {bzp_type = 0x2000, bzp_range = {e_header = {hd_magic = 0x0, hd_bits = 0x0}, e_start = 0x20000, e_end = 0x30000, e_footer = {ft_magic = 0x0, ft_checksum = 0x0}}, bzp_freeblocks = 0x200, bzp_fragments = 0x2, bzp_maxchunk = 0x200, bzp_extents = {l_head = 0x7f5c8c09f838, l_tail = 0x7f5c8c09f838}}, bgi_spare = {bzp_type = 0x1000, bzp_range = {e_header = {hd_magic = 0x0, hd_bits = 0x0}, e_start = 0x30000, e_end = 0x30000, e_footer = {ft_magic = 0x0, ft_checksum = 0x0}}, bzp_freeblocks = 0x0, bzp_fragments = 0x0, bzp_maxchunk = 0x0, bzp_extents = {l_head = 0x55a329499410, l_tail = 0x55a329499410}}, bgi_extents = 0x7f5c8c09f830, bgi_mutex = {bm_u = {mutex = {m_arch = {m_impl = {data = {lock = 0x1, count = 0x0, owner = 0x4a, nusers = 0x1, kind = 0x0, spins = 0x0, elision = 0x0, list = {prev = 0x0, next = 0x0}}, size = {0x1, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x4a, 0x0, 0x0, 0x0, 0x1, 0x0 <repeats 27 times>}, __align = 0x1}}, m_owner = 0x55a328fe7938, m_addb2 = 0x0}, pad = {0x1, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x4a, 0x0, 0x0, 0x0, 0x1, 0x0 <repeats 27 times>, 0x38, 0x79, 0xfe, 0x28, 0xa3, 0x55, 0x0 <repeats 122 times>}}}}
(gdb)
{noformat}
The extent under process looks to be part of the next group (i.e. group 3), which has the normal range e_start = 0x30000, e_end = 0x40000:
{noformat}
(gdb) p/x (grp+1)
$30 = {bgi_state = 0x1, bgi_groupno = 0x3, bgi_normal = {bzp_type = 0x2000, bzp_range = {e_header = {hd_magic = 0x0, hd_bits = 0x0}, e_start = 0x30000, e_end = 0x40000, e_footer = {ft_magic = 0x0, ft_checksum = 0x0}}, bzp_freeblocks = 0xd400, bzp_fragments = 0x2, bzp_maxchunk = 0xd000, bzp_extents = {l_head = 0x55a329499530, l_tail = 0x55a329499530}}, bgi_spare = {bzp_type = 0x1000, bzp_range = {e_header = {hd_magic = 0x0, hd_bits = 0x0}, e_start = 0x40000, e_end = 0x40000, e_footer = {ft_magic = 0x0, ft_checksum = 0x0}}, bzp_freeblocks = 0x0, bzp_fragments = 0x0, bzp_maxchunk = 0x0, bzp_extents = {l_head = 0x55a329499590, l_tail = 0x55a329499590}}, bgi_extents = 0x0, bgi_mutex = {bm_u = {mutex = {m_arch = {m_impl = {data = {lock = 0x0, count = 0x0, owner = 0x0, nusers = 0x0, kind = 0x0, spins = 0x0, elision = 0x0, list = {prev = 0x0, next = 0x0}}, size = {0x0 <repeats 40 times>}, __align = 0x0}}, m_owner = 0x0, m_addb2 = 0x0}, pad = {0x0 <repeats 168 times>}}}}
(gdb)
{noformat}
According to the group info for group 2, bzp_fragments = 0x2 for the normal range; the 0th extent has already been added to the normal-range free extent list of group 2 and has a valid range belonging to group 2's normal range:
{noformat}
(gdb) p/x grp->bgi_normal.bzp_extents
$18 = {l_head = 0x7f5c8c09f838, l_tail = 0x7f5c8c09f838}
(gdb) p/x &grp->bgi_normal.bzp_extents
$19 = 0x55a3294993b0
(gdb) p sizeof(struct m0_list_link)
$20 = 16
(gdb) p/x *(struct m0_ext*)0x7f5c8c09f848
$21 = {e_header = {hd_magic = 0x33011ca5e511de77, hd_bits = 0x1001100000020}, e_start = 0x2ae00, e_end = 0x2b000, e_footer = {ft_magic = 0x33f007e7f007e777, ft_checksum = 0xe4460464fcf98771}}
(gdb)
{noformat}
and the crash occurred while processing the 2nd extent, at index = 1:
{noformat}
(gdb) p i
$31 = 1
{noformat}
There are 2 possibilities,
Mukund Bala Kanekar commented in Jira Server:
From core dump details of group_extents and group_desc btree's are as follows {noformat} (gdb) p db_ext $1 = (struct m0_be_btree ) 0x400000103f78 (gdb) p/x ((struct m0_balloc )((char )(db_ext)-(char )(&((struct m0_balloc )0)->cb_db_group_extents))) $2 = 0x400000103e38 (gdb) p/x &((struct m0_balloc )0x400000103e38)->cb_db_group_extents $3 = 0x400000103f78 (gdb) p/x ((struct m0_balloc )0x400000103e38)->cb_db_group_extents $4 = {bb_header = {hd_magic = 0x33011ca5e511de77, hd_bits = 0x1000300000050}, bb_cookie_gen = 0x16f3dd167a5ffc27, bb_backlink = {bli_tree = {co_addr = 0x400000103f88, co_generation = 0x16f3dd167a5ffc27}, bli_type = 0x2, bli_gen = 0x16f3dd16f8c98fd3, bli_fid = {f_container = 0x6200000000000002, f_key = 0xa}}, bb_root = 0x400000104280, bb_footer = {ft_magic = 0x33f007e7f007e777, ft_checksum = 0xe4844cbdb08c9889}, bb_lock = {bl_u = {rwlock = {rw_lock = {data = { readers = 0x0, writers = 0x0, wrphase_futex = 0x0, writers_futex = 0x0, pad3 = 0x0, pad4 = 0x0, __cur_writer = 0x0, shared = 0x0, rwelision = 0x0, pad1 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, pad2 = 0x0, flags = 0x0}, size = {0x0 <repeats 56 times>}, align = 0x0}}, pad = { 0x0 <repeats 144 times>}}}, bb_seg = 0x55a32943a010, bb_ops = 0x7f5cbea86880} (gdb) p/x ((struct m0_balloc )0x400000103e38)->cb_db_group_desc $5 = {bb_header = {hd_magic = 0x33011ca5e511de77, hd_bits = 0x1000300000050}, bb_cookie_gen = 0x16f3dd167a5ffc28, bb_backlink = {bli_tree = {co_addr = 0x400000104088, co_generation = 0x16f3dd167a5ffc28}, bli_type = 0x3, bli_gen = 0x16f3dd16f8c98fd3, bli_fid = {f_container = 0x6200000000000003, f_key = 0xa}}, bb_root = 0x400000105b28, bb_footer = {ft_magic = 0x33f007e7f007e777, ft_checksum = 0x6bffc6d17c8d7acf}, bb_lock = {bl_u = {rwlock = {rw_lock = {data = { readers = 0x0, writers = 0x0, wrphase_futex = 0x0, writers_futex = 0x0, pad3 = 0x0, pad4 = 0x0, __cur_writer = 0x0, shared = 0x0, rwelision = 0x0, pad1 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, pad2 = 0x0, flags = 0x0}, size = {0x0 
<repeats 56 times>}, align = 0x0}}, pad = { 0x0 <repeats 144 times>}}}, bb_seg = 0x55a32943a010, bb_ops = 0x7f5cbea86860} (gdb){noformat} The next step is to get details of the on-disk data of group 2 in the 'cb_db_group_desc' btree using m0beck, to identify any discrepancies.
Mukund Bala Kanekar commented in Jira Server:
Further debug shows that the root node in Group Descriptor Btree holds the group 2 Descriptor at kv array index 2. {noformat} (gdb) p/x ((struct m0_balloc )0x400000103e38)->cb_db_group_desc $1 = {bb_header = {hd_magic = 0x33011ca5e511de77, hd_bits = 0x1000300000050}, bb_cookie_gen = 0x16f3dd167a5ffc28, bb_backlink = {bli_tree = {co_addr = 0x400000104088, co_generation = 0x16f3dd167a5ffc28}, bli_type = 0x3, bli_gen = 0x16f3dd16f8c98fd3, bli_fid = {f_container = 0x6200000000000003, f_key = 0xa}}, bb_root = 0x400000105b28, bb_footer = {ft_magic = 0x33f007e7f007e777, ft_checksum = 0x6bffc6d17c8d7acf}, bb_lock = {bl_u = {rwlock = {rw_lock = {data = { readers = 0x0, writers = 0x0, wrphase_futex = 0x0, writers_futex = 0x0, pad3 = 0x0, pad4 = 0x0, __cur_writer = 0x0, shared = 0x0, rwelision = 0x0, pad1 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, pad2 = 0x0, flags = 0x0}, size = {0x0 <repeats 56 times>}, align = 0x0}}, pad = { 0x0 <repeats 144 times>}}}, bb_seg = 0x55a32943a010, bb_ops = 0x7f5cbea86860} (gdb) p/x ((struct m0_balloc )0x400000103e38)->cb_db_group_desc.bb_root $2 = 0x400000105b28 (gdb) p/x ((struct m0_be_bnode)0x400000105b28) $3 = {bt_header = {hd_magic = 0x33011ca5e511de77, hd_bits = 0x1000400001848}, bt_backlink = {bli_tree = {co_addr = 0x400000104088, co_generation = 0x16f3dd167a5ffc28}, bli_type = 0x3, bli_gen = 0x16f3dd16f8c98fd3, bli_fid = {f_container = 0x6200000000000003, f_key = 0xa}}, bt_next = 0x0, bt_num_active_key = 0x64, bt_level = 0x0, bt_isleaf = 0x1, bt_pad = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, bt_kv_arr = {{btree_key = 0x40000010a010, btree_val = 0x40000010a018}, {btree_key = 0x40000010a2f8, btree_val = 0x40000010a300}, {btree_key = 0x40000010a108, btree_val = 0x40000010a110}, {btree_key = 0x40000010a9c0, btree_val = 0x40000010a9c8}, { btree_key = 0x40000010b560, btree_val = 0x40000010b568}, {btree_key = 0x40000010bc88, btree_val = 0x40000010bc90}, {btree_key = 0x40000010ce90, btree_val = 0x40000010ce98}, {btree_key = 0x40000010a200, btree_val = 
0x40000010a208}, {btree_key = 0x40000010a3f0, btree_val = 0x40000010a3f8}, { btree_key = 0x40000010d270, btree_val = 0x40000010d278}, {btree_key = 0x40000010d460, btree_val = 0x40000010d468}, {btree_key = 0x40000010a4e8, btree_val = 0x40000010a4f0}, {btree_key = 0x40000010b3d0, btree_val = 0x40000010b3d8}, {btree_key = 0x40000010b088, btree_val = 0x40000010b090}, { btree_key = 0x40000010d558, btree_val = 0x40000010d560}, {btree_key = 0x40000010a5e0, btree_val = 0x40000010a5e8}, {btree_key = 0x40000010bd20, btree_val = 0x40000010bd28}, {btree_key = 0x40000010b940, btree_val = 0x40000010b948}, {btree_key = 0x40000010bb30, btree_val = 0x40000010bb38}, { btree_key = 0x40000010b848, btree_val = 0x40000010b850}, {btree_key = 0x40000010aca8, btree_val = 0x40000010acb0}, {btree_key = 0x40000010c7c8, btree_val = 0x40000010c7d0}, {btree_key = 0x40000010b468, btree_val = 0x40000010b470}, {btree_key = 0x40000010cf88, btree_val = 0x40000010cf90}, { btree_key = 0x40000010d178, btree_val = 0x40000010d180}, {btree_key = 0x40000010ae98, btree_val = 0x40000010aea0}, {btree_key = 0x40000010b278, btree_val = 0x40000010b280}, {btree_key = 0x40000010a7d0, btree_val = 0x40000010a7d8}, {btree_key = 0x40000010a8c8, btree_val = 0x40000010a8d0}, { btree_key = 0x40000010b658, btree_val = 0x40000010b660}, {btree_key = 0x40000010c5d8, btree_val = 0x40000010c5e0}, {btree_key = 0x40000010dc20, btree_val = 0x40000010dc28}, {btree_key = 0x40000010dd18, btree_val = 0x40000010dd20}, {btree_key = 0x40000010aab8, btree_val = 0x40000010aac0}, { btree_key = 0x40000010c2f0, btree_val = 0x40000010c2f8}, {btree_key = 0x40000010c3e8, btree_val = 0x40000010c3f0}, {btree_key = 0x40000010af90, btree_val = 0x40000010af98}, {btree_key = 0x40000010a6d8, btree_val = 0x40000010a6e0}, {btree_key = 0x40000010b180, btree_val = 0x40000010b188}, { btree_key = 0x40000010b750, btree_val = 0x40000010b758}, {btree_key = 0x40000010ba38, btree_val = 0x40000010ba40}, {btree_key = 0x40000010abb0, btree_val = 0x40000010abb8}, 
{btree_key = 0x40000010c6d0, btree_val = 0x40000010c6d8}, {btree_key = 0x40000010c8c0, btree_val = 0x40000010c8c8}, { btree_key = 0x40000010c9b8, btree_val = 0x40000010c9c0}, {btree_key = 0x40000010cab0, btree_val = 0x40000010cab8}, {btree_key = 0x40000010ada0, btree_val = 0x40000010ada8}, {btree_key = 0x40000010c4e0, btree_val = 0x40000010c4e8}, {btree_key = 0x40000010c1f8, btree_val = 0x40000010c200}, { btree_key = 0x40000010c100, btree_val = 0x40000010c108}, {btree_key = 0x40000010c008, btree_val = 0x40000010c010}, {btree_key = 0x40000010bf10, btree_val = 0x40000010bf18}, {btree_key = 0x40000010be18, btree_val = 0x40000010be20}, {btree_key = 0x40000010d080, btree_val = 0x40000010d088}, { btree_key = 0x40000010cba8, btree_val = 0x40000010cbb0}, {btree_key = 0x40000010cca0, btree_val = 0x40000010cca8}, {btree_key = 0x40000010cd98, btree_val = 0x40000010cda0}, {btree_key = 0x40000010d3c8, btree_val = 0x40000010d3d0}, {btree_key = 0x40000010d650, btree_val = 0x40000010d658}, { btree_key = 0x40000010d748, btree_val = 0x40000010d750}, {btree_key = 0x40000010d840, btree_val = 0x40000010d848}, {btree_key = 0x40000010d938, btree_val = 0x40000010d940}, {btree_key = 0x40000010da30, btree_val = 0x40000010da38}, {btree_key = 0x40000010db28, btree_val = 0x40000010db30}, { btree_key = 0x40000010eee8, btree_val = 0x40000010eef0}, {btree_key = 0x40000010f078, btree_val = 0x40000010f080}, {btree_key = 0x40000010edf0, btree_val = 0x40000010edf8}, {btree_key = 0x40000010f930, btree_val = 0x40000010f938}, {btree_key = 0x40000010eaa8, btree_val = 0x40000010eab0}, { btree_key = 0x40000010df08, btree_val = 0x40000010df10}, {btree_key = 0x40000010f838, btree_val = 0x40000010f840}, {btree_key = 0x40000010e820, btree_val = 0x40000010e828}, {btree_key = 0x40000010fff8, btree_val = 0x400000110000}, {btree_key = 0x40000010ff00, btree_val = 0x40000010ff08}, { btree_key = 0x40000010f170, btree_val = 0x40000010f178}, {btree_key = 0x40000010fe08, btree_val = 0x40000010fe10}, {btree_key = 
0x40000010e9b0, btree_val = 0x40000010e9b8}, {btree_key = 0x40000010f740, btree_val = 0x40000010f748}, {btree_key = 0x40000010eba0, btree_val = 0x40000010eba8}, { btree_key = 0x40000010f648, btree_val = 0x40000010f650}, {btree_key = 0x40000010fd10, btree_val = 0x40000010fd18}, {btree_key = 0x40000010e538, btree_val = 0x40000010e540}, {btree_key = 0x40000010e8b8, btree_val = 0x40000010e8c0}, {btree_key = 0x40000010f550, btree_val = 0x40000010f558}, { btree_key = 0x40000010efe0, btree_val = 0x40000010efe8}, {btree_key = 0x40000010f458, btree_val = 0x40000010f460}, {btree_key = 0x40000010fc18, btree_val = 0x40000010fc20}, {btree_key = 0x40000010e6c8, btree_val = 0x40000010e6d0}, {btree_key = 0x40000010e250, btree_val = 0x40000010e258}, { btree_key = 0x40000010f360, btree_val = 0x40000010f368}, {btree_key = 0x40000010fb20, btree_val = 0x40000010fb28}, {btree_key = 0x40000010e630, btree_val = 0x40000010e638}, {btree_key = 0x40000010ecf8, btree_val = 0x40000010ed00}, {btree_key = 0x40000010e440, btree_val = 0x40000010e448}, { btree_key = 0x40000010f268, btree_val = 0x40000010f270}, {btree_key = 0x40000010fa28, btree_val = 0x40000010fa30}, {btree_key = 0x40000010e158, btree_val = 0x40000010e160}, {btree_key = 0x40000010e348, btree_val = 0x40000010e350}, {btree_key = 0x40000010e060, btree_val = 0x40000010e068}, { btree_key = 0x40000010de10, btree_val = 0x40000010de18}, {btree_key = 0x0, btree_val = 0x0} <repeats 155 times>}, bt_child_arr = {0x0 <repeats 256 times>}, bt_footer = {ft_magic = 0x33f007e7f007e777, ft_checksum = 0xffb7b81c66e7d45a}} (gdb) p/x ((struct m0_be_bnode)0x400000105b28)->bt_kv_arr[2].btree_key $4 = 0x40000010a108 (gdb) p/x ((m0_bindex_t )0x40000010a108) $5 = 0x2 (gdb) p/x ((struct m0_be_bnode)0x400000105b28)->bt_kv_arr[2].btree_val $6 = 0x40000010a110 (gdb) p/x ((struct m0_balloc_group_desc)0x40000010a110) $7 = {bgd_header = {hd_magic = 0x33011ca5e511de77, hd_bits = 0x1001000000030}, bgd_groupno = 0x2, bgd_freeblocks = 0x300, bgd_fragments = 0x2, 
bgd_maxchunk = 0x200, bgd_footer = {ft_magic = 0x33f007e7f007e777, ft_checksum = 0xddc9911d574def59}} (gdb){noformat} From the group 2 descriptor in the btree, the total free blocks are 0x300 and the max chunk length is 0x200. Referring to my previous comments, the extent btree has only one entry belonging to group 2, with length 0x200, which matches the max chunk length. It appears that a second extent entry with length 0x100 is missing for group 2.
Hua Huang commented in Jira Server:
Please use this patch to dump the ex before checking it:
diff --git a/balloc/balloc.c b/balloc/balloc.c
index fe5bdd14f..e583a7f60 100644
--- a/balloc/balloc.c
+++ b/balloc/balloc.c
@@ -1332,6 +1332,7 @@ M0_INTERNAL int m0_balloc_load_extents(struct m0_balloc *cb,
if (rc != 0)
break;
m0_ext_init(&ex->le_ext);
+ balloc_debug_dump_extent("loading...", &ex->le_ext);
if (m0_ext_is_partof(&normal_range, &ex->le_ext)) {
m0_list_add_tail(group_normal_ext(grp), &ex->le_link);
++normal_frags;
@@ -1347,7 +1348,6 @@ M0_INTERNAL int m0_balloc_load_extents(struct m0_balloc *cb,
M0_ASSERT(false);
}
next_key = ex->le_ext.e_end + 1;
- balloc_debug_dump_extent("loading...", &ex->le_ext);
}
if (i != group_fragments_get(grp) + group_spare_fragments_get(grp))
So we can see what the 'ex' is.
Mukund Bala Kanekar commented in Jira Server:
[~520428] So far we are doing static analysis with the available core dump.
Also, for group 2 we were able to extract the extent details from the dump; please refer to my previous comments.
Hua Huang commented in Jira Server:
[~522123], Oh, I see.
{quote}There are 2 possibilities,{quote}
We need to figure out whether the other extent (which is 0x100 blocks long) is allocated or not.
Mukund Bala Kanekar commented in Jira Server:
The other extent's length of 0x100 is derived from the group 2 descriptor in the btree (total free blocks are 0x300 and the first extent is of length 0x200).
There could be 2 possibilities:
Looking at the code so far, #1 looks like the more likely case, as the btree update order is: the extent btree is updated first, and the group descriptor btree is updated later.
This issue/pull request has been marked as needs attention
as it has been left pending without new activity for 4 days. Tagging @nkommuri @mehjoshi @huanghua78 for appropriate assignment. Sorry for the delay & Thank you for contributing to CORTX. We will get back to you as soon as possible.
Mukund Bala Kanekar commented in Jira Server:
The balloc extent btree and the balloc group descriptor btree can go out of sync with the following sequence of events:
To confirm that this sequence really occurs, a debug build was created with:
The custom build details are as follows, http://cortx-storage.colo.seagate.com/releases/cortx/github/integration-custom-ci/rockylinux-8.4/custom-build-6911/
CORTX images are available at: cortx-docker.colo.seagate.com/seagate/cortx-all:2.0.0-6911-custom-ci cortx-docker.colo.seagate.com/seagate/cortx-data:2.0.0-6911-custom-ci cortx-docker.colo.seagate.com/seagate/cortx-control:2.0.0-6911-custom-ci cortx-docker.colo.seagate.com/seagate/cortx-rgw:2.0.0-6911-custom-ci
[~520414] Can you please use the above build for your experiments, and also share the steps you performed before ending up in the balloc assert?
Hua Huang commented in Jira Server:
[~522123], where is the patch (to build custom-build-6911)? So, maybe I try it on my setup too.
Mukund Bala Kanekar commented in Jira Server:
Sure, I have attached the patch "balloc_debug_0623.txt". You can also refer to the cortx-motr branch "CORTX-31907_balloc_debug", commit #d313462192ad7d406d0b27403091e2159ac58409.
Mukund Bala Kanekar commented in Jira Server:
There was a regression issue with the previous custom build #6911 (due to recent changes in rgw).
Created a new custom build #6930 with the other components' commit #s taken from the last sanity-passed build #837 ([https://github.com/seagate/cortx/pkgs/container/cortx-rgw/26506950?tag=2.0.0-837]) (cortx-py-utils:31df3dd, cortx-motr:f9e83286, cortx-hare:f14ece8, cortx-provisioner:325a3e0b, cortx-ha:7a5d356, cortx-rgw-integration:17125bc, cortx-csm_agent:b827ce0d, cortx-rgw:g2694eb6c018, cortx-re:a81201a)
CORTX images are available at, cortx-docker.colo.seagate.com/seagate/cortx-rgw:2.0.0-6930-custom-ci cortx-docker.colo.seagate.com/seagate/cortx-all:2.0.0-6930-custom-ci cortx-docker.colo.seagate.com/seagate/cortx-data:2.0.0-6930-custom-ci cortx-docker.colo.seagate.com/seagate/cortx-control:2.0.0-6930-custom-ci
CFT sanity is successful on custom build #6930; please refer to CORTX-32425 for more details. [~520414] Can you please use build #6930 for your experiments, and also share the steps you performed before ending up in the balloc assert?
Mukund Bala Kanekar commented in Jira Server:
As part of a cleanup activity, the [https://github.com/Seagate/cortx-motr/tree/CORTX-31907_balloc_debug] branch has been moved to the forked repo [https://github.com/mukundkanekar/cortx-motr/tree/CORTX-31907_balloc_debug].
Mukund Bala Kanekar commented in Jira Server:
So far we have had no luck reproducing this on the custom build.
[~530903] Can you please share the Jira or support bundle details if this has been seen recently?
Chandradhar Raval commented in Jira Server:
A similar panic was also observed in a happy-path Type-1 IO-stability run with the #869 build.
System details : Master : ssc-vm-g2-rhev4-3338 Client : ssc-vm-g3-rhev4-1104
Logs : Support bundle location: /root/deploy-scripts/k8_cortx_cloud/logs-cortx-cloud-2022-07-27_21-59.tar
cc : [~522123], [~520428], [~522059], [~531171]
Gaurav Kumar Gaur commented in Jira Server:
Sorry, [~530903], [~522123], I started a redeploy and missed taking a backup of logs-cortx-cloud-2022-07-27_21-59.tar. As the deployment removes the whole directory, the tar was also deleted.
Mukund Bala Kanekar commented in Jira Server:
Found the below core dumps on the 'ssc-vm-rhev4-2478' VM: {code:java} root@ssc-vm-rhev4-2478 ~]# ls -lrt /mnt/fs-local-volume/local-path-provisioner/pvc-555b588e-6fa1-4d0f-9c28-ae26cb9cd280_cortx_cortx-data-fs-local-pvc-ssc-vm-rhev4-2478/motr/m0d-0x7200000000000001:0x4 total 10812296 drwx------ 3 root root 4096 Jul 20 01:00 db drwx------ 3 root root 4096 Jul 20 01:00 db-log drwx------ 3 root root 4096 Jul 20 01:00 stobs -rw------- 1 root root 2545922048 Jul 27 19:02 core.1658970137.42 -rw------- 1 root root 1708761088 Jul 31 18:25 core.1659313526.42 -rw------- 1 root root 1027809280 Jul 31 18:25 core.1659313544.42 -rw------- 1 root root 1108475904 Jul 31 18:30 core.1659313803.42 -rw------- 1 root root 1599795200 Jul 31 18:48 core.1659314904.42 -rw------- 1 root root 994058240 Jul 31 18:48 core.1659314921.42 -rw------- 1 root root 1039974400 Jul 31 18:49 core.1659314947.42 -rw------- 1 root root 1148772352 Jul 31 18:52 core.1659315125.43 -rw------- 1 root root 1571958784 Jul 31 19:02 core.1659315731.42 -rw------- 1 root root 1511800832 Jul 31 19:06 core.1659316017.42 -rw------- 1 root root 1220747264 Jul 31 19:07 core.1659316048.43 -rw------- 1 root root 1340735488 Jul 31 19:13 core.1659316427.42 -rw------- 1 root root 1102131200 Jul 31 19:18 core.1659316682.42 -rw------- 1 root root 1187303424 Jul 31 19:21 core.1659316865.42 -rw------- 1 root root 1005858816 Jul 31 19:24 core.1659317098.42 -rw------- 1 root root 1164447744 Jul 31 19:33 core.1659317606.43 -rw------- 1 root root 1073897472 Jul 31 19:39 core.1659317977.42 -rw------- 1 root root 1336893440 Jul 31 19:49 core.1659318567.42 -rw------- 1 root root 1100767232 Jul 31 19:54 core.1659318889.42 -rw------- 1 root root 1060384768 Jul 31 20:00 core.1659319209.42{code}
Analyzed 'Jul 31 18:25 core.1659313526.42' further using gdb and build #869. {code:java} (gdb) bt
at stob/ad.c:1118
(gdb) f 4
1456 M0_ASSERT(false);{code}
The 2nd extent/fragment (e_start = 0xa0000, e_end = 0xc0000) under processing belongs to the next group's (group 5) range. {code:java} (gdb) p/x ex $1 = {le_is_alloc = 0x0, le_link = {ll_next = 0x0, ll_prev = 0x0}, le_ext = {e_header = {hd_magic = 0x33011ca5e511de77, hd_bits = 0x1001100000020}, e_start = 0xa0000, e_end = 0xc0000, e_footer = {ft_magic = 0x33f007e7f007e777, ft_checksum = 0x7d783919093e7f49}}} (gdb) p/x grp $2 = {bgi_state = 0x1, bgi_groupno = 0x4, bgi_normal = {bzp_type = 0x2000, bzp_range = {e_header = {hd_magic = 0x0, hd_bits = 0x0}, e_start = 0x80000, e_end = 0xa0000, e_footer = {ft_magic = 0x0, ft_checksum = 0x0}}, bzp_freeblocks = 0x1fd00, bzp_fragments = 0x2, bzp_maxchunk = 0x1fd00, bzp_extents = {l_head = 0x7fea41bf59d8, l_tail = 0x7fea41bf59d8}}, bgi_spare = {bzp_type = 0x1000, bzp_range = {e_header = {hd_magic = 0x0, hd_bits = 0x0}, e_start = 0xa0000, e_end = 0xa0000, e_footer = { ft_magic = 0x0, ft_checksum = 0x0}}, bzp_freeblocks = 0x0, bzp_fragments = 0x0, bzp_maxchunk = 0x0, bzp_extents = {l_head = 0x559bccf2aaa0, l_tail = 0x559bccf2aaa0}}, bgi_extents = 0x7fea41bf59d0, bgi_mutex = {bm_u = {mutex = {m_arch = {m_impl = {data = {lock = 0x1, count = 0x0, owner = 0x4b, nusers = 0x1, kind = 0x0, spins = 0x0, elision = 0x0, list = {prev = 0x0, next = 0x0}}, size = {0x1, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x4b, 0x0, 0x0, 0x0, 0x1, 0x0 <repeats 27 times>}, __align = 0x1}}, m_owner = 0x559bcc468f48, m_addb2 = 0x0}, pad = {0x1, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x4b, 0x0, 0x0, 0x0, 0x1, 0x0 <repeats 27 times>, 0x48, 0x8f, 0x46, 0xcc, 0x9b, 0x55, 0x0 <repeats 122 times>}}}} (gdb) p/x (grp+1) $3 = {bgi_state = 0x1, bgi_groupno = 0x5, bgi_normal = {bzp_type = 0x2000, bzp_range = {e_header = {hd_magic = 0x0, hd_bits = 0x0}, e_start = 0xa0000, e_end = 0xc0000, e_footer = {ft_magic = 0x0, ft_checksum = 0x0}}, bzp_freeblocks = 0x20000, bzp_fragments = 0x1, bzp_maxchunk = 0x20000, bzp_extents = {l_head = 0x559bccf2abc0, l_tail = 0x559bccf2abc0}}, bgi_spare = {bzp_type
= 0x1000, bzp_range = {e_header = {hd_magic = 0x0, hd_bits = 0x0}, e_start = 0xc0000, e_end = 0xc0000, e_footer = { ft_magic = 0x0, ft_checksum = 0x0}}, bzp_freeblocks = 0x0, bzp_fragments = 0x0, bzp_maxchunk = 0x0, bzp_extents = {l_head = 0x559bccf2ac20, l_tail = 0x559bccf2ac20}}, bgi_extents = 0x0, bgi_mutex = {bm_u = {mutex = {m_arch = {m_impl = {data = {lock = 0x0, count = 0x0, owner = 0x0, nusers = 0x0, kind = 0x0, spins = 0x0, elision = 0x0, list = {prev = 0x0, next = 0x0}}, size = {0x0 <repeats 40 times>}, __align = 0x0}}, m_owner = 0x0, m_addb2 = 0x0}, pad = {0x0 <repeats 168 times>}}}}{code} The 1st fragment of group 4 has e_start = 0x80300, e_end = 0xa0000 (length = 0x1fd00). {code:java} (gdb) p/x grp->bgi_normal.bzp_extents $4 = {l_head = 0x7fea41bf59d8, l_tail = 0x7fea41bf59d8} (gdb) p sizeof(struct m0_list_link) $5 = 16 (gdb) p/x (struct m0_ext*)(0x7fea41bf59d8+16) $6 = {e_header = {hd_magic = 0x33011ca5e511de77, hd_bits = 0x1001100000020}, e_start = 0x80300, e_end = 0xa0000, e_footer = {ft_magic = 0x33f007e7f007e777, ft_checksum = 0x7132dfb6e48d980a}} (gdb) p i $7 = 1 (gdb) p 0xa0000-0x80300 $8 = 130304 (gdb) p/x 0xa0000-0x80300 $9 = 0x1fd00 (gdb) p normal_frags $10 = 1 (gdb){code}
The next step is to extract the group descriptor info for group 4 from the group descriptor btree.
Mukund Bala Kanekar commented in Jira Server:
Extracted 'struct m0_balloc' pointer from frame 5 {code:java} (gdb) f 5
2556 rc = m0_balloc_load_extents(bac->bac_ctxt, grp); (gdb) p/x bac->bac_ctxt $12 = 0x400000126400{code}
Extracted group descriptor btree info, {code:java} (gdb) p/x ((struct m0_balloc )0x400000126400)->cb_db_group_desc $13 = {t_type = 0x0, t_height = 0x2, t_desc = 0x559bccb3bf90} (gdb) p/x ((struct td)0x559bccb3bf90) $14 = {t_type = 0x0, t_lock = {rw_lock = {data = {readers = 0x1, writers = 0x0, wrphase_futex = 0x1, writers_futex = 0x0, pad3 = 0x0, pad4 = 0x0, __cur_writer = 0x0, shared = 0x0, rwelision = 0x0, pad1 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, pad2 = 0x0, flags = 0x0}, size = {0x1, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x1, 0x0 <repeats 47 times>}, align = 0x1}}, t_root = 0x559bcc8e8960, t_height = 0x2, t_ref = 0x1, t_seg = 0x559bccb2a010, t_fid = { f_container = 0x6200000000000003, f_key = 0x8}, t_keycmp = {rko_keycmp = 0x0}} (gdb) p/x ((struct nd )0x559bcc8e8960) $15 = {n_addr = {as_core = 0x400000128400}, n_tree = 0x559bccb3bf90, n_type = 0x7fea74db16a0, n_seg = 0x0, n_linkage = {t_link = {ll_next = 0x559bcc8e6430, ll_prev = 0x559bccd3bf00}}, n_magic = 0x33c1a551c1dea77, n_lock = {rw_lock = {data = {readers = 0x1, writers = 0x0, wrphase_futex = 0x1, writers_futex = 0x0, pad3 = 0x0, pad4 = 0x0, __cur_writer = 0x0, shared = 0x0, rwelision = 0x0, pad1 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, pad2 = 0x0, flags = 0x0}, size = {0x1, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x1, 0x0 <repeats 47 times>}, align = 0x1}}, n_ref = 0x1, n_txref = 0x0, n_seq = 0x1707026bd8b427bf, n_op = 0x0, n_size = 0x1000, n_be_node_valid = 0x1} (gdb) p/x 0x400000128400+0x1000-8 $16 = 0x4000001293f8 (gdb) x/gx 0x4000001293f8 0x4000001293f8: 0x0000400000134050 (gdb) p/x ((struct ff_head )0x0000400000134050) $17 = {ff_fmt = {hd_magic = 0x33011ca5e511de77, hd_bits = 0x1000400000050}, ff_seg = {h_node_type = 0x1, h_tree_type = 0x0, h_crc_type = 0x0, h_gen = 0x1703776edc04cd94, h_fid = {f_container = 0x6200000000000003, f_key = 0x8}, h_opaque = 0x0}, ff_used = 0x1b, ff_level = 0x0, ff_ksize = 0x8, ff_vsize = 0x40, ff_nsize = 0xfb0, ff_foot = { ft_magic = 0x33f007e7f007e777, ft_checksum = 
0x4db9c222d130d64c}, ff_opaque = 0x559bcc8f2a60}{code}
Extracted keys array details from group descriptor btree {code:java} (gdb) p/x sizeof(struct ff_head) $19 = 0x68 (gdb) x/32gx (0x0000400000134050+0x68) 0x4000001340b8: 0x0000000000000000 0x0100000000000000 0x4000001340c8: 0x0200000000000000 0x0300000000000000 0x4000001340d8: 0x0400000000000000 0x0500000000000000 0x4000001340e8: 0x0600000000000000 0x0700000000000000 0x4000001340f8: 0x0800000000000000 0x0900000000000000 0x400000134108: 0x0a00000000000000 0x0b00000000000000 0x400000134118: 0x0c00000000000000 0x0d00000000000000 0x400000134128: 0x0e00000000000000 0x0f00000000000000 0x400000134138: 0x1000000000000000 0x1100000000000000 0x400000134148: 0x1200000000000000 0x1300000000000000 0x400000134158: 0x1400000000000000 0x1500000000000000 0x400000134168: 0x1600000000000000 0x1700000000000000 0x400000134178: 0x1800000000000000 0x1900000000000000 0x400000134188: 0x1a00000000000000 0x0000000000000000 0x400000134198: 0x0000000000000000 0x0000000000000000 0x4000001341a8: 0x0000000000000000 0x0000000000000000{code}
Extracted the value (group descriptor info for group 4) from the group descriptor btree: {code:java} (gdb) p/x ((struct m0_balloc_group_desc*)(0x0000400000134050+0xfb0))[-5] $20 = {bgd_header = {hd_magic = 0x33011ca5e511de77, hd_bits = 0x1001000000030}, bgd_groupno = 0x4, bgd_freeblocks = 0x1ff00, bgd_fragments = 0x2, bgd_maxchunk = 0x1fd00, bgd_footer = {ft_magic = 0x33f007e7f007e777, ft_checksum = 0xb756443e48350dd4}}{code} Thank you [~530902] for helping to gather information from the new btree.
Summary: from the group 4 descriptor info,
The trace files for the period before the crash are not available; they could have helped us understand the events that led to this situation. {code:java} [root@ssc-vm-rhev4-2478 ~]# ls -lrt /mnt/fs-local-volume/local-path-provisioner/pvc-555b588e-6fa1-4d0f-9c28-ae26cb9cd280_cortx_cortx-data-fs-local-pvc-ssc-vm-rhev4-2478/log/motr/3e55246af7f018aaadc5b8cf68307d20/trace/m0d-0x7200000000000001\:0x4/ total 32896 -rwx------ 1 root root 16842752 Jul 31 19:59 m0trace.42.2022-08-01-01:59:56 -rwx------ 1 root root 16842752 Aug 2 00:55 m0trace.42.2022-08-01-02:05:19 [root@ssc-vm-rhev4-2478 ~]#{code}
Rohan Kulkarni commented in Jira Server:
Observed similar issue while testing IO stability type-1 workload on custom 7267 with dtm enabled build with IO stability resource limit changes. {noformat} 2022-08-05 02:16:10,928 - INFO - motr[00042]: c6b0 FATAL [lib/assert.c:50:m0_panic] panic: (0) at m0_balloc_load_extents() (balloc/balloc.c:1456) [git: TagTest3-1480-533-gbc4ec3ca] /etc/cortx/log/motr/700aeb181d045f419c9f48527a5e4cf0/trace/m0d-0x7200000000000001:0x1/m0trace.42.2022-08-05-02:08:46 2022-08-05 02:16:10,928 - INFO - Motr panic: (0) at m0_balloc_load_extents() balloc/balloc.c:1456 (errno: 0) (last failed: none) [git: TagTest3-1480-533-gbc4ec3ca] pid: 42 /etc/cortx/log/motr/700aeb181d045f419c9f48527a5e4cf0/trace/m0d-0x7200000000000001:0x1/m0trace.42.2022-08-05-02:08:46 2022-08-05 02:16:10,932 - INFO - /lib64/libmotr.so.2(m0_arch_backtrace+0x33)[0x7f38724ccf73] 2022-08-05 02:16:10,932 - INFO - /lib64/libmotr.so.2(m0_arch_panic+0xe9)[0x7f38724cd149] 2022-08-05 02:16:10,932 - INFO - /lib64/libmotr.so.2(m0_panic+0x13d)[0x7f38724bbb2d] 2022-08-05 02:16:10,932 - INFO - /lib64/libmotr.so.2(m0_balloc_load_extents+0x670)[0x7f38723c7a90] 2022-08-05 02:16:10,932 - INFO - /lib64/libmotr.so.2(+0x2bef28)[0x7f38723c7f28] 2022-08-05 02:16:10,932 - INFO - /lib64/libmotr.so.2(+0x48687d)[0x7f387258f87d] 2022-08-05 02:16:10,932 - INFO - /lib64/libmotr.so.2(+0x486bfc)[0x7f387258fbfc] 2022-08-05 02:16:10,932 - INFO - /lib64/libmotr.so.2(m0_be_emap_paste+0x4aa)[0x7f38723d4c3a] 2022-08-05 02:16:10,932 - INFO - /lib64/libmotr.so.2(+0x48a01e)[0x7f387259301e] 2022-08-05 02:16:10,932 - INFO - /lib64/libmotr.so.2(+0x48a189)[0x7f3872593189] 2022-08-05 02:16:10,932 - INFO - /lib64/libmotr.so.2(m0_stob_punch+0xc0)[0x7f387259adc0] 2022-08-05 02:16:10,932 - INFO - /lib64/libmotr.so.2(+0x3a2469)[0x7f38724ab469] 2022-08-05 02:16:10,933 - INFO - /lib64/libmotr.so.2(+0x3a4087)[0x7f38724ad087] 2022-08-05 02:16:10,933 - INFO - /lib64/libmotr.so.2(+0x3833e8)[0x7f387248c3e8] 2022-08-05 02:16:10,933 - INFO - 
/lib64/libmotr.so.2(m0_thread_trampoline+0x5e)[0x7f38724c26ce] 2022-08-05 02:16:10,933 - INFO - /lib64/libmotr.so.2(+0x3c4e31)[0x7f38724cde31] 2022-08-05 02:16:10,933 - INFO - /lib64/libpthread.so.0(+0x815a)[0x7f3871b6f15a] 2022-08-05 02:16:10,933 - INFO - /lib64/libc.so.6(clone+0x43)[0x7f3870474dd3] 2022-08-05 02:16:12,048 - INFO - /usr/libexec/cortx-motr/motr-start: line 53: 42 Aborted (core dumped) /usr/libexec/cortx-motr/motr-server "$1" 2022-08-05 02:16:15,942 - INFO - motr[00042]: a2a0 ERROR [pool/pool_machine.c:632:m0_poolmach_state_transit] <! rc=-22 2022-08-05 02:16:15,943 - INFO - motr[00042]: a2a0 ERROR [pool/pool_machine.c:632:m0_poolmach_state_transit] <! rc=-22 2022-08-05 02:16:15,943 - INFO - motr[00042]: a2a0 ERROR [pool/pool_machine.c:632:m0_poolmach_state_transit] <! rc=-22{noformat} Log Location : http://cortx-storage.colo.seagate.com/logs1/6month-retention/CORTX-31907/
Rohan Kulkarni commented in Jira Server:
Observed a similar panic in main build 869.
cortx-all:2.0.0-869 | service script version : v0.8.0 (Resource limits defined in F-99D)
Setup Configuration: Cluster: 5 Worker Node + 1 Master Node sns: 4+1+0 dix: 1+4+0
Workload executed for 315 hrs. More details can be found under https://seagate-systems.atlassian.net/wiki/spaces/PRIVATECOR/pages/1086750721/Test+Summary+PI8 {noformat} 2022-08-07 20:17:33,686 - INFO - motr[00042]: b010 FATAL [lib/assert.c:50:m0_panic] panic: (0) at m0_balloc_load_extents() (balloc/balloc.c:1456) [git: 2.0.0-859-19-g84fac5ca] /etc/cortx/log/motr/05b1077075a0511aeec88ec0f12b4acc/trace/m0d-0x7200000000000001:0x1/m0trace.42.2022-08-07-16:54:16 2022-08-07 20:17:33,686 - INFO - Motr panic: (0) at m0_balloc_load_extents() balloc/balloc.c:1456 (errno: 0) (last failed: none) [git: 2.0.0-859-19-g84fac5ca] pid: 42 /etc/cortx/log/motr/05b1077075a0511aeec88ec0f12b4acc/trace/m0d-0x7200000000000001:0x1/m0trace.42.2022-08-07-16:54:16 2022-08-07 20:17:33,732 - INFO - /lib64/libmotr.so.2(m0_arch_backtrace+0x33)[0x7fd3ec327e13] 2022-08-07 20:17:33,732 - INFO - /lib64/libmotr.so.2(m0_arch_panic+0xe9)[0x7fd3ec327fe9] 2022-08-07 20:17:33,732 - INFO - /lib64/libmotr.so.2(m0_panic+0x13d)[0x7fd3ec316a1d] 2022-08-07 20:17:33,732 - INFO - /lib64/libmotr.so.2(m0_balloc_load_extents+0x670)[0x7fd3ec221f10] 2022-08-07 20:17:33,732 - INFO - /lib64/libmotr.so.2(+0x2bfcdf)[0x7fd3ec222cdf] 2022-08-07 20:17:33,732 - INFO - /lib64/libmotr.so.2(+0x489b5e)[0x7fd3ec3ecb5e] 2022-08-07 20:17:33,732 - INFO - /lib64/libmotr.so.2(m0_stob_io_prepare+0x1bf)[0x7fd3ec3eff6f] 2022-08-07 20:17:33,732 - INFO - /lib64/libmotr.so.2(m0_stob_io_prepare_and_launch+0x65)[0x7fd3ec3f0415] 2022-08-07 20:17:33,732 - INFO - /lib64/libmotr.so.2(+0x39ddf9)[0x7fd3ec300df9] 2022-08-07 20:17:33,733 - INFO - /lib64/libmotr.so.2(+0x39b4d3)[0x7fd3ec2fe4d3] 2022-08-07 20:17:33,733 - INFO - /lib64/libmotr.so.2(+0x3842d8)[0x7fd3ec2e72d8] 2022-08-07 20:17:33,733 - INFO - /lib64/libmotr.so.2(m0_thread_trampoline+0x5e)[0x7fd3ec31d56e] 2022-08-07 20:17:33,733 - INFO - /lib64/libmotr.so.2(+0x3c5cd1)[0x7fd3ec328cd1] 2022-08-07 20:17:33,733 - INFO - /lib64/libpthread.so.0(+0x815a)[0x7fd3eb9c915a] 2022-08-07 20:17:33,733 - INFO 
- /lib64/libc.so.6(clone+0x43)[0x7fd3ea2cedd3] 2022-08-07 20:17:36,761 - INFO - /usr/libexec/cortx-motr/motr-start: line 53: 42 Aborted (core dumped) /usr/libexec/cortx-motr/motr-server "$1" 2022-08-07 20:17:38,018 - INFO - Logger configured for phase=start{noformat} Log Location: [http://cortx-storage.colo.seagate.com/logs1/6month-retention/CORTX-31907/setup_2_869/]
Mukund Bala Kanekar commented in Jira Server:
The assert occurs while trying to load extents, and a discrepancy is seen between the 2 balloc btrees, but there is not enough trace/debug info available to confirm how we ended up in this situation, as the inconsistency happened in the past.
Custom build [#7478|https://eos-jenkins.colo.seagate.com/job/GitHub-custom-ci-builds/job/generic/job/custom-ci/7478/] was created to assert early in the possible places that could lead to such discrepancies; this can help to confirm the RCA.
[~931947], [~522059] Can we rerun the similar tests on one of the setups with custom build [#7478|https://eos-jenkins.colo.seagate.com/job/GitHub-custom-ci-builds/job/generic/job/custom-ci/7478/] to reproduce the issue?
CORTX images are available at, cortx-docker.colo.seagate.com/seagate/cortx-rgw:2.0.0-7478-custom-ci cortx-docker.colo.seagate.com/seagate/cortx-all:2.0.0-7478-custom-ci cortx-docker.colo.seagate.com/seagate/cortx-data:2.0.0-7478-custom-ci cortx-docker.colo.seagate.com/seagate/cortx-control:2.0.0-7478-custom-ci
Priyanka Borawake commented in Jira Server:
Restarted the test after the panic mentioned in the comment above; observing continuous container restarts due to the panic. Not able to continue further testing because of this. {noformat}
[root@cortx-data-headless-svc-ssc-vm-rhev4-2385 m0d-0x7200000000000001:0x1]# ls -lrt
total 7633676
drwx------ 3 root root       4096 Jul 25 10:00 db
drwx------ 3 root root       4096 Jul 25 10:00 db-log
drwx------ 3 root root       4096 Jul 25 10:00 stobs
-rw------- 1 root root 1688387584 Aug  7 20:17 core.1659903453.42
-rw------- 1 root root 1433870336 Aug  7 20:18 core.1659903487.42
-rw------- 1 root root 1337831424 Aug  7 20:26 core.1659903996.43
-rw------- 1 root root 1042116608 Aug  7 20:29 core.1659904148.42
-rw------- 1 root root 1049464832 Aug  7 20:31 core.1659904277.42
-rw------- 1 root root 1044635648 Aug  7 20:33 core.1659904409.42
-rw------- 1 root root 1068785664 Aug  8 04:52 core.1659934366.42
-rw------- 1 root root  988745728 Aug  8 04:53 core.1659934401.42
-rw------- 1 root root  999981056 Aug  8 04:54 core.1659934458.42
-rw------- 1 root root 1146548224 Aug  8 05:06 core.1659935167.42
-rw------- 1 root root 1023770624 Aug  8 05:06 core.1659935188.42
-rw------- 1 root root 1083695104 Aug  8 07:59 core.1659945591.43
-rw------- 1 root root 1012453376 Aug  8 08:00 core.1659945606.42
-rw------- 1 root root 1089105920 Aug  8 08:02 core.1659945754.42
-rw------- 1 root root 1060470784 Aug  8 08:05 core.1659945909.42
-rw------- 1 root root 1048936448 Aug  8 08:07 core.1659946041.42
-rw------- 1 root root 1032396800 Aug  8 08:10 core.1659946211.43
-rw------- 1 root root 1149149184 Aug  8 08:15 core.1659946513.42
{noformat}
{noformat}
[root@ssc-vm-rhev4-2381 k8_cortx_cloud]# kubectl get pods
NAME                                              READY   STATUS    RESTARTS        AGE
cortx-consul-client-4pf7f                         1/1     Running   0               14d
cortx-consul-client-b8bgr                         1/1     Running   0               14d
cortx-consul-client-jz84c                         1/1     Running   0               14d
cortx-consul-client-plsf2                         1/1     Running   0               14d
cortx-consul-client-x5t7b                         1/1     Running   0               14d
cortx-consul-server-0                             1/1     Running   0               14d
cortx-consul-server-1                             1/1     Running   0               14d
cortx-consul-server-2                             1/1     Running   0               14d
cortx-control-779d97c794-jpn8r                    1/1     Running   0               14d
cortx-data-ssc-vm-rhev4-2382-78f7cb9b6d-jx696     3/3     Running   3 (15h ago)     14d
cortx-data-ssc-vm-rhev4-2383-58fdfdfffc-zzs5r     3/3     Running   2 (4d6h ago)    14d
cortx-data-ssc-vm-rhev4-2384-6c9594f6d6-42tq8     3/3     Running   3 (18h ago)     14d
cortx-data-ssc-vm-rhev4-2385-587f859595-zv4fr     3/3     Running   21 (106m ago)   14d
cortx-data-ssc-vm-rhev4-2386-77478f8c5b-xqlbn     3/3     Running   3 (14h ago)     14d
cortx-ha-74d8ff66db-zdwdr                         3/3     Running   0               13d
cortx-kafka-0                                     1/1     Running   1 (14d ago)     14d
cortx-kafka-1                                     1/1     Running   0               14d
cortx-kafka-2                                     1/1     Running   0               14d
cortx-server-ssc-vm-rhev4-2382-69df9c4c4c-cbtg9   2/2     Running   0               14d
cortx-server-ssc-vm-rhev4-2383-8c5579c6f-cd5hx    2/2     Running   0               14d
cortx-server-ssc-vm-rhev4-2384-68749b8969-gtq7l   2/2     Running   0               14d
cortx-server-ssc-vm-rhev4-2385-56d49fb5d5-5zgjc   2/2     Running   0               14d
cortx-server-ssc-vm-rhev4-2386-bb6dfbbcc-24269    2/2     Running   0               14d
cortx-zookeeper-0                                 1/1     Running   0               14d
cortx-zookeeper-1                                 1/1     Running   0               14d
cortx-zookeeper-2                                 1/1     Running   0               14d
{noformat}
Logs attached: [Index of /logs1/6month-retention/CORTX-31907/setup_2_869_rerun/ (seagate.com)|http://cortx-storage.colo.seagate.com/logs1/6month-retention/CORTX-31907/setup_2_869_rerun/]
Priyanka Borawake commented in Jira Server:
Raising severity to 1, as we are not able to continue further testing on the setup.
Mukund Bala Kanekar commented in Jira Server:
Can we restart the test on "Setup-2" with custom build #7478 to reproduce the issue?
Mukund Bala Kanekar commented in Jira Server:
Created a new custom build [#7496|https://eos-jenkins.colo.seagate.com/job/GitHub-custom-ci-builds/job/generic/job/custom-ci/7496/]. The base is main build #890 (the last known build in use for IO-Stability testing).
Deployment has started on setup 2 using Jenkins job https://eos-jenkins.colo.seagate.com/view/QA-R2/job/QA/job/K8s_Cortx_Continuous_Deployment_sns_config/1469/
Thank you [~535790] for providing details.
Mukund Bala Kanekar commented in Jira Server:
Deployment of custom build #7496 succeeded on setup 2 with the Jenkins job below. https://eos-jenkins.colo.seagate.com/view/QA-R2/job/QA/job/K8s_Cortx_Continuous_Deployment_sns_config/1470/
Mukund Bala Kanekar commented in Jira Server:
Added an assert for the case where alloc/free_db_update() returns early due to an error. This prevents inconsistencies between the two balloc btrees from becoming persistent and avoids the multiple subsequent panics at m0_balloc_load_extents().
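The idea behind the fix can be sketched as follows. This is an illustrative C sketch, not the actual motr code: the function name, parameters, and structure are hypothetical. The point is that once the first of the two balloc btrees has been modified in a transaction, an error from the second update must not be "handled" by returning, because the half-updated pair could then become persistent; asserting instead crashes at the point of corruption and leaves a core for debugging.

```c
#include <assert.h>

/* Hypothetical sketch of the assert-on-intermediate-error pattern.
 * rc_extents_update / rc_group_update stand in for the results of
 * updating the two balloc btrees under one transaction. */
int alloc_db_update(int rc_extents_update, int rc_group_update)
{
        if (rc_extents_update != 0)
                return rc_extents_update; /* nothing modified yet: safe to bail out */

        /* The first btree is already updated.  Returning an error here
         * would let an inconsistent btree pair become persistent, so
         * fail loudly instead. */
        assert(rc_group_update == 0);
        return 0;
}
```

With this pattern, a later m0_balloc_load_extents() can no longer observe a persisted half-update; the failure surfaces immediately where it happened.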
Gaurav Chaudhari commented in Jira Server:
{panel:bgColor=#c1c7d0}
h2. motr - main branch build pipeline SUCCESS
h3. Build Info:
Release Build : 5888
h3. Artifact Location :
h3. Image Location :
{panel}
Mukund Bala Kanekar commented in Jira Server:
The IO-Stability test has been running on custom build #7496 for the last 2 days 21 hrs (https://eos-jenkins.colo.seagate.com/job/QA/job/IOStabilityTestRuns/367/).
PR #2064 is merged; we will resolve the current Jira once we confirm that the multiple subsequent panics at m0_balloc_load_extents() are no longer seen.
Mukund Bala Kanekar commented in Jira Server:
Custom build #7496 has been running fine for: A) the last 6 days 21 hrs https://eos-jenkins.colo.seagate.com/job/QA/job/IOStabilityTestRuns/367/ B) the last 7 days https://eos-jenkins.colo.seagate.com/job/QA/job/IOStabilityTestRuns/365/
Mukund Bala Kanekar commented in Jira Server:
Custom build #7496 has been running fine for the last 12+ days [https://eos-jenkins.colo.seagate.com/job/QA/job/IOStabilityTestRuns/367/]
Mukund Bala Kanekar commented in Jira Server:
Custom build #7496 has been running fine for the last 13+ days [https://eos-jenkins.colo.seagate.com/job/QA/job/IOStabilityTestRuns/367/]
Closing for now; main build #895 and onwards has an assert for the case where alloc/free_db_update() returns early due to an error.
So if the issue is seen again in the future, we will have additional data to debug further.
Mukund Bala Kanekar commented in Jira Server:
An intermediate return in alloc/free_db_update() due to error -2 (ENOENT) was hit on build [2.0.0-916][v0.9.0].
Analysis:
Fix: Added a separate flag to track whether a group's extents are loaded, which avoids overlapping entries in the extents list.
For more details please refer,
pods:
Such a situation and state were reproduced while trying to reproduce issue https://github.com/Seagate/cortx-motr/issues/1838.
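The "separate flag" fix described above can be sketched like this. This is a hypothetical C sketch, not the actual motr API: `struct group_info`, its fields, and `group_load_extents()` are illustrative names. The idea is that a dedicated "extents loaded" flag, set only after a successful load, makes a repeated load a no-op, so the in-memory extents list can never accumulate duplicate or overlapping entries.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-group state: the flag records whether the group's
 * extents have already been loaded from the extents btree. */
struct group_info {
        bool gi_extents_loaded;   /* set only after a successful full load */
        int  gi_nr_extents;       /* number of entries in the in-memory list */
};

int group_load_extents(struct group_info *gi, int nr_on_disk)
{
        if (gi->gi_extents_loaded)
                return 0;         /* already loaded: do not append again */

        /* (real code would iterate the extents btree and append entries;
         * here we just count them) */
        gi->gi_nr_extents += nr_on_disk;
        gi->gi_extents_loaded = true;
        return 0;
}
```

Without the flag, calling the load path twice would append the extents a second time (producing the overlapping entries that trigger the m0_balloc_load_extents() panic); with it, the second call returns early and the list stays consistent.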