StarRocks / starrocks

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.
https://starrocks.io
Apache License 2.0

[BugFix] Fix routine load job change to cancelled state when txn expired (backport #50334) #53036

Closed mergify[bot] closed 2 days ago

mergify[bot] commented 2 days ago

Why I'm doing:

When `routineLoadTaskInfo.createRoutineLoadTask` runs after its transaction has timed out and been cleared from the transaction manager, the routine load job unexpectedly changes to the CANCELLED state.

fe master log

cat fe.log.20240712-19 | grep 875661697
2024-07-12 00:24:46,531 INFO (pool-17-thread-6|19347) [DatabaseTransactionMgr.beginTransaction():315] begin transaction: txn_id: 875661697 with label ab771dba-fb73-4b94-946c-99f3e658afe0 from coordinator FE: 9.146.113.48, listner id: 10279645
2024-07-12 00:28:06,649 INFO (txnTimeoutChecker|95) [DatabaseTransactionMgr.abortTransaction():1403] transaction:[TransactionState. txn_id: 875661697, label: ab771dba-fb73-4b94-946c-99f3e658afe0, db id: 4569082, table id list: 10279545, callback id: 10279645, coordinator: FE: 9.146.113.48, transaction status: ABORTED, error replicas num: 0, replica ids: , prepare time: 1720715086531, commit time: -1, finish time: 1720715286565, total cost: 200034ms, reason: timeout by txn manager] successfully rollback
2024-07-12 00:28:06,649 INFO (txnTimeoutChecker|95) [DatabaseTransactionMgr.abortTimeoutTxns():1728] transaction [875661697] is timeout, abort it by transaction manager
2024-07-12 00:52:57,967 INFO (LoadLabelCleaner|94) [DatabaseTransactionMgr.removeExpiredTxns():1526] transaction list [875616787, 875616775, 875616783, 875616774, 875616796, 875616781, 875616797, 875616780, 875616792, 875616745, 875616798, 875616779, 875616795, 875616778, 875616777, 875616793, 875616784, 875616782, 875616794, 875627091, 875627093, 875627090, 875629065, 875630934, 875623227, 875623228, 875627099, 875627101, 875627106, 875627105, 875627100, 875627104, 875627103, 875629532, 875630935, 875629695, 875632199, 875627118, 875627117, 875627107, 875627109, 875627108, 875628059, 875628066, 875628063, 875627116, 875627120, 875627119, 875627113, 875627111, 875627112, 875627115, 875629696, 875625839, 875625840, 875632198, 875627887, 875627884, 875632737, 875622705, 875633236, 875629584, 875629583, 875629594, 875624096, 875624060, 875624122, 875624199, 875624200, 875624227, 875624226, 875624248, 875624260, 875624275, 875624285, 875624306, 875624302, 875624324, 875624314, 875624320, 875624352, 875624368, 875624350, 875624344, 875624354, 875624376, 875624400, 875624371, 875624405, 875624399, 875641015, 875641361, 875641360, 875641398, 875641420, 875641423, 875641588, 875641727, 875634573, 875642731, 875626177, 875625849, 875644134, 875644345, 875644805, 875644811, 875645220, 875646634, 875646641, 875646643, 875646644, 875647225, 875647226, 875647228, 875647230, 875647250, 875647895, 875628323, 875650723, 875651166, 875651684, 875651686, 875651874, 875651892, 875656699, 875659230, 875661489, 875661697, 875666518, 875670414, 875671662, 875671663, 875671664, 875671665, 875671667, 875671666, 875671669, 875671668, 875671670, 875671671, 875671672, 875671673, 875671674, 875676590, 875677830, 875677843, 875681236, 875682726, 875685688, 875685689, 875686918, 875690603, 875691618, 875691833, 875695126, 875695595, 875696274, 875696629, 875699210, 875699211, 875701454, 875704125, 875705372, 875705374, 875709556, 875714037, 875714510, 875714999, 875717359, 875718755, 875718807, 
875719020, 875721921, 875721922, 875723151, 875724388, 875727135, 875727321, 875729783, 875729784, 875733551, 875734793, 875734794, 875735518, 875738521, 875739769, 875743344, 875620971, 875620973, 875620937, 875620572, 875620564, 875620574, 875620565, 875620560, 875620569, 875620559, 875620578, 875620563, 875620945, 875620562, 875620987, 875620972, 875620985, 875620567, 875620570, 875620568, 875620573, 875620571, 875620953, 875620576, 875620911, 875620949, 875620988, 875744258, 875745407, 875747984, 875748913, 875753464, 875753465, 875755483, 875757131, 875759485, 875763014, 875763419, 875765688, 875768752, 875768858, 875774268, 875778939, 875783482, 875783651, 875786253, 875791377, 875792619, 875795102, 875796138, 875799870, 875800071, 875801313, 875804967, 875805411, 875805665, 875807986, 875812984, 875814127, 875814743, 875815490, 875819217, 875819993, 875821716, 875823723, 875824096, 875826134, 875826265, 875826847, 875830686, 875832154, 875832155, 875835879, 875836666, 875839604, 875841601, 875843337, 875845683, 875848086, 875848279, 875849053, 875853273, 875854515] are expired, remove them from transaction manager
2024-07-12 01:00:11,145 INFO (pool-17-thread-6|19347) [RoutineLoadJob.unprotectUpdateState():1160] ROUTINE_LOAD_JOB=10279645, current_job_state={RUNNING}, desire_job_state={CANCELLED}, msg={ErrorReason{errCode = 7, msg='meta not found: txn does not exist: 875661697'}}
2024-07-12 01:00:11,145 WARN (pool-17-thread-6|19347) [RoutineLoadJob.unprotectUpdateState():1184] routine load job 10279645-rl_playerfriendslist_fht0 changed to CANCELLED with reason: ErrorReason{errCode = 7, msg='meta not found: txn does not exist: 875661697'}
com.starrocks.common.MetaNotFoundException: txn does not exist: 875661697
2024-07-12 01:12:45,457 INFO (leaderCheckpointer|1389) [DatabaseTransactionMgr.removeExpiredTxns():1517] transaction list [, 875620657, 875623198, 875624379, 875620990, 875620989, 875623205, 875616715, 875620932, 875623257, 875621988, 875625838, 875623251, 875621842, 875620861, 875624558, 875624391, 875624171, 875621841, 875626179, 875622666, 875622665, 875622672, 875620683, 875620679, 875620682, 875620676, 875623248, 875623237, 875627095, 875627096, 875627094, 875627092, 875621026, 875620994, 875620996, 875620992, 875620995, 875621041, 875620983, 875620993, 875621987, 875621986, 875622707, 875622745, 875625843, 875625844, 875627789, 875623261, 875625826, 875624549, 875616733, 875620850, 875620746, 875620688, 875620686, 875620692, 875620690, 875620691, 875620728, 875620687, 875620689, 875620863, 875620716, 875622720, 875622813, 875622864, 875622855, 875625833, 875625827, 875625830, 875625828, 875625829, 875625832, 875625831, 875625842, 875627886, 875627102, 875624463, 875625841, 875628448, 875629699, 875615184, 875615306, 875615305, 875620599, 875620622, 875620621, 875620596, 875623057, 875623127, 875623138, 875623188, 875623226, 875628394, 875629697, 875623628, 875623656, 875624229, 875623637, 875625845, 875625846, 875625847, 875625850, 875625848, 875625853, 875625851, 875625852, 875626140, 875626119, 875626134, 875616787, 875616775, 875616783, 875616774, 875616796, 875616781, 875616797, 875616780, 875616792, 875616745, 875616798, 875616779, 875616795, 875616778, 875616777, 875616793, 875616784, 875616782, 875616794, 875627091, 875627093, 875627090, 875629065, 875630934, 875623227, 875623228, 875627099, 875627101, 875627106, 875627105, 875627100, 875627104, 875627103, 875629532, 875630935, 875629695, 875632199, 875627118, 875627117, 875627107, 875627109, 875627108, 875628059, 875628066, 875628063, 875627116, 875627120, 875627119, 875627113, 875627111, 875627112, 875627115, 875629696, 875625839, 875625840, 875632198, 875627887, 875627884, 875632737, 875622705, 
875633236, 875629584, 875629583, 875629594, 875624096, 875624060, 875624122, 875624199, 875624200, 875624227, 875624226, 875624248, 875624260, 875624275, 875624285, 875624306, 875624302, 875624324, 875624314, 875624320, 875624352, 875624368, 875624350, 875624344, 875624354, 875624376, 875624400, 875624371, 875624405, 875624399, 875641015, 875641361, 875641360, 875641398, 875641420, 875641423, 875641588, 875641727, 875634573, 875642731, 875626177, 875625849, 875644134, 875644345, 875644805, 875644811, 875645220, 875646634, 875646641, 875646643, 875646644, 875647225, 875647226, 875647228, 875647230, 875647250, 875647895, 875628323, 875650723, 875651166, 875651684, 875651686, 875651874, 875651892, 875656699, 875659230, 875661489, 875661697, 875666518, 875670414, 875671662, 875671663, 875671664, 875671665, 875671667, 875671666, 875671669, 875671668, 875671670, 875671671, 875671672, 875671673, 875671674, 875676590, 875677830, 875677843, 875681236, 875682726, 875685688, 875685689, 875686918, 875690603, 875691618, 875691833, 875695126, 875695595, 875696274, 875696629, 875699210, 875699211, 875701454, 875704125, 875705372, 875705374, 875709556, 875714037, 875714510, 875714999, 875717359, 875718755, 875718807, 875719020, 875721921, 875721922, 875723151, 875724388, 875727135, 875727321, 875729783, 875729784, 875733551, 875734793, 875734794, 875735518, 875738521, 875739769, 875743344, 875620971, 875620973, 875620937, 875620572, 875620564, 875620574, 875620565, 875620560, 875620569, 875620559, 875620578, 875620563, 875620945, 875620562, 875620987, 875620972, 875620985, 875620567, 875620570, 875620568, 875620573, 875620571, 875620953, 875620576, 875620911, 875620949, 875620988, 875744258, 875745407, 875747984, 875748913, 875753464, 875753465, 875755483, 875757131, 875759485, 875763014, 875763419, 875765688, 875768752, 875768858, 875774268, 875778939, 875783482, 875783651, 875786253, 875791377, 875792619, 875795102, 875796138, 875799870, 875800071, 875801313, 875804967, 
875805411, 875805665, 875807986, 875812984, 875814127, 875814743, 875815490, 875819217, 875819993, 875821716, 875823723, 875824096, 875826134, 875826265, 875826847, 875830686, 875832154, 875832155, 875835879, 875836666, 875839604, 875841601] are expired, remove them from transaction manager

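Order of events that triggers the bug (taken from the log above):

00:24:46 transaction 875661697 begins for routine load job 10279645
00:28:06 txnTimeoutChecker aborts the transaction (reason: timeout by txn manager)
00:52:57 LoadLabelCleaner removes the expired transaction from the transaction manager
01:00:11 the task thread (pool-17-thread-6) hits MetaNotFoundException (txn does not exist: 875661697) and the job changes from RUNNING to CANCELLED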

What I'm doing:

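Change the routine load job to the CANCELLED state only when the exception message says that the database or the table does not exist. Other `MetaNotFoundException` cases, such as an expired transaction, no longer cancel the job.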
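The rule above can be sketched as a small standalone decision function. This is only an illustration under stated assumptions, not the code from this patch: the class `RoutineLoadCancelSketch`, the nested `MetaNotFoundException` stand-in, the `JobState` enum, and the message pattern are all hypothetical, and the real change in `RoutineLoadJob.java` / `RoutineLoadTaskScheduler.java` may match different message text or take a different approach.

```java
import java.util.regex.Pattern;

public class RoutineLoadCancelSketch {
    // Hypothetical subset of job states, for illustration only.
    enum JobState { RUNNING, CANCELLED }

    // Hypothetical stand-in for com.starrocks.common.MetaNotFoundException.
    static class MetaNotFoundException extends Exception {
        MetaNotFoundException(String msg) {
            super(msg);
        }
    }

    // Assumed message shape: "db ... does not exist" or "table ... does not exist".
    // The real FE messages may be worded differently.
    private static final Pattern MISSING_DB_OR_TABLE = Pattern.compile(
            "\\b(db|database|table)\\b.*\\bdoes not exist\\b", Pattern.CASE_INSENSITIVE);

    // True only when the missing metadata is the database or the table itself.
    static boolean isUnrecoverable(MetaNotFoundException e) {
        String msg = e.getMessage();
        return msg != null && MISSING_DB_OR_TABLE.matcher(msg).find();
    }

    // Cancel only on a missing db/table; otherwise keep the current state
    // so that a later scheduling round can retry the task.
    static JobState nextState(JobState current, MetaNotFoundException e) {
        return isUnrecoverable(e) ? JobState.CANCELLED : current;
    }

    public static void main(String[] args) {
        MetaNotFoundException txnGone =
                new MetaNotFoundException("txn does not exist: 875661697");
        MetaNotFoundException tableGone =
                new MetaNotFoundException("table does not exist");

        System.out.println(nextState(JobState.RUNNING, txnGone));   // prints RUNNING
        System.out.println(nextState(JobState.RUNNING, tableGone)); // prints CANCELLED
    }
}
```

With this policy, the expired-transaction error seen in the log leaves the job in RUNNING, while a dropped database or table still cancels it.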

Fixes #issue

What type of PR is this:

Does this PR entail a change in behavior?

If yes, please specify the type of change:

Checklist:

Bugfix cherry-pick branch check:


mergify[bot] commented 2 days ago

Cherry-pick of c68222ccf477345a815468afa6c179e64c267844 has failed:

On branch mergify/bp/branch-3.1/pr-50334
Your branch is up to date with 'origin/branch-3.1'.

You are currently cherry-picking commit c68222ccf4.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
    modified:   fe/fe-core/src/main/java/com/starrocks/load/routineload/RoutineLoadJob.java
    modified:   fe/fe-core/src/main/java/com/starrocks/load/routineload/RoutineLoadTaskScheduler.java
    modified:   fe/fe-core/src/test/java/com/starrocks/load/routineload/RoutineLoadTaskSchedulerTest.java

Unmerged paths:
  (use "git add/rm <file>..." as appropriate to mark resolution)
    deleted by us:   fe/fe-core/src/test/java/com/starrocks/load/routineload/RoutineLoadJobMetaTest.java

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

mergify[bot] commented 2 days ago

@mergify[bot]: Backport conflict, please resolve the conflict and resubmit the PR

sonarcloud[bot] commented 2 days ago

Quality Gate failed

Failed conditions
E Maintainability Rating on New Code (required ≥ A)
