The destructor call chain is as follows:

PlanFragmentExecutor
 |--- RuntimeState
       |--- RuntimeProfile
       |--- ObjectPool
             |--- NodeChannel
                   |--- RowBatch
                         |--- MemTracker->release()
                               |--- profile->_consumption->add(-bytes)

Note that the RuntimeProfile is destructed first inside RuntimeState. As a result, when the RowBatch held by the NodeChannel is destructed later, it calls into the already-destructed `_consumption` object, which eventually crashes the BE.

The MemTracker is created here:
https://github.com/apache/incubator-doris/blob/c6f2b5ef0df61902360a3f3fcdfc00a6fff86dfe/be/src/exec/tablet_sink.cpp#L461

Is `OlapTableSink::_mem_tracker` related to `RuntimeState::_profile`? I think `RuntimeState::_instance_mem_tracker` is the relevant one, because it is the parent of `OlapTableSink::_mem_tracker`.
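To make the destruction-order problem concrete, here is a minimal C++ sketch of the same shape. All the `*Sketch` names are hypothetical simplifications, not the actual Doris classes: the counter is owned by the profile, but a batch kept alive in the object pool still decrements it from its destructor.

```cpp
#include <memory>
#include <vector>

// Hypothetical stand-ins for RuntimeProfile::Counter, MemTracker, RowBatch and
// RuntimeState; only the ownership/lifetime shape matters here.
struct Counter {
    long value = 0;
    void add(long v) { value += v; }
};

struct ProfileSketch {
    Counter consumption;                 // the counter lives inside the profile
};

struct TrackerSketch {
    Counter* consumption = nullptr;      // borrowed pointer into the profile
    void release(long bytes) { consumption->add(-bytes); }
};

struct RowBatchSketch {
    TrackerSketch* tracker = nullptr;
    long bytes = 0;
    ~RowBatchSketch() {
        if (bytes > 0) tracker->release(bytes);   // runs during pool teardown
    }
};

struct StateSketch {
    // Members are destroyed in reverse declaration order: `profile` dies first,
    // then `pool`. Any batch still holding bytes in the pool will then call
    // release() against a counter that no longer exists -- the use-after-free
    // described in the call chain above.
    std::vector<std::unique_ptr<RowBatchSketch>> pool;   // stand-in for ObjectPool
    std::unique_ptr<ProfileSketch> profile = std::make_unique<ProfileSketch>();
};
```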
Here is my guess:

We hit some errors in a NodeChannel, which set `_cancelled == true`, while the NodeChannel's `_cur_batch` or `_pending_batches` still held some bytes. A few NodeChannels being cancelled internally is not intolerable, so OlapTableSink still calls `NodeChannel::mark_close()`. But I wrote this:
https://github.com/apache/incubator-doris/blob/7591527977ea8b4184e45581007cc805c461b451/be/src/exec/tablet_sink.cpp#L218-L222

So when the internally-cancelled NodeChannels are destructed, we meet a non-empty RowBatch. But at that point, `RuntimeState::_instance_mem_tracker` has already been deleted. (The RowBatch's mem_tracker is `OlapTableSink::_mem_tracker`, and `RuntimeState::_instance_mem_tracker` is its parent.)

It's my fault. I used to treat `OlapTableSink::_mem_tracker` and `_channels` as internal members of OlapTableSink.
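A rough sketch of the guessed scenario (hypothetical names, not the real NodeChannel code): an internally-cancelled channel skips the flush in `mark_close()`, so its batches, and the bytes they account for, only go away when the channel itself is destructed inside the ObjectPool.

```cpp
#include <memory>
#include <queue>

// Hypothetical stand-ins (not the real NodeChannel/RowBatch): a cancelled
// channel skips the flush in mark_close(), so its pending batches survive
// until the channel itself is destructed.
struct BatchStub {
    long bytes = 0;
    // In the real code, ~RowBatch releases `bytes` back to its MemTracker,
    // whose parent is RuntimeState::_instance_mem_tracker.
};

struct ChannelStub {
    bool cancelled = false;                                  // set when an error occurred
    std::queue<std::unique_ptr<BatchStub>> pending_batches;  // plus a current batch

    void mark_close() {
        if (cancelled) {
            // Tolerated by OlapTableSink, but nothing is sent or freed here...
            return;
        }
        // Normal path: send the remaining batches and destroy them now.
        while (!pending_batches.empty()) {
            pending_batches.pop();
        }
    }
    // ...so for a cancelled channel the non-empty batches are only destroyed
    // together with the channel, after RuntimeState::_instance_mem_tracker has
    // already been deleted -- which is exactly the dangling release above.
};
```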
> Is `OlapTableSink::_mem_tracker` related to `RuntimeState::_profile`? I think `RuntimeState::_instance_mem_tracker` is the relevant one, because it is the parent of `OlapTableSink::_mem_tracker`.

Actually, I didn't notice the relationship between `OlapTableSink::_mem_tracker` and `RuntimeState::_instance_mem_tracker`. While looking into this problem, I found that all MemTrackers are related to `RuntimeState::_profile`, because they update Counters in the `RuntimeProfile`. For example, when `MemTracker->release()` is called, it uses `(*tracker)->_consumption`, and `_consumption` is a Counter that lives in the `RuntimeProfile`. But the profile has already been destructed, which results in a use-after-free error.
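To illustrate that coupling, here is a simplified sketch (hypothetical names, not the real MemTracker/RuntimeProfile implementation) of a `release()` that decrements the tracker's own consumption counter and then its ancestors', where every one of those counters is owned by some RuntimeProfile.

```cpp
#include <memory>
#include <vector>

// Simplified sketch of why every MemTracker is tied to a RuntimeProfile: its
// _consumption counter is owned by the profile, and release() walks the
// tracker and all of its ancestors.
struct CounterSketch {
    long value = 0;
    void add(long delta) { value += delta; }
};

struct RuntimeProfileSketch {
    std::vector<std::unique_ptr<CounterSketch>> counters;   // owned by the profile

    CounterSketch* add_counter() {
        counters.push_back(std::make_unique<CounterSketch>());
        return counters.back().get();
    }
};

struct MemTrackerSketch {
    MemTrackerSketch* parent = nullptr;   // e.g. the instance-level tracker
    CounterSketch* consumption = nullptr; // points INTO some RuntimeProfile

    void release(long bytes) {
        // Decrement this tracker and every ancestor. If any profile along the
        // chain has already been destructed, this is the use-after-free.
        for (MemTrackerSketch* t = this; t != nullptr; t = t->parent) {
            t->consumption->add(-bytes);
        }
    }
};
```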
Describe the bug

BE crash with a coredump like:

Debug

In some abnormal situations, performing an Insert load will cause BE to crash. The troubleshooting results are as follows:
https://github.com/apache/incubator-doris/blob/2211cb0ee0fcd23d4fd2445494aba6cf1a020987/fe/src/main/java/org/apache/doris/qe/Coordinator.java#L475-L489
During the execution of `execState.execRemoteFragmentAsync()`, if an RPC error occurs (for example, the corresponding BE is down), an exception is thrown directly instead of the error being returned via `Future<PExecPlanFragmentResult>`. In this case, the Coordinator does not proceed with the subsequent `Cancel` operation. After the exception is thrown, the call stack returns to `handleInsertStmt()`, which directly returns the error message to the user: the Insert has failed. From this point on, FE does no further processing.
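The FE side is Java (`Coordinator.java`), but the pattern this step points at is language-independent. Below is a small hypothetical C++ sketch (function and type names are made up): instead of letting the RPC exception escape, the async call returns a future that already carries the error, so the caller's normal error handling, including the Cancel step, still runs.

```cpp
#include <future>
#include <stdexcept>
#include <string>

// Hypothetical sketch of the behavior suggested for execRemoteFragmentAsync():
// never let the RPC failure escape as an exception; surface it via the future.
struct ExecResult {
    bool ok;
    std::string error;
};

std::future<ExecResult> exec_remote_fragment_async(bool backend_down) {
    std::promise<ExecResult> promise;
    try {
        if (backend_down) {
            // Stand-in for the RPC layer throwing when the BE is unreachable.
            throw std::runtime_error("send fragment rpc failed: backend down");
        }
        promise.set_value({true, ""});
    } catch (const std::exception& e) {
        // Convert the exception into an error result instead of rethrowing, so
        // the caller still reaches its cancel/cleanup path.
        promise.set_value({false, e.what()});
    }
    return promise.get_future();
}
```

The caller then inspects the result and, on error, proceeds to cancel the fragment instead of aborting the flow early.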
BE receives the execution plan of the Insert and calls `_sink->open` in `PlanFragmentExecutor::open_internal()` to open the `TabletSink`.
https://github.com/apache/incubator-doris/blob/2211cb0ee0fcd23d4fd2445494aba6cf1a020987/be/src/runtime/plan_fragment_executor.cpp#L272-L292
The `Open` method of `TabletSink` opens all related NodeChannels via RPC, but because some BEs have problems, some of the NodeChannels fail to open. An error message appears in the BE log. However, because the majority of NodeChannels were opened successfully, the `TabletSink` Open operation returns success.
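A rough sketch of why Open can still report success (simplified names and a simplified majority check, not the actual OlapTableSink logic, which reasons per tablet/replica): individual NodeChannel open failures are only recorded, and the sink fails only when too many channels are unusable.

```cpp
#include <string>
#include <vector>

// Hypothetical simplification of the sink's open path: open every NodeChannel
// over RPC, count the failures, and succeed as long as enough channels remain.
struct StatusSketch {
    bool ok = true;
    std::string msg;
};

struct NodeChannelSketch {
    bool backend_alive = true;

    StatusSketch open() {
        if (!backend_alive) {
            return {false, "open channel rpc failed: backend unreachable"};
        }
        return {};
    }
};

StatusSketch open_all_channels(std::vector<NodeChannelSketch>& channels) {
    size_t failed = 0;
    for (auto& channel : channels) {
        StatusSketch st = channel.open();
        if (!st.ok) {
            ++failed;   // logged and the channel marked as failed in the real code
        }
    }
    // Simplified check: tolerate failures as long as most channels opened.
    if (!channels.empty() && failed * 2 >= channels.size()) {
        return {false, "too many NodeChannels failed to open"};
    }
    return {};   // majority opened -> the sink's open() reports success
}
```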
Next, `PlanFragmentExecutor::open_internal()` calls `get_next_internal()` to start fetching data. Because the Insert has already failed at this point, this returns a failure. And there is a bug in the subsequent `update_status(status)` method:
https://github.com/apache/incubator-doris/blob/2211cb0ee0fcd23d4fd2445494aba6cf1a020987/be/src/runtime/plan_fragment_executor.cpp#L485-L503
Line 493: `if (_status.ok())` should be `if (!_status.ok())`. This error causes the `_status` variable not to be updated, which in turn causes the NodeChannels to be closed instead of cancelled when the TabletSink is finally closed. The normal NodeChannel close operation sends the last RowBatch it holds and then destroys the RowBatch object. But because some NodeChannels were never opened successfully, they cannot be closed normally, and the RowBatches held by those NodeChannels are not destroyed.
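A minimal sketch of the fix implied here (a hypothetical simplification, not the exact contents of `plan_fragment_executor.cpp`): with the guard written as `if (_status.ok()) return;`, the very first error is dropped and `_status` stays OK forever; inverting the guard keeps the first error, so the close path can see the failure and cancel the NodeChannels instead of closing them.

```cpp
#include <mutex>

// Hypothetical simplified status type and executor; only the guard logic in
// update_status() is the point.
struct StatusSketch {
    bool _ok = true;
    bool ok() const { return _ok; }
};

class PlanFragmentExecutorSketch {
public:
    void update_status(const StatusSketch& status) {
        if (status.ok()) {
            return;                        // nothing new to record
        }
        std::lock_guard<std::mutex> lock(_status_lock);
        // Reported bug: `if (_status.ok()) return;` bails out exactly when the
        // first error arrives, so it is never stored.
        if (!_status.ok()) {
            return;                        // an earlier error is already recorded
        }
        _status = status;                  // keep the first error
    }

    StatusSketch status() {
        std::lock_guard<std::mutex> lock(_status_lock);
        return _status;
    }

private:
    std::mutex _status_lock;
    StatusSketch _status;                  // starts as OK
};
```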
After the execution of the entire plan is completed, the PlanFragmentExecutor's destruction process begins. The destructor call chain is as follows:

PlanFragmentExecutor
 |--- RuntimeState
       |--- RuntimeProfile
       |--- ObjectPool
             |--- NodeChannel
                   |--- RowBatch
                         |--- MemTracker->release()
                               |--- profile->_consumption->add(-bytes)

Note that the RuntimeProfile is destructed first inside RuntimeState. As a result, when the RowBatch held by the NodeChannel is destructed, it calls into the already-destructed `_consumption` object, which eventually crashes the BE.

The whole process has the following problems:
1. The bug in `update_status(status)` caused the NodeChannels not to be cancelled correctly, which caused problems during the final destruction. (If `Cancel` of the NodeChannel were called in advance, the RowBatch would be destructed in advance.)
2. In `execRemoteFragmentAsync()`, if an RPC error is detected, it should return a Future carrying an error code, continue the normal flow afterwards, and actively call `Cancel()`.
3. `_status` in RuntimeState has no lock protection, which may cause some potential problems (see the sketch below).
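For the last point, a minimal sketch (hypothetical member names; the real RuntimeState keeps its status differently) of what lock protection around a shared status could look like: every read and write goes through the same mutex so concurrent threads cannot race on it.

```cpp
#include <mutex>
#include <string>

// Hypothetical sketch of guarding a shared status member with a mutex; this
// only shows the pattern, not the actual RuntimeState API.
struct StatusSketch {
    bool ok = true;
    std::string msg;
};

class RuntimeStateSketch {
public:
    void set_status(const StatusSketch& status) {
        std::lock_guard<std::mutex> lock(_status_lock);
        if (_status.ok) {               // keep the first error only
            _status = status;
        }
    }

    StatusSketch status() {
        std::lock_guard<std::mutex> lock(_status_lock);
        return _status;                 // copy out under the lock
    }

private:
    std::mutex _status_lock;
    StatusSketch _status;
};
```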