Open ming12713 opened 3 days ago
In my case, Kafka writes to Doris through the connector in sink mode. When Doris is restarted, the connector keeps writing data. The FE log shows a coordinator BE IP. Does the connector write data with the StreamLoad method? My guess is that the transaction was persisted to the FE metadata through bdb but had not yet been synchronized to the BE. If the BE is restarted at that moment, the FE may end up with a coordinator BE IP that it can no longer connect to, which causes cluster issues. Is my understanding correct?
Version
2.1.1
What's Wrong?
2024-11-12 08:05:14,342 INFO (stateListener|83) [DatabaseTransactionMgr.replayUpsertTransactionState():2158] replay a COMMITTED transaction TransactionState. transaction id: 3917246, label: nome_raw_dataKC_ods_vtc_nome_raw_data__KC_1KC_loshu_ods_vtc_nome_raw_dataKC_0__KC_1612525KC_1730487305321, db id: 11154, table id list: 74966, callback id: -1, coordinator: BE: 10.42.1.19, transaction status: COMMITTED, error replicas num: 0, replica ids: , prepare time: 1730487305400, commit time: 1730487308568, finish time: -1, reason:
/opt/apache-doris/fe/bin/start_fe.sh: line 265: 162 Killed ${LIMIT:+${LIMIT}} "${JAVA}" ${final_java_opt:+${final_java_opt}} -XX:-OmitStackTraceInFastThrow -XX:OnOutOfMemoryError="kill -9 %p" ${coverage_opt:+${coverage_opt}} org.apache.doris.DorisFE ${HELPER:+${HELPER}} ${OPT_VERSION:+${OPT_VERSION}} "${METADATA_FAILURE_RECOVERY}" "$@" < /dev/null
Doris is installed via the Operator with 1 FE node and 1 BE node. After restarting both the FE and BE nodes, the FE fails to start normally and reports the error above. The BE IP 10.42.1.19 in the error is the previous BE pod IP, not the SVC IP. The FE is configured to use the SVC (Service) method for service discovery, and the BE pod IP is now 10.42.1.6.
Pod network CIDR: 10.42.1.x/16
SVC network CIDR: 10.43.48.x
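To make the mismatch easier to check, here is a minimal Python sketch that pulls the coordinator out of a replay log line like the one above and compares it with the current BE address. The names `parse_txn_line` and `is_stale_coordinator` are hypothetical helpers for triage and are not part of Doris. The sample line is shortened (the label is replaced by a placeholder), and reading "10.42.1.x/16" as the network 10.42.0.0/16 is an assumption.

```python
import ipaddress
import re

# Matches the fields of the FE replay log line quoted in this report.
TXN_PATTERN = re.compile(
    r"transaction id: (?P<txn_id>\d+),.*?"
    r"coordinator: (?P<kind>\w+): (?P<ip>\d+\.\d+\.\d+\.\d+),\s*"
    r"transaction status: (?P<status>\w+)"
)


def parse_txn_line(line):
    """Return txn id, coordinator kind/IP and status, or None if no match."""
    match = TXN_PATTERN.search(line)
    if match is None:
        return None
    return {
        "txn_id": int(match.group("txn_id")),
        "coordinator_kind": match.group("kind"),
        "coordinator_ip": match.group("ip"),
        "status": match.group("status"),
    }


def is_stale_coordinator(txn, current_be_ips):
    """True if the coordinator is a BE whose IP is not a current BE IP."""
    return (
        txn["coordinator_kind"] == "BE"
        and txn["coordinator_ip"] not in current_be_ips
    )


# Assumption: "10.42.1.x/16" is read as the 10.42.0.0/16 pod network.
POD_NETWORK = ipaddress.ip_network("10.42.0.0/16")

# Shortened copy of the log line above; the label is a placeholder.
sample = (
    "replay a COMMITTED transaction TransactionState. "
    "transaction id: 3917246, label: LABEL_PLACEHOLDER, db id: 11154, "
    "table id list: 74966, callback id: -1, coordinator: BE: 10.42.1.19, "
    "transaction status: COMMITTED, error replicas num: 0"
)

txn = parse_txn_line(sample)
stale = is_stale_coordinator(txn, {"10.42.1.6"})
in_pod_network = ipaddress.ip_address(txn["coordinator_ip"]) in POD_NETWORK
print(txn["txn_id"], txn["coordinator_ip"], stale, in_pod_network)
```

With the values from this report, the coordinator 10.42.1.19 falls inside the pod network but does not match the current BE pod IP 10.42.1.6, which is consistent with the transaction having recorded a pod IP rather than the SVC address.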
What You Expected?
Fix the issue so that the FE starts normally after the FE and BE are restarted.
How to Reproduce?
No response
Anything Else?
No response