[Bug] Doris-BE node dead and fe node alived

apache / doris

Apache Doris is an easy-to-use, high performance and unified analytics database.

https://doris.apache.org

Apache License 2.0

[Bug] Doris-BE node dead and fe node alived #30207

Closed dzygoon closed 7 months ago

dzygoon commented 8 months ago

Search before asking

[X] I had searched in the issues and found no similar issues.

Version

2.0.0

What's Wrong?

My test environment is a single machine (16 cores, 32 GB of memory) running one BE node and one FE node. When I used an ETL tool to load a 20 GB data file into a single table, the BE node went down, and I did not see any error message.

When I checked the older logs, I found the following errors (INFO.log):

(be.WARNING.log)

I also found that the FE node could not be found for some time...

What You Expected?

I want to know what caused the BE node to go down and how to solve the problem.

How to Reproduce?

none.

Anything Else?

I have seen similar problems reported in the community, but none of them have answers.

Are you willing to submit PR?

[ ] Yes I am willing to submit a PR!

Code of Conduct

[X] I agree to follow this project's Code of Conduct

Vallishp commented 8 months ago

Can you please try the following?

1) Check the status of the backend after starting the BE by running show backends\G (connect to the FE query port with a MySQL client).

2) Check whether any core dump file is present in the environment.

3) Try running the test case again with the BE debug log enabled: https://doris.apache.org/docs/1.2/advanced/best-practice/debug-log/
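For step 1, a minimal sketch of the check might look like the following. It assumes the FE is reachable at 127.0.0.1 on query port 9030 as root with an empty password (the usual defaults, but adjust for your deployment) and that the mysql client is installed. The function names and the FE_HOST, FE_QUERY_PORT and FE_USER variables are made up for illustration.

```shell
#!/bin/sh
# Run SHOW BACKENDS against the FE query port.
# Assumed defaults: host 127.0.0.1, port 9030, user root, empty password.
show_backends() {
  mysql -h "${FE_HOST:-127.0.0.1}" -P "${FE_QUERY_PORT:-9030}" \
        -u "${FE_USER:-root}" -e 'SHOW BACKENDS\G'
}

# Read `SHOW BACKENDS\G` output on stdin and print the value of the
# Alive field for each backend (one line per backend).
backend_alive() {
  awk -F': ' '/^ *Alive:/ { print $2 }'
}

# Usage:
#   show_backends | backend_alive
```

If the value printed is false, the ErrMsg and LastHeartbeat fields in the full output are the next things to look at.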

dzygoon commented 8 months ago

Thanks for your reply.

1. My BE node cannot be started. When I first found that it was down, I restarted it once; it came up and I could connect to it through the FE, but then it went down again, and since then I have not been able to start it.

2. What is a core dump file? Sorry, I don't understand.

3. I will try turning on a more detailed log level for monitoring.
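On the core dump question: a core dump is a file holding a snapshot of a process's memory, which the operating system can write when the process crashes. A rough sketch for checking whether one exists is below. It is Linux-specific, and the helper names and the search depth are illustrative choices. The last helper looks for out-of-memory kills in the kernel log, which is only one possibility worth ruling out when a process disappears without logging an error; nothing in this thread confirms that is what happened here.

```shell
#!/bin/sh
# Print the current core-file size limit and, on Linux, the kernel's
# core file naming pattern. A limit of 0 means no core files are written.
core_settings() {
  echo "core size limit: $(ulimit -c)"
  if [ -r /proc/sys/kernel/core_pattern ]; then
    echo "core pattern: $(cat /proc/sys/kernel/core_pattern)"
  fi
}

# List regular files whose names start with "core" in a directory and its
# immediate subdirectories. This is a loose match and may list unrelated files.
# Pass the BE install directory; the default is the current directory.
find_core_files() {
  find "${1:-.}" -maxdepth 2 -type f -name 'core*' 2>/dev/null
}

# Filter kernel log text on stdin for lines that look like OOM-killer activity.
# Usage (may need root):  dmesg | oom_kill_lines
oom_kill_lines() {
  grep -iE 'out of memory|oom-kill|killed process'
}
```

If the core size limit is 0, no core file will have been written even if the BE did crash.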

Vallishp commented 8 months ago

Oh, OK. Then sh start_be.sh --console might be helpful: you may get an error in the console itself, and that could give us a clue.
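To make the console output easy to share, it can be saved to a file while it is shown on screen. This is only a sketch: run_be_console and the be_console.log file name are made up for illustration, and it assumes start_be.sh sits in the bin directory under the BE install directory.

```shell
#!/bin/sh
# Start the BE in console mode and keep a copy of everything it prints.
#   $1: BE install directory (assumed to contain bin/start_be.sh); default "."
#   $2: file that receives a copy of the output; default be_console.log
run_be_console() {
  # 2>&1 merges error output into normal output so tee captures both.
  sh "${1:-.}/bin/start_be.sh" --console 2>&1 | tee "${2:-be_console.log}"
}

# Usage:
#   run_be_console /path/to/be
```

The resulting file can then be attached to the issue in place of a screenshot.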

dzygoon commented 8 months ago

When I run start_be.sh, I get a shell command error message in the console. I have given up for now and will redeploy the BE node.

dzygoon commented 8 months ago

I would also like to know what causes the errors about deleting files.

Vallishp commented 8 months ago

Can you please share a screenshot or more log information?

dzygoon commented 8 months ago

Sure. This is a partial screenshot of be.WARNING.log.

There are too many error messages to include them all. If the BE node goes down again, I will send more.
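When a log has too many messages to paste, one option is to count how often each source location appears and share that summary along with a few sample lines. The sketch below assumes be.WARNING.log lines follow the glog-style layout, where each message is preceded by a token such as some_file.cpp:123]. The function name is made up for illustration.

```shell
#!/bin/sh
# Count warnings per source location ("file.cpp:line") in a log file.
#   $1: path to the log file (for example be.WARNING.log)
#   $2: how many of the most frequent locations to print; default 20
summarize_warnings() {
  grep -oE '[A-Za-z0-9_]+\.(cpp|cc|h|hpp):[0-9]+\]' "$1" \
    | tr -d ']' \
    | sort | uniq -c | sort -rn \
    | head -n "${2:-20}"
}

# Usage:
#   summarize_warnings log/be.WARNING.log 10
```

Each output line is a count followed by the location, with the most frequent location first.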