StarRocks / starrocks

StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries.
https://starrocks.io
Apache License 2.0

[Seg fault] Unable to start BE service #38703

Closed sachidanandacj closed 8 months ago

sachidanandacj commented 8 months ago

I am trying to start the BE service and it results in a Segmentation fault. I was able to start the FE successfully. I tried to start the BE on 2 nodes, one where the FE service is running and also on a different node but both result in a Seg fault.

Environment: OS Ubuntu 22.04.3, processor x86

There is no indication of where it is failing.

Steps to reproduce the behavior (Required)

  1. BE config file - added storage path and priority network. All the needed ports are free on the node.
$cat be/conf/be.conf

sys_log_level = INFO

be_port = 9060
webserver_port = 8040
heartbeat_service_port = 9050
brpc_port = 8060

priority_networks = 192.168.110.192/24

storage_root_path = /data/be_storage
  2. Java installation
    $echo $JAVA_HOME
    /usr/lib/jvm/java-11-openjdk-amd64

I have installed openjdk-11-jdk

$javac -version
javac 11.0.21
  3. Starting the BE service -

    $./be/bin/start_be.sh 
    Segmentation fault (core dumped)
  4. Logs -

$cat be/log/be.out 
start time: Fri Jan 5 08:43:16 AM UTC 2024
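Step 1 states that the needed ports are free. One way to double-check that claim is to pull the port values out of be.conf and compare them with the listening sockets on the node. The helper below is a minimal sketch: `conf_ports` is a hypothetical name, and it assumes every port key in the file ends in "port", as the four keys shown above do.

```shell
#!/bin/sh
# conf_ports FILE: print the value of every "key = value" line whose key ends in "port".
# Hypothetical helper for illustration; not part of the StarRocks scripts.
conf_ports() {
    awk -F'=' '/^[[:space:]]*[a-z_]*port[[:space:]]*=/ {
        gsub(/[[:space:]]/, "", $2)   # strip blanks around the value
        print $2
    }' "$1"
}

# Example (path taken from the report above):
#   conf_ports be/conf/be.conf
```

Each printed port can then be looked up in the output of `ss -ltn` (or `netstat -ltn`) to confirm that nothing is already listening on it.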

Expected behavior (Required)

Run the BE service seamlessly

Real behavior (Required)

Starting a BE service results in Seg fault

StarRocks version (Required)

2.5.17

kangkaisen commented 8 months ago

Please send us the contents of be.warning and be.out.

sachidanandacj commented 8 months ago
$cat be/log/be.out 
start time: Fri Jan 5 08:43:16 AM UTC 2024

This is the only log found in be/log/.

kevincai commented 8 months ago

how about the dmesg or syslog?

sachidanandacj commented 8 months ago
$tail -f /var/log/syslog
Jan 10 05:12:53 poc1 kernel: [421009.415182] starrocks_be[55344]: segfault at 8 ip 00007f98cddf8215 sp 00007ffeb9618900 error 4 in ld-linux-x86-64.so.2[7f98cddd7000+2a000]
Jan 10 05:12:53 poc1 kernel: [421009.415204] Code: 81 39 52 e5 74 64 0f 84 f9 04 00 00 48 85 c0 75 e4 48 83 3d 2c 7d 01 00 00 0f 85 0a 05 00 00 49 8b 57 68 49 8b 87 68 02 00 00 <48> 8b 5a 08 41 f6 87 1e 03 00 00 20 74 03 49 03 1f 48 85 c0 74 3a
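For readers unfamiliar with this kernel message, the segfault line follows the pattern `segfault at ADDR ip IP sp SP error CODE in OBJECT[BASE+SIZE]`. Here the faulting address is 8 (a near-null pointer), and on x86 an error code of 4 denotes a user-mode read of a page that is not mapped. The crash occurred inside the dynamic loader, ld-linux-x86-64.so.2, which suggests the process died while the executable was being loaded rather than in BE code. The sketch below computes the offset of the faulting instruction within that object; `fault_offset` is a hypothetical helper name, and the two hex values are copied from the log above.

```shell
#!/bin/sh
# fault_offset IP BASE: print the offset of the faulting instruction inside
# the mapped object, given the "ip" value and the mapping base from the log.
# Both arguments are hex strings without a 0x prefix.
fault_offset() {
    printf '0x%x\n' $(( 0x$1 - 0x$2 ))
}

# Values from the syslog line above; prints 0x21215
fault_offset 00007f98cddf8215 7f98cddd7000
```

The result is below the mapping size of 0x2a000, which is consistent with the fault lying inside the loader's own mapped code.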
kevincai commented 8 months ago

where did you get the StarRocks binaries for the ubuntu platform?

sachidanandacj commented 8 months ago

https://hub.docker.com/r/starrocks/artifacts-ubuntu/tags

kevincai commented 8 months ago

Can you run ldd be/lib/starrocks_be? Judging from the syslog, it looks like a dynamic library incompatibility.

sachidanandacj commented 8 months ago

I ran the ldd command and it gave no output (meaning no dependencies recognized?)

kevincai commented 8 months ago

It should produce output, something similar to the following:

$ ldd be/lib/starrocks_be 
    linux-vdso.so.1 (0x00007ffd35b93000)
    libjvm.so => not found
    libresolv.so.2 => /lib/x86_64-linux-gnu/libresolv.so.2 (0x00007ff468b52000)
    libbfd-2.38-system.so => /lib/x86_64-linux-gnu/libbfd-2.38-system.so (0x00007ff4689da000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ff4688f3000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff4686cb000)
    /lib64/ld-linux-x86-64.so.2 (0x00007ff468b6c000)
    libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007ff4686ad000)
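When ldd prints nothing at all for a file that should be a dynamically linked executable, a quick sanity check is whether the file still begins with the ELF magic bytes (0x7f followed by "ELF"). The following is a minimal sketch; `is_elf` is a hypothetical helper, and it assumes `head -c` and `od` are available, as they are on a stock Ubuntu install. It only catches gross damage at the start of the file, so a file that passes can still be corrupted elsewhere.

```shell
#!/bin/sh
# is_elf FILE: succeed if FILE starts with the 4-byte ELF magic 7f 45 4c 46.
# Hypothetical helper for illustration only.
is_elf() {
    magic=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "7f454c46" ]
}

# Example (path taken from the thread):
#   is_elf be/lib/starrocks_be && echo "ELF header present" || echo "not an ELF file"
```

The `file` and `readelf -h` commands give a more detailed view of the same header if they are installed.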
sachidanandacj commented 8 months ago

I meant the output was empty; it did not show any dependencies. Could there be an issue in the Docker image? I need a classic (shared-nothing) cluster, so I downloaded the 2.5.17 image. Can I try 3.2.1 and still use a shared-nothing cluster?

kevincai commented 8 months ago

Yes, of course. 3.x can still run in shared-nothing mode.

sachidanandacj commented 8 months ago

Let me try that and get back to you. Thanks for the support.

kevincai commented 8 months ago

This is example output from running starrocks/allin1-ubuntu:2.5.17, which contains the same binary as starrocks/artifacts-ubuntu:2.5.17.

root@599a99355ec9:/data/deploy/starrocks# ldd be/lib/starrocks_be 
    linux-vdso.so.1 (0x00007ffc1a924000)
    libjvm.so => not found
    libbfd-2.38-system.so => /lib/x86_64-linux-gnu/libbfd-2.38-system.so (0x00007fd0a2d2a000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fd0a2c43000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fd0a2a1b000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fd0a2ea8000)
    libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fd0a29ff000)
root@599a99355ec9:/data/deploy/starrocks# cd be
root@599a99355ec9:/data/deploy/starrocks/be# ./bin/show_be_version.sh 
2.5.17 RELEASE (build 3f669b66d9)
Built on 2023-12-19 05:53:45 by StarRocks@localhost
sachidanandacj commented 8 months ago

I got the Docker image with docker pull starrocks/artifacts-ubuntu:2.5.17. You are using starrocks/allin1-ubuntu:2.5.17. Would that matter?

kevincai commented 8 months ago

They are the same: the binary inside allin1-ubuntu is copied from artifacts-ubuntu.

# md5sum be/lib/starrocks_be 
77ffe3354bafa7afbac4ce51dd354f94  be/lib/starrocks_be
sachidanandacj commented 8 months ago

Wow! Looks like a network transfer issue. On the node where I need to run the service:

$ md5sum starrocks_be
0ac369b4439f8f8d56d165d754baeebe  starrocks_be

On the host machine where I have the Docker engine:

$ md5sum be/lib/starrocks_be
77ffe3354bafa7afbac4ce51dd354f94  be/lib/starrocks_be
kevincai commented 8 months ago

OK, it looks like the binary is corrupted.
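The comparison that exposed the problem can be turned into a repeatable check to run on the target node after every copy. The sketch below is one way to do it, not an official StarRocks procedure; `verify_md5` is a hypothetical helper name, and the expected digest must come from a trusted copy of the file, such as the one inside the Docker image.

```shell
#!/bin/sh
# verify_md5 FILE EXPECTED: succeed only if FILE's MD5 digest equals EXPECTED.
# Returns 1 on mismatch and 2 if md5sum itself fails (e.g. file missing).
verify_md5() {
    actual=$(md5sum "$1") || return 2
    actual=${actual%% *}              # keep only the digest, drop the filename
    if [ "$actual" = "$2" ]; then
        echo "OK: $1"
    else
        echo "MISMATCH: $1 (expected $2, got $actual)" >&2
        return 1
    fi
}

# Example, using the digest reported from the Docker host in this thread:
#   verify_md5 be/lib/starrocks_be 77ffe3354bafa7afbac4ce51dd354f94
```

MD5 is adequate for detecting accidental transfer damage like this; where tampering is a concern, the same pattern works with `sha256sum`.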

kevincai commented 8 months ago

Root cause identified; closing this issue.