StarRocks / starrocks

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.
https://starrocks.io
Apache License 2.0

StarRocks BE component Resource temporarily unavailable on startup Kind Kubernetes #52771

Open crabio opened 4 days ago

crabio commented 4 days ago

Steps to reproduce the behavior (Required)

Install StarRocks on a local Podman Kind Kubernetes cluster following the instructions at https://docs.starrocks.io/docs/quick_start/helm/

Expected behavior (Required)

The StarRocks BE starts fine, as does the FE.

Real behavior (Required)

The FE component starts fine, but the BE component fails while starting its HTTP server. Log:

start time: Mon Nov 11 04:03:42 CST 2024, server uptime:  04:03:42 up  8:43,  0 users,  load average: 61.87, 43.07, 26.89
Ignored unknown config: default_rowset_type
terminate called after throwing an instance of 'std::system_error'
  what():  Resource temporarily unavailable
3.3.5 RELEASE (build 6d81f75)
query_id:00000000-0000-0000-0000-000000000000, fragment_instance:00000000-0000-0000-0000-000000000000
tracker:process consumption: 168683676
tracker:jemalloc_metadata consumption: 1907664
tracker:jemalloc_fragmentation consumption: 4133772
tracker:query_pool consumption: 0
tracker:query_pool/connector_scan consumption: 0
tracker:load consumption: 0
tracker:metadata consumption: 0
tracker:tablet_metadata consumption: 0
tracker:rowset_metadata consumption: 0
tracker:segment_metadata consumption: 0
tracker:column_metadata consumption: 0
tracker:tablet_schema consumption: 0
tracker:segment_zonemap consumption: 0
tracker:short_key_index consumption: 0
tracker:column_zonemap_index consumption: 0
tracker:ordinal_index consumption: 0
tracker:bitmap_index consumption: 0
tracker:bloom_filter_index consumption: 0
tracker:compaction consumption: 0
tracker:schema_change consumption: 0
tracker:column_pool consumption: 0
tracker:page_cache consumption: 0
tracker:jit_cache consumption: 0
tracker:update consumption: 0
tracker:chunk_allocator consumption: 0
tracker:clone consumption: 0
tracker:consistency consumption: 0
tracker:datacache consumption: 0
tracker:replication consumption: 0
*** Aborted at 1731269023 (unix time) try "date -d @1731269023" if you are using GNU date ***
PC: @     0xffff9006f200 (/usr/lib/aarch64-linux-gnu/libc.so.6+0x7f1ff)
*** SIGABRT (@0x1f) received by PID 31 (TID 0xffff8fd7f040) from PID 31; stack trace: ***
    @     0xffff900725d4 (/usr/lib/aarch64-linux-gnu/libc.so.6+0x825d3)
    @          0x965a808 google::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*)
    @     0xffff913537f0 ([vdso]+0x7ef)
    @     0xffff9006f200 (/usr/lib/aarch64-linux-gnu/libc.so.6+0x7f1ff)
    @     0xffff9002a67c raise
    @     0xffff90017130 abort
    @          0xd70e42c __gnu_cxx::__verbose_terminate_handler()
    @          0xd70c8cc __cxxabiv1::__terminate(void (*)())
    @          0xd70c930 std::terminate()
    @          0xd70cac4 __cxa_throw
    @          0xd7a16dc std::__throw_system_error(int)
    @          0xd7a19fc std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)())
    @          0x8be4394 starrocks::EvHttpServer::start()
    @          0x81a7a70 starrocks::HttpServiceBE::start()
    @          0x815cb70 starrocks::start_be(std::vector<starrocks::StorePath, std::allocator<starrocks::StorePath> > const&, bool)
    @          0x524ee58 main
    @     0xffff900173fc (/usr/lib/aarch64-linux-gnu/libc.so.6+0x273fb)
    @     0xffff900174cc __libc_start_main
    @          0x524a030 _start

StarRocks version (Required)

3.3.5 RELEASE (build 6d81f75)

kevincai commented 4 days ago

Is there any more information in the be.INFO log file?

crabio commented 4 days ago

Sure! be.INFO

I20241111 19:11:26.881520 281472970518592 daemon.cpp:366]  version 3.3.5-6d81f75
BuildType: RELEASE
Build distributor id: ubuntu
Build arch: aarch64
Built on 2024-10-23 11:24:38 by StarRocks@localhost (Ubuntu 22.04.4 LTS)
I20241111 19:11:26.881633 281472970518592 cpu_info.cpp:333] Init docker hardware cores by cgroup's config, cfs_num_cores=1, cpuset_num_cores=4, final num_cores=1
I20241111 19:11:26.882116 281472970518592 mem_info.cpp:153] Init mem info by container's cgroup config, physical_mem=2147483648
I20241111 19:11:26.882155 281472970518592 mem_info.cpp:104] Physical Memory: 2.00 GB
I20241111 19:11:26.882163 281472970518592 daemon.cpp:372] Cpu Info:
  Model: unknown
  Cores: 1
  Max Possible Cores: 4
  L1 Cache: 0 (Line: 64.00 B)
  L2 Cache: 0 (Line: 0)
  L3 Cache: 0 (Line: 0)
  Hardware Supports:
  Numa Nodes: 1
  Numa Nodes of Cores: 0->0 | 1->0 | 2->0 | 3->0 |
I20241111 19:11:26.882174 281472970518592 daemon.cpp:373] Disk Info: 
  Num disks 5: vda, vda1, vda2, vda3, vda4
I20241111 19:11:26.882176 281472970518592 daemon.cpp:374] Mem Info: 2.00 GB
I20241111 19:11:26.882178 281472970518592 daemon.cpp:375] CPU Info:
  Type: 0
  Family: 0
  Model: 0
  Stepping: 0
  ExtendModel: 0
  ExtendFamily: 0
  RunningInVM: 0
  Vendor: unknown
  Brand: 
  HardwareSupport:
I20241111 19:11:26.882181 281472970518592 daemon.cpp:376] openssl aesni support: 0
I20241111 19:11:26.988848 281472970518592 daemon.cpp:355] Minidump is disabled on non-x86_64 arch
I20241111 19:11:26.988859 281472970518592 starrocks_be.cpp:159] BE start step 1: daemon threads start successfully
I20241111 19:11:26.989016 281472970518592 starrocks_be.cpp:163] BE start step 2: jdbc driver manager init successfully
I20241111 19:11:26.989071 281472970518592 network_util.cpp:128] ipv6 link local address fe80::7c8c:85ff:fea0:fdfe is skipped
I20241111 19:11:26.989073 281472970518592 backend_options.cpp:88] check ip = 127.0.0.1
I20241111 19:11:26.989075 281472970518592 backend_options.cpp:88] check ip = 10.244.0.14
I20241111 19:11:26.989076 281472970518592 backend_options.cpp:118] localhost 10.244.0.14
I20241111 19:11:26.989077 281472970518592 starrocks_be.cpp:169] BE start step 3: backend network options init successfully
I20241111 19:11:26.989121 281472970518592 exec_env.cpp:287] Set storage page cache size 347892350
I20241111 19:11:26.989236 281472845807680 daemon.cpp:199] Current memory statistics: process(0), query_pool(0), load(0), metadata(0), compaction(0), schema_change(0), column_pool(0), page_cache(0), update(0), chunk_allocator(0), clone(0), consistency(0), datacache(0), jit(0)
I20241111 19:11:26.989340 281472970518592 starrocks_be.cpp:174] BE start step 4: global env init successfully
I20241111 19:11:26.990148 281472814481472 data_dir.cpp:133] path: /opt/starrocks/be/storage, hash: 3813974482837957012
I20241111 19:11:27.011508 281472574160960 data_dir.cpp:277] begin loading tablet from meta /opt/starrocks/be/storage
I20241111 19:11:27.011604 281472574160960 data_dir.cpp:335] load tablet from meta finished, loaded tablet: 0, error tablet: 0, path: /opt/starrocks/be/storage duration: 0ms
I20241111 19:11:27.011607 281472574160960 data_dir.cpp:368] begin loading rowset from meta /opt/starrocks/be/storage
I20241111 19:11:27.011610 281472574160960 data_dir.cpp:462] load rowset from meta finished, data dir: /opt/starrocks/be/storage error/total: 0/0 duration: 0ms
W20241111 19:11:27.011892 281472970518592 thread.cpp:279] failed to set thread name: compact_data_di
I20241111 19:11:27.012621 281472970518592 starrocks_be.cpp:177] BE start step 5: storage engine init successfully
I20241111 19:11:27.016408 281472038338624 fragment_mgr.cpp:566] FragmentMgr cancel worker start working.
I20241111 19:11:27.021858 281472970518592 exec_env.cpp:415] [PIPELINE] Exec thread pool: thread_num=1
I20241111 19:11:27.023753 281472970518592 pipeline_executor_set.cpp:129] [WORKGROUP] start executors ([name=com] [num_driver_threads=1] [num_scan_threads=1] [num_connector_scan_threads=8] [cpuids=(0)] [conf=([num_total_cores=1] [num_total_driver_threads=1] [num_total_scan_threads=1] [num_total_connector_scan_threads=8] [enable_bind_cpus=false] [enable_cpu_borrowing=false])])
I20241111 19:11:27.023774 281472970518592 pipeline_executor_set_manager.cpp:116] [WORKGROUP] assign shared executors to workgroup [workgroup=(id:0, name:default_wg, version:0, cpu_weight:1, exclusive_cpu_cores:0, mem_limit:1565515578, concurrency_limit:0, bigquery: (cpu_second_limit:0, mem_limit:0, scan_rows_limit:0), spill_mem_limit_threshold:1)] 
I20241111 19:11:27.023821 281472970518592 pipeline_executor_set_manager.cpp:116] [WORKGROUP] assign shared executors to workgroup [workgroup=(id:1, name:default_mv_wg, version:1, cpu_weight:1, exclusive_cpu_cores:0, mem_limit:1252412462, concurrency_limit:0, bigquery: (cpu_second_limit:0, mem_limit:0, scan_rows_limit:0), spill_mem_limit_threshold:0.8)] 
E20241111 19:11:27.024511 281472970518592 bfd_parser.cpp:107] set default target to elf64-x86-64 failed.
I20241111 19:11:27.181918 281470941528128 runtime_filter_worker.cpp:891] RuntimeFilterWorker start working.
I20241111 19:11:27.182303 281470924619840 profile_report_worker.cpp:111] ProfileReportWorker start working.
I20241111 19:11:27.182373 281472970518592 load_path_mgr.cpp:69] Load path configured to [/opt/starrocks/be/storage/mini_download]
I20241111 19:11:27.182384 281470916165696 result_buffer_mgr.cpp:147] result buffer manager cancel thread begin.
W20241111 19:11:27.186790 281472970518592 jit_engine.cpp:144] System or Process memory limit is less than 16GB, disable JIT. You can set jit_lru_cache_size a properly positive value in BE's config to force enabling JIT
I20241111 19:11:27.186808 281472970518592 starrocks_be.cpp:181] BE start step 6: exec engine init successfully
I20241111 19:11:27.188434 281470497849408 compaction_manager.cpp:69] start compaction scheduler
I20241111 19:11:27.188499 281470430216256 olap_server.cpp:908] begin to do tablet meta checkpoint:/opt/starrocks/be/storage
I20241111 19:11:27.188662 281470489395264 storage_engine.cpp:723] start to check compaction
I20241111 19:11:27.189116 281470396399680 olap_server.cpp:888] try to clear expired replication snapshots!
I20241111 19:11:27.189231 281472970518592 olap_server.cpp:259] All backgroud threads of storage engine have started.
I20241111 19:11:27.189235 281472970518592 starrocks_be.cpp:186] BE start step 7: storage engine start bg threads successfully
I20241111 19:11:27.189756 281470404853824 olap_server.cpp:830] try to perform path gc by tablet!
I20241111 19:11:27.189761 281470404853824 olap_server.cpp:833] try to perform path gc by rowsetid!
I20241111 19:11:27.189762 281470404853824 olap_server.cpp:837] try to perform path gc by dcg files!
I20241111 19:11:27.193928 281472970518592 starlet_server.cc:77] Starlet grpc server started on 0.0.0.0:9070
I20241111 19:11:27.194046 281472970518592 starrocks_be.cpp:190] BE start step 8: staros worker init successfully
I20241111 19:11:27.194119 281470303404096 starlet.cc:103] Empty starmanager address, skip reporting!
W20241111 19:11:27.194129 281472970518592 cache_options.cpp:126] fail to clean residual datacache data, reason: /opt/starrocks/be/datacache: No such file or directory
I20241111 19:11:27.194305 281472970518592 block_cache.cpp:65] init starcache engine, block_size: 262144
I20241111 19:11:27.194691 281472970518592 star_cache_impl.cpp:137] init starcache success. block_size: 262144, disk checksum: 0, mem_quota: 0, disk_quota: 0, scheduler threads: 0
I20241111 19:11:27.194802 281472970518592 starrocks_be.cpp:198] BE start step 9: datacache init successfully
I20241111 19:11:27.194814 281472970518592 backend_base.cpp:78] StarRocksInternalService has started listening port on 9060
I20241111 19:11:27.195209 281472970518592 thrift_server.cpp:384] BackendService has started listening port on 9060
I20241111 19:11:27.195212 281472970518592 starrocks_be.cpp:221] BE start step 10: start thrift server successfully
I20241111 19:11:27.199044 281472970518592 starrocks_be.cpp:260] BRPC server bind to host: 0.0.0.0, port: 8060
I20241111 19:11:27.204615 281472970518592 server.cpp:1181] Server[starrocks::LakeServiceImpl+starrocks::BackendInternalServiceImpl<starrocks::PInternalService>+starrocks::BackendInternalServiceImpl<doris::PBackendService>] is serving on port=8060.
I20241111 19:11:27.204632 281472970518592 server.cpp:1184] Check out http://kube-starrocks-be-0:8060 in web browser.
I20241111 19:11:27.205367 281472970518592 starrocks_be.cpp:266] BE start step 11: start brpc server successfully
W20241111 19:11:27.284465 281472970518592 stack_util.cpp:347] 2024-11-11 19:11:27.213380, query_id=00000000-0000-0000-0000-000000000000, fragment_instance_id=00000000-0000-0000-0000-000000000000 throws exception: std::system_error, trace:
     @          0x5777484  __wrap___cxa_throw
    @          0xd7a16dc  std::__throw_system_error(int)
    @          0xd7a19fc  std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)())
    @          0x8be4394  starrocks::EvHttpServer::start()
    @          0x81a7a70  starrocks::HttpServiceBE::start()
    @          0x815cb70  starrocks::start_be(std::vector<starrocks::StorePath, std::allocator<starrocks::StorePath> > const&, bool)
    @          0x524ee58  main
    @     0xffff889573fc  (/usr/lib/aarch64-linux-gnu/libc.so.6+0x273fb)
    @     0xffff889574cc  __libc_start_main
    @          0x524a030  _start

I20241111 19:11:27.289675 281472970518592 logconfig.cpp:135] je_mallctl execute purge success
I20241111 19:11:27.289726 281472970518592 logconfig.cpp:143] je_mallctl execute dontdump success

kevincai commented 3 days ago

Exceptions

3) [std::system_error](https://en.cppreference.com/w/cpp/error/system_error) if the thread could not be started. The exception may represent the error condition std::errc::resource_unavailable_try_again or another implementation-specific error condition.

Can you take a look at the ulimit settings for the container?

crabio commented 2 days ago

@kevincai The StarRocks BE container has no ulimit utility inside:

f": OCI runtime exec failed: exec failed: unable to start container process: exec: "ulimit": executable file not found in $PATH: unknown

BTW, you said that I have a resource_unavailable_try_again error, but it wasn't in my logs. Where did you find it?
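
For context on that question: std::errc::resource_unavailable_try_again is the C++ name for the POSIX error code EAGAIN, and on Linux the standard message text for EAGAIN is "Resource temporarily unavailable", which is the what() string in the crash log above. The sketch below is only an illustration of that mapping, written in Python rather than C++; the helper name is_eagain_message is hypothetical and not part of StarRocks.

```python
import errno
import os


def is_eagain_message(message: str) -> bool:
    """Return True if `message` is this platform's text for errno EAGAIN.

    Hypothetical helper for illustration only: it compares a log string
    against the message the C library reports for EAGAIN.
    """
    return message.strip() == os.strerror(errno.EAGAIN)


if __name__ == "__main__":
    # The what() text copied from the BE crash output in this issue.
    logged = "Resource temporarily unavailable"
    print("EAGAIN numeric value:", errno.EAGAIN)
    print("platform message for EAGAIN:", os.strerror(errno.EAGAIN))
    print("log line matches EAGAIN:", is_eagain_message(logged))
```

When thread creation fails with EAGAIN, the usual suspects are a limit on the number of threads or processes rather than CPU or memory requests, which is why the container limits are worth checking.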

Also, I'm running StarRocks in a local Kubernetes cluster with Podman Kind. Does it have ulimits? Or are you talking about CPU/memory resources?
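
A note on the failed exec: ulimit is a shell builtin, not a standalone executable, so it has to be run through a shell (for example `sh -c 'ulimit -a'`), or the limits can be read from /proc/1/limits inside the container. As a rough sketch of what to look at, the Python snippet below collects the limits that most often cap thread creation. It assumes a Python interpreter is available wherever it is run (not confirmed for the BE image), and the cgroup file paths are typical Linux layouts rather than anything verified for a Podman Kind node.

```python
import resource
from pathlib import Path


def read_limit_file(path: str):
    """Return the stripped contents of `path`, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def format_rlimit(value: int) -> str:
    """Render an rlimit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def thread_related_limits() -> dict:
    """Collect limits that can make thread creation fail with EAGAIN.

    The file paths below are assumptions about common cgroup v1/v2 layouts;
    a missing file is reported as None.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return {
        "RLIMIT_NPROC soft": format_rlimit(soft),
        "RLIMIT_NPROC hard": format_rlimit(hard),
        "cgroup v2 pids.max": read_limit_file("/sys/fs/cgroup/pids.max"),
        "cgroup v1 pids.max": read_limit_file("/sys/fs/cgroup/pids/pids.max"),
        "kernel threads-max": read_limit_file("/proc/sys/kernel/threads-max"),
    }


if __name__ == "__main__":
    for name, value in thread_related_limits().items():
        print(f"{name}: {value}")
```

A small pids.max or RLIMIT_NPROC value compared with the number of threads the BE starts would be consistent with the std::system_error seen in EvHttpServer::start(), though this sketch only reports the limits and does not prove the cause.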