hugegraph-server 0.12.0 deployed successfully in cluster mode, but hugegraph-server fails to start after importing data and restarting

huangds3527 commented 1 year ago

Bug Type

other exception / error

Before submit

[X] I have confirmed and searched that there are no similar problems in the historical issues and documents

Environment

Server Version: 0.12.0 (Apache Release Version)
Backend: RocksDB 3 nodes, HDD or SSD
Data Size: 8000 vertices, 5000 edges

Expected & Actual behavior

hugegraph-server 0.12.0 was deployed successfully in cluster mode, but after importing data and restarting, hugegraph-server fails to start with the error: Waiting for raft group 'default' election timeout

2023-05-19 16:21:10 [main] [WARN] c.b.h.b.s.r.RaftNode - Waiting for raft group 'default' election cost 270.084s
2023-05-19 16:21:13 [main] [WARN] c.b.h.b.s.r.RaftNode - Waiting for raft group 'default' election cost 273.085s
2023-05-19 16:21:16 [main] [WARN] c.b.h.b.s.r.RaftNode - Waiting for raft group 'default' election cost 276.085s
2023-05-19 16:21:19 [main] [WARN] c.b.h.b.s.r.RaftNode - Waiting for raft group 'default' election cost 279.086s
2023-05-19 16:21:22 [main] [WARN] c.b.h.b.s.r.RaftNode - Waiting for raft group 'default' election cost 282.086s
2023-05-19 16:21:25 [main] [WARN] c.b.h.b.s.r.RaftNode - Waiting for raft group 'default' election cost 285.087s
2023-05-19 16:21:28 [main] [WARN] c.b.h.b.s.r.RaftNode - Waiting for raft group 'default' election cost 288.087s
2023-05-19 16:21:31 [main] [WARN] c.b.h.b.s.r.RaftNode - Waiting for raft group 'default' election cost 291.088s
2023-05-19 16:21:34 [main] [WARN] c.b.h.b.s.r.RaftNode - Waiting for raft group 'default' election cost 294.088s
2023-05-19 16:21:37 [main] [WARN] c.b.h.b.s.r.RaftNode - Waiting for raft group 'default' election cost 297.088s
2023-05-19 16:21:39 [Bolt-default-executor-5-thread-16] [WARN] c.a.s.j.c.NodeImpl - Node <default/192.168.30.31:8282> is not in active state, currTerm=11.
2023-05-19 16:21:39 [Bolt-default-executor-5-thread-17] [WARN] c.a.s.j.c.NodeImpl - Node <default/192.168.30.31:8282> is not in active state, currTerm=11.
2023-05-19 16:21:40 [Rpc-netty-server-worker-1-thread-3] [WARN] c.a.s.j.r.i.BoltRaftRpcFactory - JRaft SET bolt.rpc.dispatch-msg-list-in-default-executor to be false for replicator pipeline optimistic.
2023-05-19 16:21:40 [default/PeerPair[192.168.30.31:8282 -> 192.168.30.32:8282]-AppendEntriesThread0] [WARN] c.a.s.j.c.NodeImpl - Node <default/192.168.30.31:8282> is not in active state, currTerm=11.
2023-05-19 16:21:40 [main] [INFO] o.a.t.g.s.GremlinServer - Shutting down OpProcessor[]
2023-05-19 16:21:40 [main] [INFO] o.a.t.g.s.GremlinServer - Shutting down OpProcessor[session]
2023-05-19 16:21:40 [main] [INFO] o.a.t.g.s.GremlinServer - Shutting down OpProcessor[traversal]
2023-05-19 16:21:40 [main] [INFO] o.a.t.g.s.GremlinServer - Shutting down thread pools.
2023-05-19 16:21:40 [main] [ERROR] c.b.h.d.HugeGraphServer - HugeRestServer start error:
com.baidu.hugegraph.backend.BackendException: Waiting for raft group 'default' election timeout(300089ms)
        at com.baidu.hugegraph.backend.store.raft.RaftNode.waitLeaderElected(RaftNode.java:188) ~[hugegraph-core-0.12.0.jar:0.12.0.0]
        at com.baidu.hugegraph.backend.store.raft.RaftSharedContext.waitRaftNodeStarted(RaftSharedContext.java:145) ~[hugegraph-core-0.12.0.jar:0.12.0.0]
        at com.baidu.hugegraph.backend.store.raft.RaftBackendStoreProvider.waitStoreStarted(RaftBackendStoreProvider.java:145) ~[hugegraph-core-0.12.0.jar:0.12.0.0]
        at com.baidu.hugegraph.StandardHugeGraph.waitStarted(StandardHugeGraph.java:316) ~[hugegraph-core-0.12.0.jar:0.12.0.0]
        at com.baidu.hugegraph.core.GraphManager.lambda$waitGraphsStarted$0(GraphManager.java:124) ~[hugegraph-api-0.12.0.jar:0.67.0.0]
        at java.util.concurrent.ConcurrentHashMap$KeySetView.forEach(ConcurrentHashMap.java:4649) ~[?:1.8.0_171]
        at com.baidu.hugegraph.core.GraphManager.waitGraphsStarted(GraphManager.java:122) ~[hugegraph-api-0.12.0.jar:0.67.0.0]
        at com.baidu.hugegraph.core.GraphManager.<init>(GraphManager.java:101) ~[hugegraph-api-0.12.0.jar:0.67.0.0]
        at com.baidu.hugegraph.server.ApplicationConfig$GraphManagerFactory$1.onEvent(ApplicationConfig.java:112) ~[hugegraph-api-0.12.0.jar:0.67.0.0]
        at org.glassfish.jersey.server.internal.monitoring.CompositeApplicationEventListener.onEvent(CompositeApplicationEventListener.java:74) ~[jersey-server-2.25.1.jar:?]
        at org.glassfish.jersey.server.internal.monitoring.MonitoringContainerListener.onStartup(MonitoringContainerListener.java:81) ~[jersey-server-2.25.1.jar:?]
        at org.glassfish.jersey.server.ApplicationHandler.onStartup(ApplicationHandler.java:1180) ~[jersey-server-2.25.1.jar:?]
        at org.glassfish.jersey.grizzly2.httpserver.GrizzlyHttpContainer.start(GrizzlyHttpContainer.java:357) ~[jersey-container-grizzly2-http-2.25.1.jar:?]
        at org.glassfish.grizzly.http.server.HttpHandlerChain.start(HttpHandlerChain.java:398) ~[grizzly-http-server-2.4.4.jar:2.4.4]
        at org.glassfish.grizzly.http.server.HttpServer.setupHttpHandler(HttpServer.java:293) ~[grizzly-http-server-2.4.4.jar:2.4.4]
        at org.glassfish.grizzly.http.server.HttpServer.start(HttpServer.java:269) ~[grizzly-http-server-2.4.4.jar:2.4.4]
        at com.baidu.hugegraph.server.RestServer.start(RestServer.java:68) ~[hugegraph-api-0.12.0.jar:0.67.0.0]
        at com.baidu.hugegraph.server.RestServer.start(RestServer.java:175) ~[hugegraph-api-0.12.0.jar:0.67.0.0]
        at com.baidu.hugegraph.dist.HugeRestServer.start(HugeRestServer.java:34) ~[hugegraph-dist-0.12.0.jar:?]
        at com.baidu.hugegraph.dist.HugeGraphServer.<init>(HugeGraphServer.java:71) ~[hugegraph-dist-0.12.0.jar:?]
        at com.baidu.hugegraph.dist.HugeGraphServer.main(HugeGraphServer.java:119) ~[hugegraph-dist-0.12.0.jar:?]
2023-05-19 16:21:40 [gremlin-server-stop] [INFO] o.a.t.g.s.GremlinServer - Executing shutdown LifeCycleHook
2023-05-19 16:21:40 [gremlin-server-stop] [INFO] o.a.t.g.s.GremlinServer - Executed once at shutdown of Gremlin Server.
2023-05-19 16:21:41 [default/PeerPair[192.168.30.31:8282 -> 192.168.30.32:8282]-AppendEntriesThread0] [WARN] c.a.s.j.c.NodeImpl - Node <default/192.168.30.31:8282> is not in active state, currTerm=11.
2023-05-19 16:21:41 [default/PeerPair[192.168.30.31:8282 -> 192.168.30.32:8282]-AppendEntriesThread0] [WARN] c.a.s.j.c.NodeImpl - Node <default/192.168.30.31:8282> is not in active state, currTerm=11.
2023-05-19 16:21:42 [default/PeerPair[192.168.30.31:8282 -> 192.168.30.32:8282]-AppendEntriesThread0] [WARN] c.a.s.j.c.NodeImpl - Node <default/192.168.30.31:8282> is not in active state, currTerm=11.
2023-05-19 16:21:42 [default/PeerPair[192.168.30.31:8282 -> 192.168.30.32:8282]-AppendEntriesThread0] [WARN] c.a.s.j.c.NodeImpl - Node <default/192.168.30.31:8282> is not in active state, currTerm=11.
2023-05-19 16:21:42 [gremlin-server-stop] [INFO] c.b.h.HugeGraph - Close graph standardhugegraph[hugegraph]
2023-05-19 16:21:42 [gremlin-server-stop] [INFO] c.b.h.b.s.r.RaftSharedContext - Stop raft server: 192.168.30.31:8282
2023-05-19 16:21:42 [gremlin-server-stop] [INFO] c.b.h.b.s.r.RaftNode - Shutdown raft node: [default-192.168.30.31:8282]
2023-05-19 16:21:42 [gremlin-server-stop] [INFO] o.a.t.g.s.GremlinServer - Closed Graph instance [hugegraph]
2023-05-19 16:21:42 [gremlin-server-stop] [INFO] o.a.t.g.s.GremlinServer - Gremlin Server - shutdown complete
2023-05-19 16:21:45 [SOFA-RPC-ShutdownHook] [WARN] c.a.s.r.c.RpcRuntimeContext - SOFA RPC Framework catch JVM shutdown event, Run shutdown hook now.
2023-05-19 16:21:45 [hugegraph-shutdown] [INFO] c.b.h.HugeGraph - HugeGraph is shutting down

Vertex/Edge example

No response

Schema [VertexLabel, EdgeLabel, IndexLabel]

No response

imbajin commented 1 year ago

Have you tried v1.0.0?

huangds3527 commented 1 year ago

Can you provide a cluster deployment method for v1.0.0? The existing issues only describe cluster deployment for v0.12.0.

imbajin commented 1 year ago

> Can you provide a cluster deployment method for v1.0.0? The existing issues only describe cluster deployment for v0.12.0.

It should be the same as for v0.12.0, and I remember someone asking this question before.

huangds3527 commented 1 year ago

I attempted to deploy v1.0.0 using the same method, but it failed with an error; v0.12.0 deployed successfully with the same method.

imbajin commented 1 year ago

> I attempted to deploy v1.0.0 using the same method, but it failed with an error; v0.12.0 deployed successfully with the same method.

@javeme any suggestions about the latest cluster config?

huangds3527 commented 1 year ago

Deploying v1.0.0 failed; v0.12.0 was deployed successfully.

rest-server.properties (master)

# bind url
restserver.url=http://0.0.0.0:8080
# gremlin server url, need to be consistent with host and port in gremlin-server.yaml
#gremlinserver.url=http://127.0.0.1:8182

graphs=./conf/graphs

# The maximum thread ratio for batch writing, only take effect if the batch.max_write_threads is 0
batch.max_write_ratio=80
batch.max_write_threads=0

# authentication configs
# choose 'com.baidu.hugegraph.auth.StandardAuthenticator' or 'com.baidu.hugegraph.auth.ConfigAuthenticator'
#auth.authenticator=

# for StandardAuthenticator mode
#auth.graph_store=hugegraph
# auth client config
#auth.remote_url=127.0.0.1:8899,127.0.0.1:8898,127.0.0.1:8897

# for ConfigAuthenticator mode
#auth.admin_token=
#auth.user_tokens=[]

# rpc group configs of multi graph servers
# rpc server configs
rpc.server_host=192.168.30.31
rpc.server_port=8090
#rpc.server_timeout=30

# rpc client configs (like enable to keep cache consistency)
rpc.remote_url=192.168.30.31,192.168.30.32,192.168.30.33
#rpc.remote_url=127.0.0.1:8090
#rpc.client_connect_timeout=20
#rpc.client_reconnect_period=10
#rpc.client_read_timeout=40
#rpc.client_retries=3
#rpc.client_load_balancer=consistentHash

# lightweight load balancing (beta)
server.id=server-1
server.role=master

hugegraph.properties

# gremlin entrance to create graph
# auth config: com.baidu.hugegraph.auth.HugeFactoryAuthProxy
gremlin.graph=com.baidu.hugegraph.HugeFactory

# cache config
#schema.cache_capacity=100000
# vertex-cache default is 1000w, 10min expired
vertex.cache_type=l2
#vertex.cache_capacity=10000000
#vertex.cache_expire=600
# edge-cache default is 100w, 10min expired
edge.cache_type=l2
#edge.cache_capacity=1000000
#edge.cache_expire=600

# schema illegal name template
#schema.illegal_name_regex=\s+|~.*

#vertex.default_label=vertex

backend=rocksdb
serializer=binary

store=hugegraph

raft.mode=true
raft.safe_read=true
raft.use_snapshot=false
raft.endpoint=192.168.30.31:8282
raft.group_peers=192.168.30.31:8282,192.168.30.32:8282,192.168.30.33:8282
raft.path=./raft-log
raft.use_replicator_pipeline=true
raft.election_timeout=10000
raft.snapshot_interval=3600
raft.backend_threads=48
raft.read_index_threads=8
raft.read_strategy=ReadOnlyLeaseBased
raft.queue_size=16384
raft.queue_publish_timeout=60
raft.apply_batch=1
raft.rpc_threads=80
raft.rpc_connect_timeout=5000
raft.rpc_timeout=60000

search.text_analyzer=jieba
search.text_analyzer_mode=INDEX

wuchaojing commented 1 year ago

In v1.0.0, the raft.group_peers option has been moved from hugegraph.properties to rest-server.properties. You can download the v1.0.0 source code and look at the sample configuration it provides.
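
Applied to the configuration posted earlier in this thread, the change might look roughly like the fragment below. This is only a sketch: it assumes the key name and the comma-separated host:port value format are unchanged in v1.0.0, and it reuses the peer addresses from the configuration above. Whether other raft.* keys moved as well is not stated here, so the sample configuration shipped with v1.0.0 remains the reference.

```properties
# hugegraph.properties (v1.0.0): the peer list is no longer set in this file
raft.mode=true
# raft.group_peers=...   (remove this entry here)

# rest-server.properties (v1.0.0): declare the raft peers here instead
# (assumption: same key name and value format as in v0.12.0)
raft.group_peers=192.168.30.31:8282,192.168.30.32:8282,192.168.30.33:8282
```

The same edit would need to be made on each of the three nodes, since each node carries its own copy of both files.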
