apache / incubator-hugegraph-computer

HugeGraph Computer - A distributed graph processing system for hugegraph (OLAP)
https://hugegraph.apache.org/docs/quickstart/hugegraph-computer/
Apache License 2.0
42 stars 41 forks source link

feat: implement fast-failover for MessageRecvManager and DataClientManager #243

Closed Radeity closed 1 year ago

Radeity commented 1 year ago

Purpose of the PR

Main Changes

This PR implements method exceptionCaught of MessageRecvManager and DataClientManager in different ways.

I find it's the most light weight implementation, IN FACT, it can be seen as a lazy-fail, rather than block as before.

Verifying these changes

Does this PR potentially affect the following parts?

Documentation Status

Radeity commented 1 year ago

Hi, @coderzc, @imbajin, would you like to help review this PR when available : )

codecov[bot] commented 1 year ago

Codecov Report

Merging #243 (26aca44) into master (cc5a7e7) will increase coverage by 0.02%. The diff coverage is 60.86%.

@@             Coverage Diff              @@
##             master     #243      +/-   ##
============================================
+ Coverage     85.83%   85.85%   +0.02%     
  Complexity     3232     3232              
============================================
  Files           344      344              
  Lines         12072    12076       +4     
  Branches       1087     1088       +1     
============================================
+ Hits          10362    10368       +6     
+ Misses         1185     1182       -3     
- Partials        525      526       +1     
Impacted Files Coverage Δ
...graph/computer/core/network/DataClientManager.java 85.36% <0.00%> (ø)
...re/network/netty/ChannelFutureListenerOnWrite.java 80.00% <ø> (ø)
...graph/computer/core/sender/MessageSendManager.java 78.40% <0.00%> (ø)
...raph/computer/core/sender/QueuedMessageSender.java 73.68% <0.00%> (-2.29%) :arrow_down:
...aph/computer/core/receiver/MessageRecvManager.java 84.88% <82.35%> (+3.48%) :arrow_up:

... and 3 files with indirect coverage changes

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

Radeity commented 1 year ago

Hi, @javeme , thanks for your review again, i've made some changes based on your comments, PTAL : )

AND

please also update the exception message at line 206

I've reverted this part to only catch InterruptedException with original log message.

Radeity commented 1 year ago

Hi, all, really thanks for @coderzc's suggestion, I've modified the implementation in a more elegant way. Like describe above in PR description:

Both set an exceptional future. The main thread will sense this exception later when sending(sender) and waiting(receiver) controlling message.

PTAL : )

Radeity commented 1 year ago

Maybe we should find finishMessageFuture through connectionId?

Hi, @coderzc , during the process of sending vertices and edges, two control message sent from the same worker share the same ConnectionId. How about add a boolean variable sendControlMessage in finishSend, like:

public void finishSend(MessageType type, boolean sendControlMessage) { 
     Map<Integer, MessageSendPartition> all = this.buffers.all(); 
     MessageStat stat = this.sortAndSendLastBuffer(all, type); 

     Set<Integer> workerIds = all.keySet().stream() 
                                 .map(this.partitioner::workerId) 
                                 .collect(Collectors.toSet()); 
     if (sendControlMessage == true) {
         this.sendControlMessageToWorkers(workerIds, MessageType.FINISH); 
     }
     LOG.info("Finish sending message(type={},count={},bytes={})", 
              type, stat.messageCount(), stat.messageBytes()); 
 } 

or we separate the call of sendControlMessageToWorkers and finishSend, only when finish sending edges, we call sendControlMessageToWorkers. In this way, we can use connectionId to find a future, another advantage is for both inputStep and superStep, we can maintain a future array with same length, cuz expectedFinishMessages keeps the same, WDYT?

coderzc commented 1 year ago

Maybe we should find finishMessageFuture through connectionId?

Hi, @coderzc , during the process of sending vertices and edges, two control message sent from the same worker share the same ConnectionId. How about add a boolean variable sendControlMessage in finishSend, like:

public void finishSend(MessageType type, boolean sendControlMessage) { 
     Map<Integer, MessageSendPartition> all = this.buffers.all(); 
     MessageStat stat = this.sortAndSendLastBuffer(all, type); 

     Set<Integer> workerIds = all.keySet().stream() 
                                 .map(this.partitioner::workerId) 
                                 .collect(Collectors.toSet()); 
     if (sendControlMessage == true) {
         this.sendControlMessageToWorkers(workerIds, MessageType.FINISH); 
     }
     LOG.info("Finish sending message(type={},count={},bytes={})", 
              type, stat.messageCount(), stat.messageBytes()); 
 } 

or we separate the call of sendControlMessageToWorkers and finishSend, only when finish sending edges, we call sendControlMessageToWorkers. In this way, we can use connectionId to find a future, another advantage is for both inputStep and superStep, we can maintain a future array with same length, cuz expectedFinishMessages keeps the same, WDYT?

@Radeity Do you mean to merge send vertex and send edge into the same session?

Radeity commented 1 year ago

@Radeity Do you mean to merge send vertex and send edge into the same session?

Yes, can we do that?

Radeity commented 1 year ago

@coderzc it seems that a lot to change, maybe we should keep using finishMessageCount, WDYT?

coderzc commented 1 year ago

@coderzc it seems that a lot to change, maybe we should keep using finishMessageCount, WDYT?

I agree with this, then only add a future and complete this future when finishMessageCount==0

coderzc commented 1 year ago

@Radeity Please fix the code style, you can import https://github.com/apache/incubator-hugegraph/blob/master/hugegraph-style.xml to IDEA

imbajin commented 1 year ago

@Radeity Please fix the code style, you can import apache/incubator-hugegraph@master/hugegraph-style.xml to IDEA

Good reminder 🤔 I'll add a full doc for devs to use IDEA effectively in Wiki first (then move to website/doc)

Update: put the idea-config-doc here first

Also, spotless could add in the TODO Task and assign to someone to test it

Radeity commented 1 year ago

Good reminder 🤔 I'll add a full doc for devs to use IDEA effectively in Wiki first (then move to website/doc)

I use IDEA for remote development, which really annoy me that I can not import files. May I ask do we have other ways to make this format script effective? BTW, maybe we can consider to use mvn spotless in the future.