Closed csudharsanan closed 2 months ago
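Fixed the MBean issue in HelixTask. It now supports transitions whose from-state is the * wildcard, such as the *->ERROR transition in the logs below. Since this was not failing any tests, I am adding logs from before and after the fix.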
Before:
Start zookeeper at localhost:2183 in thread main
START TestSetPartitionsToErrorState_testSetPartitionsToErrorState at Tue May 07 12:28:02 PDT 2024
true: wait 332ms, ClusterStateVerifier$BestPossAndExtViewZkVerifier(TestSetPartitionsToErrorState_testSetPartitionsToErrorState@localhost:2183)
javax.management.RuntimeOperationsException
at java.management/com.sun.jmx.mbeanserver.Repository.addMBean(Repository.java:298)
at java.management/com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerWithRepository(DefaultMBeanServerInterceptor.java:1848)
at java.management/com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:945)
at java.management/com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:880)
at java.management/com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:315)
at java.management/com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:523)
at org.apache.helix.monitoring.mbeans.MBeanRegistrar.register(MBeanRegistrar.java:60)
at org.apache.helix.monitoring.mbeans.dynamicMBeans.DynamicMBeanProvider.doRegister(DynamicMBeanProvider.java:89)
at org.apache.helix.monitoring.mbeans.dynamicMBeans.DynamicMBeanProvider.doRegister(DynamicMBeanProvider.java:95)
at org.apache.helix.monitoring.mbeans.StateTransitionStatMonitor.register(StateTransitionStatMonitor.java:83)
at org.apache.helix.monitoring.mbeans.ParticipantStatusMonitor.reportTransitionStat(ParticipantStatusMonitor.java:113)
at org.apache.helix.messaging.handling.HelixTask.reportMessageStat(HelixTask.java:335)
at org.apache.helix.messaging.handling.HelixTask.finalCleanup(HelixTask.java:386)
at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:185)
at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:49)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.lang.IllegalArgumentException: Repository: cannot add mbean for pattern name CLMParticipantReport:Cluster=TestSetPartitionsToErrorState_testSetPartitionsToErrorState,Transition=*--ERROR
... 19 more
true: wait 233ms, ClusterStateVerifier$BestPossAndExtViewZkVerifier(TestSetPartitionsToErrorState_testSetPartitionsToErrorState@localhost:2183)
true: wait 216ms, ClusterStateVerifier$BestPossAndExtViewZkVerifier(TestSetPartitionsToErrorState_testSetPartitionsToErrorState@localhost:2183)
16468 [ZkClient-EventThread-162-localhost:2183] ERROR org.apache.helix.messaging.handling.HelixTaskExecutor [] - Message xyz cannot be processed: ***, {CREATE_TIMESTAMP=1715110092791, FROM_STATE=*, MSG_ID=***, MSG_STATE=new, MSG_TYPE=STATE_TRANSITION, PARTITION_NAME=TestDB0_7, RESOURCE_NAME=TestDB0, SRC_NAME=*****, STATE_MODEL_DEF=MasterSlave, STATE_MODEL_FACTORY_NAME=DEFAULT, TGT_NAME=localhost_12918, TGT_SESSION_ID=***, TO_STATE=ERROR}{}{}Partition TestDB0_7 current state is same as toState (*->ERROR) from message.
true: wait 53ms, ClusterStateVerifier$BestPossAndExtViewZkVerifier(TestSetPartitionsToErrorState_testSetPartitionsToErrorState@localhost:2183)
END TestSetPartitionsToErrorState_testSetPartitionsToErrorState at Tue May 07 12:28:15 PDT 2024
After:
Start zookeeper at localhost:2183 in thread main
START TestSetPartitionsToErrorState_testSetPartitionsToErrorState at Tue May 07 12:23:24 PDT 2024
true: wait 302ms, ClusterStateVerifier$BestPossAndExtViewZkVerifier(TestSetPartitionsToErrorState_testSetPartitionsToErrorState@localhost:2183)
true: wait 202ms, ClusterStateVerifier$BestPossAndExtViewZkVerifier(TestSetPartitionsToErrorState_testSetPartitionsToErrorState@localhost:2183)
true: wait 185ms, ClusterStateVerifier$BestPossAndExtViewZkVerifier(TestSetPartitionsToErrorState_testSetPartitionsToErrorState@localhost:2183)
16489 [ZkClient-EventThread-162-localhost:2183] ERROR org.apache.helix.messaging.handling.HelixTaskExecutor [] - Message xyz cannot be processed: ***, {CREATE_TIMESTAMP=1715109814097, FROM_STATE=*, MSG_ID=***, MSG_STATE=new, MSG_TYPE=STATE_TRANSITION, PARTITION_NAME=TestDB0_7, RESOURCE_NAME=TestDB0, SRC_NAME=*****, STATE_MODEL_DEF=MasterSlave, STATE_MODEL_FACTORY_NAME=DEFAULT, TGT_NAME=localhost_12918, TGT_SESSION_ID=***, TO_STATE=ERROR}{}{}Partition TestDB0_7 current state is same as toState (*->ERROR) from message.
true: wait 51ms, ClusterStateVerifier$BestPossAndExtViewZkVerifier(TestSetPartitionsToErrorState_testSetPartitionsToErrorState@localhost:2183)
END TestSetPartitionsToErrorState_testSetPartitionsToErrorState at Tue May 07 12:23:36 PDT 2024
AfterClass: TestSetPartitionsToErrorState called.
Shut down zookeeper at port 2183 in thread main
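The exception in the "Before" log says that the name ending in Transition=*--ERROR is a "pattern name". JMX treats an unescaped * in a property value as a wildcard, and the trace shows that such a name is rejected at registration. The sketch below reproduces only that name check with the JDK's ObjectName class. The class WildcardTransitionName and its helpers are hypothetical, and quoting the value is one assumed way to avoid the problem, not necessarily what the fix in HelixTask does.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical illustration; not Helix code.
public class WildcardTransitionName {

    /** Builds the name with the raw transition string as the property value. */
    static ObjectName rawName(String fromState, String toState) {
        return parse("CLMParticipantReport:Transition=" + fromState + "--" + toState);
    }

    /** Assumed workaround: ObjectName.quote escapes '*' so it is read literally. */
    static ObjectName quotedName(String fromState, String toState) {
        return parse("CLMParticipantReport:Transition="
            + ObjectName.quote(fromState + "--" + toState));
    }

    private static ObjectName parse(String name) {
        try {
            return new ObjectName(name);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Invalid object name: " + name, e);
        }
    }

    public static void main(String[] args) {
        // A wildcard from-state makes the raw name a pattern, which cannot be registered.
        System.out.println(rawName("*", "ERROR").isPattern());          // true
        // An ordinary transition is a plain, registrable name.
        System.out.println(rawName("OFFLINE", "SLAVE").isPattern());    // false
        // Quoting escapes the wildcard, so the name is no longer a pattern.
        System.out.println(quotedName("*", "ERROR").isPattern());       // false
    }
}
```

Checking isPattern() before registering is a cheap guard if wildcard state names can reach the monitoring code from other paths.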
This PR is ready to be merged. It adds a SetPartitionToError endpoint that lets participants set their own partitions to the ERROR state.
Issues
Fixes #2791
Description
What: An API endpoint that validates the incoming request and sends a state transition message to set one or more partitions from any current state to the ERROR state.
Why: Currently, participants cannot explicitly set a partition to the ERROR state when it appears to be stuck in a specific current state. The only way a replica can be set to ERROR is from within a state model. With this endpoint, clients can move a stuck partition to ERROR and then call the resetPartition endpoint to set it back to the INIT state and recover the replica. resetPartition works only on partitions in the ERROR state.
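The recovery flow described above can be sketched as a small in-memory model. This is not the Helix API: the class PartitionRecoverySketch, its method names, and the validation rule (the partition must be known) are all assumptions made for illustration, and the initial state is passed in by the caller.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the flow; not Helix code.
public class PartitionRecoverySketch {
    static final String ERROR = "ERROR";

    private final String initialState;
    private final Map<String, String> currentStates = new HashMap<>();

    PartitionRecoverySketch(String initialState) {
        this.initialState = initialState;
    }

    void assign(String partition, String state) {
        currentStates.put(partition, state);
    }

    String stateOf(String partition) {
        return currentStates.get(partition);
    }

    /** Validates the request, then moves each partition from any state to ERROR. */
    void setPartitionsToError(List<String> partitions) {
        for (String partition : partitions) {
            // Assumed validation: reject partitions this instance does not host.
            if (!currentStates.containsKey(partition)) {
                throw new IllegalArgumentException("Unknown partition: " + partition);
            }
        }
        for (String partition : partitions) {
            currentStates.put(partition, ERROR);
        }
    }

    /** Mirrors the stated rule that reset works only on partitions in ERROR. */
    void resetPartitions(List<String> partitions) {
        for (String partition : partitions) {
            if (!ERROR.equals(currentStates.get(partition))) {
                throw new IllegalStateException(partition + " is not in ERROR state");
            }
        }
        for (String partition : partitions) {
            currentStates.put(partition, initialState);
        }
    }
}
```

In this model, a partition stuck in SLAVE cannot be reset directly. It has to be moved to ERROR first, which is the gap the new endpoint is meant to close.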
Tests
[ ] The following tests are written/updated for this issue:
mvn test -o -Dtest=TestSetPartitionsToErrorState -pl=helix-core
mvn test -o -Dtest=TestZkHelixAdmin -pl=helix-core
mvn test -o -Dtest=TestPerInstanceAccessor -pl=helix-rest