apache / helix

Mirror of Apache Helix
Apache License 2.0
457 stars 218 forks source link

Flaky test testUpdateConfigFields #1336

Closed huizhilu closed 3 years ago

huizhilu commented 3 years ago
[ERROR] Tests run: 166, Failures: 1, Errors: 0, Skipped: 25, Time elapsed: 52.12 s <<< FAILURE! - in TestSuite
3249
[ERROR] testUpdateConfigFields(org.apache.helix.rest.server.TestClusterAccessor)  Time elapsed: 0.015 s  <<< FAILURE!
3250
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
3251
    at org.apache.helix.rest.server.TestClusterAccessor.testUpdateConfigFields(TestClusterAccessor.java:167)

[INFO] Results:

[ERROR] Failures: 
3257
[ERROR]   TestClusterAccessor.testUpdateConfigFields:167 » IndexOutOfBounds Index: 0, Si...

[ERROR] Tests run: 166, Failures: 1, Errors: 0, Skipped: 25

[INFO] ------------------------------------------------------------------------
3262
[INFO] BUILD FAILURE
3263
[INFO] ------------------------------------------------------------------------
huizhilu commented 3 years ago

@kaisun2000 One more flaky test run on GitHub :) If you have time, feel free to tackle it.

kaisun2000 commented 3 years ago

LOG 2742

2020-10-08T00:48:47.6150751Z new config:ZnRecord=TestCluster_0, { FAULT_ZONE_TYPE=Host, PERSIST_BEST_POSSIBLE_ASSIGNMENT=true, SimpleField1=Value1, SimpleField2=Value2, TOPOLOGY=/Rack/Sub-Rack/Host/Instance, TOPOLOGY_AWARE_ENABLED=true, newField=newValue } { MapField1={key1=value1, key2=value2}, MapField2={key3=value1, key4=value2}, newMap={newkey1=newvalue1, newkey2=newvalue2} } { INSTANCE_CAPACITY_KEYS=[], ListField1=[Value1, Value2, Value3], ListField2=[Value2, Value1, Value3], newList=[newValue1, newValue2] }, Stat=Stat {_version=0, _creationTime=0, _modifiedTime=0, _ephemeralOwner=0} 2020-10-08T00:48:47.6152886Z End test :testAddConfigFields 2020-10-08T00:48:47.6153403Z Start test :testUpdateConfigFields 2020-10-08T00:48:47.6185510Z new config:ZnRecord=TestCluster_0, {FAULT_ZONE_TYPE=Host, PERSIST_BEST_POSSIBLE_ASSIGNMENT=true, SimpleField1=Value1, SimpleField2=Value2, TOPOLOGY=/Rack/Sub-Rack/Host/Instance, TOPOLOGY_AWARE_ENABLED=true, newField=newValue}{MapField1={key1=value1, key2=value2}, MapField2={key3=value1, key4=value2}, newMap={newkey1=newvalue1, newkey2=newvalue2}}{INSTANCE_CAPACITY_KEYS=[], ListField1=[Value1, Value2, Value3], ListField2=[Value2, Value1, Value3], newList=[newValue1, newValue2]}, Stat=Stat {_version=0, _creationTime=0, _modifiedTime=0, _ephemeralOwner=0}

kaisun2000 commented 3 years ago

This is from debugger

Start test :testUpdateConfigFields new config:ZnRecord=TestCluster_0, { FAULT_ZONE_TYPE=helixZoneId, PERSIST_BEST_POSSIBLE_ASSIGNMENT=true, SimpleField1=Value1, SimpleField2=Value2, newField=newValue } { MapField1={key1=value1, key2=value2}, MapField2={key3=value1, key4=value2}, newMap={newkey1=newvalue1, newkey2=newvalue2} } { ListField1=[Value1, Value2, Value3], ListField2=[Value2, Value1, Value3], newList=[newValue1, newValue2] }, Stat=Stat {_version=0, _creationTime=0, _modifiedTime=0, _ephemeralOwner=0}

kaisun2000 commented 3 years ago

Need to trace the origin of INSTANCE_CAPACITY_KEYS in github. Is it related to cloud config?

kaisun2000 commented 3 years ago

The code path from static analysis seems to be:

computeBestPossibleAssignment --> globalRebalance --> ResourceChangeDetector.updateSnapshots --> ResourceChangeSnapshot(ResourceControllerDataProvider dataProvider, boolean ignoreNonTopologyChange) --> ClusterConfigTrimmer.getInstance().trimProperty(dataProvider.getClusterConfig()) --> HelixPropertyTrimmer.doTrim --> copyNonTrimmableInfo(originalZNRecord, trimmedZNRecord, getNonTrimmableFields(originalProperty), false) --> protected Map<FieldType, Set> getNonTrimmableFields(ClusterConfig property) { return STATIC_TOPOLOGY_RELATED_FIELD_MAP; } -->

private static final Map<FieldType, Set> STATIC_TOPOLOGY_RELATED_FIELD_MAP = ImmutableMap .of(FieldType.SIMPLE_FIELD, ImmutableSet .of(ClusterConfigProperty.TOPOLOGY.name(), ClusterConfigProperty.FAULT_ZONE_TYPE.name(), ClusterConfigProperty.TOPOLOGY_AWARE_ENABLED.name(), ClusterConfigProperty.MAX_PARTITIONS_PER_INSTANCE.name()), FieldType.LIST_FIELD, ImmutableSet .of(ClusterConfigProperty.INSTANCE_CAPACITY_KEYS.name()), FieldType.MAP_FIELD, ImmutableSet .of(ClusterConfigProperty.DEFAULT_INSTANCE_CAPACITY_MAP.name(), ClusterConfigProperty.DEFAULT_PARTITION_WEIGHT_MAP.name(), ClusterConfigProperty.REBALANCE_PREFERENCE.name()));

Note

private void copyNonTrimmableInfo(ZNRecord originalZNRecord, ZNRecord trimmedZNRecord,
      Map<FieldType, Set<String>> nonTrimmableFields, boolean trimValue) {
    for (Map.Entry<FieldType, Set<String>> fieldEntry : nonTrimmableFields.entrySet()) {
      FieldType fieldType = fieldEntry.getKey();
      Set<String> fieldKeySet = fieldEntry.getValue();
      if (null == fieldKeySet || fieldKeySet.isEmpty()) {
        continue;
      }
      switch (fieldType) {
        case SIMPLE_FIELD:
          fieldKeySet.stream().forEach(fieldKey -> {
            if (originalZNRecord.getSimpleFields().containsKey(fieldKey)) {
              trimmedZNRecord.getSimpleFields().putIfAbsent(fieldKey,
                  trimValue ? null : originalZNRecord.getSimpleField(fieldKey));
            }
          });
          break;
        case LIST_FIELD:
          fieldKeySet.stream().forEach(fieldKey -> {
            if (originalZNRecord.getListFields().containsKey(fieldKey)) {
              trimmedZNRecord.getListFields().putIfAbsent(fieldKey,
                  trimValue ? **Collections.EMPTY_LIST** : originalZNRecord.getListField(fieldKey));
            }
          });
          break;

This would add an empty INSTANCE_CAPACITY_KEYS=[]

kaisun2000 commented 3 years ago

All in all, once waged rebalancer wrote global base line, it may add this list field. INSTANCE_CAPACITY_KEYS. This seems to be recent change from @jiajunwang?

xyuanlu commented 3 years ago

The test is constantly failing on master now. The direct reason is the same, there is an empty INSTANCE_CAPACITY_KEYS=[], list so we get a IndexOutOfBoundsException error.

My theory is that, this empty list could be added by TestInstancesAccessor.testValidateWeightForAllInstances() or TestPerInstanceAccessor.testValidateWeightForInstance() since these two tests and TestClusterAccessor.testUpdateConfigFields() uses the same cluster "TestCluster_0".

In the test log where testUpdateConfigFields is failing, the two tests that added this empty list always happen before this testUpdateConfigFields.

My solution is simply changing the cluster for testUpdateConfigFields to another test cluster and use "TestCluster_0" only for WAGED related tests.

kaisun2000 commented 3 years ago

  @Test(dependsOnMethods = "checkUpdateFails")
  public void testValidateWeightForInstance()
      throws IOException {
    System.out.println("Start test :" + TestHelper.getTestMethodName());
    // Empty out ClusterConfig's weight key setting and InstanceConfig's capacity maps for testing
    ClusterConfig clusterConfig = _configAccessor.getClusterConfig(CLUSTER_NAME);
    clusterConfig.getRecord()
        .setListField(ClusterConfig.ClusterConfigProperty.INSTANCE_CAPACITY_KEYS.name(),
            new ArrayList<>());
    _configAccessor.setClusterConfig(CLUSTER_NAME, clusterConfig);
jiajunwang commented 3 years ago

Close test unstable tickets since we have an automatic tracking mechanism #1757 now for tracking the most recent test issues.