apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0
5.45k stars 2.43k forks source link

[HUDI-8500] Add line wrapping indentation check #12230

Closed usberkeley closed 6 days ago

usberkeley commented 1 week ago

Change Logs

Add line wrapping indentation check: forceStrictCondition=true

Explanation about CheckStyle Option forceStrictCondition:

Force strict indent level in line wrapping case.
 If value is true, line wrap indent have to be same as lineWrappingIndentation parameter. 
 If value is false, line wrap indent could be bigger on any value user would like.

Impact

Help identify issues with inconsistent line wrapping and indentation, such as:

1. Function Parameter Line Wrapping

// `Context context` is indented by 6 characters
public StreamWriteOperatorCoordinator(
      Configuration conf,
         Context context) {
    this.conf = conf;
    this.context = context;
    this.parallelism = context.currentParallelism();
    this.storageConf = HadoopFSUtils.getStorageConfWithCopy(HadoopConfigurations.getHiveConf(conf));
  }
  // `int numEntries` is aligned with `(`
  private static void verifyHFileReadCompatibility(String filename,
                                                   int numEntries,
                                                   Option<Function<Integer, String>> keyCreator) throws IOException {
    try (HFileReader reader = getHFileReader(filename)) {
      reader.initializeMetadata();
      verifyHFileMetadataCompatibility(reader, numEntries);
      verifyHFileValuesInSequentialReads(reader, numEntries, keyCreator);
    }
  }

2. Inconsistent Line Wrapping and Indentation

validatePathInfoList(
        Arrays.stream(new StoragePathInfo[] {
            getStoragePathInfo("x/1.file", false),
            getStoragePathInfo("x/2.file", false),
            getStoragePathInfo("x/y", true),
            getStoragePathInfo("x/z", true)
        }).collect(Collectors.toList()),
        storage.listDirectEntries(new StoragePath(getTempDir(), "x")));

Risk level (write none, low medium or high below)

high

Documentation Update

none

Contributor's checklist

usberkeley commented 1 week ago

@danny0405 Due to inconsistent indentation in the previous code, this modification will be extensive. Do you think this would be okay? If so, I will proceed with modifying the source code files

hudi-bot commented 1 week ago

CI report:

Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
danny0405 commented 1 week ago

I kind of feel it is too strict for developers.

usberkeley commented 1 week ago

I kind of feel it is too strict for developers.

Understood, I'm also a bit unsure about whether this restriction is necessary.

However, the Flink module fellow to this standard, using a 4-character indentation for line wrapping. Could you share more suggestion on this?

yihua commented 6 days ago

How many violations are there in the codebase? We can clean obvious violations (e.g., not aligned due to refactoring) up module by module or package by package. However, the strict checkstyle rule may not be necessary.