apache / gravitino

World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
https://gravitino.apache.org
Apache License 2.0
960 stars 302 forks source link

[Subtask] Fix duplicate-code Pylint rule for client-python #3767

Open noidname01 opened 4 months ago

noidname01 commented 4 months ago

Describe the subtask

Parent issue

3560

ian910297 commented 3 months ago

I’m interested in working on this issue.

I ran an initial pylint test. It looks like there are 20 issues to fix. I will submit them in batches.

noidname01 commented 3 months ago

Hi @ian910297, thanks for your interest in this issue. If you have any question, feel free to comment and let us know👍

ian910297 commented 3 months ago

@noidname01 I've outlined some scenarios, but I'd like to discuss them before writing the code.

  1. exception Exception declarations are scattered across multiple files. I want to consolidate them into a single file, base.py, located at clients/clitent-python/gravitino/exceptions/base.py. There are also identically named exception classes in gravitino.client.gravitino_client and gravitino.client.gravitino_metalake, which I will also include in base.py.

reference: [SPARK-42342][PYTHON][CONNECT] Introduce base hierarchy to exceptions #39882

  1. Unit and integration tests are allowed to have duplicate code in specific files.

    audit_dto = AuditDTO(
            _creator="test",
            _create_time="2022-01-01T00:00:00Z",
            _last_modifier="test",
            _last_modified_time="2024-04-05T10:10:35.218Z",
        )
  2. no idea: how to refactor the getter function of private variables This code doesn't look very Pythonic. If we want to remove the duplication, we may need to rewrite it more systematically.

    @dataclass
    class SetProperty:
        """A fileset change to set the property and value for the fileset."""

        _property: str = field(metadata=config(field_name="property"))
        _value: str = field(metadata=config(field_name="value"))

        def property(self):
            """Retrieves the name of the property being set in the fileset.

            Returns:
                 The name of the property.
            """
            return self._property
noidname01 commented 3 months ago

Hi @ian910297, thanks for your detailed descriptions.

  1. About exception, actually I've created an issue recently to refactor current error handling(#4012). It seems that exception structure is related to many issues, like #3766, so maybe we can solve issue 4012 first? WDYT?

  2. I don't really get what it means, can you give more context?

  3. Yeah, I've also noticed this problem, because some of code in client-python were based on the logic in client-java, you'll see some coding style is more like Java, not Pythonic. I think we can create another subtask issue to discuss about how to rewrite this part.

ian910297 commented 3 months ago

Hi @noidname01

  1. exception I agree that we should prioritize resolving issue https://github.com/apache/gravitino/issues/4012.

  2. unit test Pylint flags the following structure as duplicate code but fixed values expected in unit test.

    AuditDTO(
            _creator="test",
            _create_time="2022-01-01T00:00:00Z",
            _last_modifier="test",
            _last_modified_time="2024-04-05T10:10:35.218Z",
        )

    I've considered two solutions:

    • (Simpler) Disable duplicated code detection for specific file / directory.
    • (More complex) Use factory / builder design pattern to create object for test. Considering the potential increase in tests, it's crucial to evaluate the scalability and reusability of the chosen approach.
  3. pythonic code I agree that we should create a separate subtask for this, similar to case 1.

In summary, I believe addressing the duplicated code linting issue is not a priority at this stage. We should focus on reviewing the errors to identify potential architectural improvements.

noidname01 commented 3 months ago

@ian910297 I think the reason for Problem 2 is that we use the same test component many times and create them in different files, maybe give them different values can solve the problem?

But the problem that every test repeats the same init logic still remains, I think we can move the same logic to parent class (ex. IntegrationTestEnv in integration tests) to prevent duplicate code, WDYT?