Closed Jackie-Jiang closed 2 years ago
After some time thinking about this test and understanding how it works, I think the test was designed with this scenario in mind:
sequenceDiagram
Test->>+Controller: update table config (0 */20 * ? * * *)
Controller->>+Helix: update table config (0 */20 * ? * * *)
Helix->>Helix: Change ideal state to (0 */20 * ? * * *)
Helix->>-Controller: Ok
Controller->>-Test: Ok
Test->>+Controller: get job info
Controller->>+Helix: get job info
Helix->>-Controller: updated job info (0 */20 * ? * * *)
Controller->>-Test: updated job info (0 */20 * ? * * *)
But I think Helix does not guarantee that sequence, and sometimes, due to the lack of resources in GHA, we may hit this scenario instead:
sequenceDiagram
Test->>+Controller: update table config (0 */20 * ? * * *)
Controller->>+Helix: update table config (0 */20 * ? * * *)
Helix->>Controller: Ok
Controller->>-Test: Ok
Test->>+Controller: get job info
Controller->>+Helix: get job info
Helix->>-Controller: updated job info (0 */10 * ? * * *)
Helix->>-Helix: Change ideal state (0 */20 * ? * * *)
Controller->>-Test: updated job info (0 */10 * ? * * *)
If that is the case, the palliative solution is to retry the validation with some timeout.
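To make the retry idea concrete, here is a minimal sketch of polling the job info until it reflects the new schedule or a timeout expires. Everything in it is an assumption for illustration: `RetryValidation`, `waitForValue`, and the fake `jobInfo` supplier are hypothetical names, not the actual test code or any Pinot/Helix API. The supplier only simulates a controller that returns the stale cron expression for the first two reads.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class RetryValidation {
  /**
   * Polls `actual` until it equals `expected` or `timeoutMs` elapses.
   * Returns true if the expected value was observed, false on timeout.
   * (Hypothetical helper, not part of Pinot.)
   */
  public static boolean waitForValue(Supplier<String> actual, String expected,
      long timeoutMs, long intervalMs) throws InterruptedException {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (true) {
      if (expected.equals(actual.get())) {
        return true;
      }
      if (System.nanoTime() >= deadline) {
        return false;
      }
      Thread.sleep(intervalMs);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    AtomicInteger calls = new AtomicInteger();
    // Stand-in for "get job info": the first two reads still see the old
    // schedule, as in the second diagram; later reads see the update.
    Supplier<String> jobInfo =
        () -> calls.incrementAndGet() < 3 ? "0 */10 * ? * * *" : "0 */20 * ? * * *";

    boolean updated = waitForValue(jobInfo, "0 */20 * ? * * *", 5_000, 50);
    System.out.println(updated + " after " + calls.get() + " calls");
  }
}
```

In the real test, the supplier would be replaced by the existing "get job info" call, and the assertion would fail only if the timeout expires without ever seeing the updated schedule.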
Failures:
Example run: https://github.com/apache/pinot/runs/6597477355?check_suite_focus=true