apache / dolphinscheduler

Apache DolphinScheduler is the modern data orchestration platform. Agile to create high performance workflow with low-code
https://dolphinscheduler.apache.org/
Apache License 2.0
12.73k stars 4.58k forks source link

[Fix-16382] Fix the bug of async master task casthread pool invocations ramp-up #16461

Closed Dyqer closed 1 month ago

Dyqer commented 1 month ago

Purpose of the pull request

fix https://github.com/apache/dolphinscheduler/issues/16460

Brief change log

Verify this pull request

This pull request is code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(or)

Pull Request Notice

Pull Request Notice

If your pull request contain incompatible change, you should also add it to docs/docs/en/guide/upgrede/incompatible.md

ruanwenjun commented 1 month ago

The key problem related to your issue is the async task doesn't support timeout, the async should stop once it's timeout.

Dyqer commented 1 month ago

The key problem related to your issue is the async task doesn't support timeout, the async should stop once it's timeout.

but the timeout observer is in master standby list, it may have 1~10 minutes later than real timeout, the functions will be invoked in a dead loop during those 10 minutes.

Dyqer commented 1 month ago

The key problem related to your issue is the async task doesn't support timeout, the async should stop once it's timeout.

@ruanwenjun Here are two critical issues about AsyncMasterTask:

  1. if the task is set to just warn and not fail on timeout, it will immediately enter an infinite loop after reaching the timeout period.
  2. there's a mistake in the timeout unit, should be minute not second, it exacerbates the problem.

I think the best solution is to ignore the timeout logic here and only focus on the execution interval. The timeout event should still be handled by the stateEventhandler.

ruanwenjun commented 1 month ago

The key problem related to your issue is the async task doesn't support timeout, the async should stop once it's timeout.

@ruanwenjun Here are two critical issues about AsyncMasterTask:

  1. if the task is set to just warn and not fail on timeout, it will immediately enter an infinite loop after reaching the timeout period.
  2. there's a mistake in the timeout unit, should be minute not second, it exacerbates the problem.

I think the best solution is to ignore the timeout logic here and only focus on the execution interval. The timeout event should still be handled by the stateEventhandler.

Yes, there exist a bug as you mention, and you are right, the task executor don't need to care about the timeout, all timeout event should be controlled by master event engine.

Dyqer commented 1 month ago

@ruanwenjun @SbloodyS anyone can help merge? Seems github actions stuck

SbloodyS commented 1 month ago

@ruanwenjun @SbloodyS anyone can help merge? Seems github actions stuck

CI failed. You should run mvn spotless:apply to format code.

Dyqer commented 1 month ago

@ruanwenjun @SbloodyS anyone can help merge? Seems github actions stuck

CI failed. You should run mvn spotless:apply to format code.

Oh, Thank you!, please help to review again @ruanwenjun @SbloodyS

sonarcloud[bot] commented 1 month ago

Quality Gate Passed Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud