Closed cyuyang closed 4 years ago
Affected Version
0.16.0
Description
One of our Kinesis indexing tasks stopped making progress and stayed idle until it exited at the end of its preset task duration. As a result, one of the shards lagged behind. Interestingly, the task reported SUCCESS status after exiting. After digging into the log and the Kinesis indexer code, we suspect that KinesisRecordSupplier doesn't handle some transient exceptions gracefully.
Related logs:
Related code:
On a transient AmazonServiceException (a 503, as shown in the log), the thread executing the polling is killed and the polling task is not rescheduled on the ExecutorService, so no new records are put on the BlockingQueue. The failure is not reported anywhere and does not cause the Kinesis indexing task to fail.
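To illustrate the kind of handling we have in mind, here is a minimal, self-contained sketch of a self-rescheduling polling task that survives a transient failure. It is not the Druid code: `ResilientPollerSketch`, `TransientFetchException` (a stand-in for a 503 `AmazonServiceException`), `RecordSource`, `pollingTask`, and `collect` are all hypothetical names, and the retry delay is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ResilientPollerSketch {
    /** Hypothetical stand-in for a transient service error such as an HTTP 503. */
    static class TransientFetchException extends RuntimeException {
        TransientFetchException(String message) { super(message); }
    }

    /** Hypothetical record source; fetch() may throw TransientFetchException. */
    interface RecordSource {
        String fetch();
    }

    /** Builds a task that always reschedules itself, even after a transient failure. */
    static Runnable pollingTask(RecordSource source, BlockingQueue<String> queue,
                                ScheduledExecutorService exec, long retryDelayMs) {
        return new Runnable() {
            @Override
            public void run() {
                long delayMs = 0;
                try {
                    queue.put(source.fetch());
                } catch (TransientFetchException e) {
                    // Report the failure and back off instead of letting the task die.
                    System.err.println("Transient fetch failure, will retry: " + e.getMessage());
                    delayMs = retryDelayMs;
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return; // shutting down: do not reschedule
                }
                try {
                    exec.schedule(this, delayMs, TimeUnit.MILLISECONDS);
                } catch (RejectedExecutionException e) {
                    // Executor was shut down between the fetch and the reschedule.
                }
            }
        };
    }

    /** Demo: polls a source that fails on every failEvery-th call and collects records. */
    static List<String> collect(int failEvery, int wanted) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(10);
        AtomicInteger calls = new AtomicInteger();
        RecordSource flaky = () -> {
            int n = calls.incrementAndGet();
            if (n % failEvery == 0) {
                throw new TransientFetchException("503 on call " + n);
            }
            return "record-" + n;
        };
        exec.execute(pollingTask(flaky, queue, exec, 10));
        List<String> out = new ArrayList<>();
        try {
            for (int i = 0; i < wanted; i++) {
                String record = queue.poll(5, TimeUnit.SECONDS);
                if (record == null) {
                    break; // polling stalled; return what we have
                }
                out.add(record);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            exec.shutdownNow();
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collect(2, 3));
    }
}
```

In this sketch every second fetch fails, yet records keep arriving on the queue because the reschedule happens outside the `try` block that can throw. If the `TransientFetchException` catch were removed, the first failure would end the task without rescheduling it and the consumer would simply stop receiving records, which matches the stall we observed. A real fix would also need to decide when a failure is no longer transient and should fail the task.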