apache / doris-flink-connector

Flink Connector for Apache Doris
https://doris.apache.org/
Apache License 2.0
292 stars 201 forks

[fix][DorisBatchStreamLoad] waitAsyncLoadFinish should wait for AsyncLoad to finish rather than for AsyncLoad to start loading #381

Closed yanghuaiGit closed 2 months ago

yanghuaiGit commented 2 months ago

…yncLoad start load

Proposed changes

Issue Number: close #xxx

Problem Summary:

Describe the overview of changes.

Checklist(Required)

  1. Does it affect the original behavior: (Yes/No/I Don't know)
  2. Has unit tests been added: (Yes/No/No Need)
  3. Has document been added or modified: (Yes/No/No Need)
  4. Does it need to update dependencies: (Yes/No)
  5. Are there any changes that cannot be rolled back: (Yes/No)

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

yanghuaiGit commented 2 months ago

waitAsyncLoadFinish works by inserting a BatchRecordBuffer into the blocking queue flushQueue. A successful insert only indicates that the previous BatchRecordBuffer was taken from the queue by LoadAsyncExecutor; it does not mean that the data load has finished. LoadAsyncExecutor should remove the head of flushQueue only after the data load has finished.
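To make the timing issue concrete, here is a minimal, self-contained sketch of the situation described above. It is not the connector's code: the class name `PollBeforeLoadSketch`, the use of `String` in place of BatchRecordBuffer, and the latches that stand in for a slow load are all assumptions made for illustration. It only shows that a queue whose head is removed before the load runs can look empty while the load is still in progress.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the executor/queue interaction; not connector code.
final class PollBeforeLoadSketch {

    // Returns true if the queue was observed empty while the load was still running.
    static boolean queueEmptyWhileLoading() {
        BlockingQueue<String> flushQueue = new ArrayBlockingQueue<>(1);
        CountDownLatch loadStarted = new CountDownLatch(1);
        CountDownLatch allowLoadToFinish = new CountDownLatch(1);
        AtomicBoolean loadFinished = new AtomicBoolean(false);

        Thread loader = new Thread(() -> {
            try {
                // The head is removed from the queue BEFORE the load runs.
                String buffer = flushQueue.take();
                loadStarted.countDown();
                // Stand-in for a slow load: blocks until the test releases it.
                allowLoadToFinish.await();
                loadFinished.set(true);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        try {
            loader.start();
            flushQueue.put("buffer-1");
            loadStarted.await();
            // The queue is empty here, yet the load has not finished.
            boolean observed = flushQueue.isEmpty() && !loadFinished.get();
            allowLoadToFinish.countDown();
            loader.join();
            return observed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("queue empty while load still running: "
                + queueEmptyWhileLoading());
    }
}
```

Any "wait until finished" check that relies only on the queue's contents would return at the marked point in this sketch, before the simulated load completes.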

yanghuaiGit commented 2 months ago

@JNSimba could you take a look?

JNSimba commented 2 months ago

Thank you for your contribution, but will there be any problems if we poll first from the queue?

yanghuaiGit commented 2 months ago

@JNSimba I think that waitAsyncLoadFinish should mean that the queue is empty and the last AsyncLoad has finished. When the queue is not empty, a poll only means that the head has been removed from the queue; it does not mean that the buffer load has finished.

yanghuaiGit commented 2 months ago

image

JNSimba commented 2 months ago

> @JNSimba I think that waitAsyncLoadFinish should mean that the queue is empty and the last AsyncLoad has finished. When the queue is not empty, a poll only means that the head has been removed from the queue; it does not mean that the buffer load has finished.

Yes, you are right. But even if the buffer has already been polled when the load fails, the queue will be cleared and the Flink task will be retried, so it does not seem to affect correctness? And if you switch to peek, it does not block when the queue is empty, which may consume more CPU.
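For reference, one possible way to address both concerns (blocking instead of spinning on an empty queue, and waiting for the load itself rather than for the dequeue) is to keep the blocking `take()` and track unfinished loads separately. The sketch below is only an illustration under assumptions: `PendingLoadTrackerSketch`, its `pending` counter, the `STOP` marker, and the `Thread.sleep` stand-in for a load are hypothetical and are not taken from the connector or from this PR. Failure handling is not modeled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: count submitted-but-unfinished buffers separately from the queue.
final class PendingLoadTrackerSketch {
    private static final String STOP = "__stop__"; // hypothetical shutdown marker

    private final BlockingQueue<String> flushQueue = new LinkedBlockingQueue<>();
    private final List<String> loaded = Collections.synchronizedList(new ArrayList<>());
    private int pending = 0; // guarded by "this"

    void submit(String buffer) {
        synchronized (this) {
            pending++; // counted before the buffer becomes visible to the loader
        }
        flushQueue.add(buffer);
    }

    void loaderLoop() {
        try {
            while (true) {
                String buffer = flushQueue.take(); // blocks when empty, no busy polling
                if (STOP.equals(buffer)) {
                    return;
                }
                try {
                    Thread.sleep(20); // stand-in for the actual load
                    loaded.add(buffer);
                } finally {
                    synchronized (this) {
                        pending--; // decremented only after the load attempt ends
                        notifyAll();
                    }
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Returns only when every submitted buffer has gone through the load step.
    void waitAsyncLoadFinish() {
        synchronized (this) {
            try {
                while (pending > 0) {
                    wait();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    // Submits three buffers and returns how many were loaded when the wait returned.
    static int demo() {
        PendingLoadTrackerSketch sketch = new PendingLoadTrackerSketch();
        Thread loader = new Thread(sketch::loaderLoop);
        loader.start();
        sketch.submit("a");
        sketch.submit("b");
        sketch.submit("c");
        sketch.waitAsyncLoadFinish();
        int loadedCount = sketch.loaded.size();
        sketch.flushQueue.add(STOP);
        try {
            loader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return loadedCount;
    }

    public static void main(String[] args) {
        System.out.println("buffers loaded when wait returned: " + demo());
    }
}
```

The trade-off in this sketch is an extra piece of shared state next to the queue; in exchange, the loader thread never spins on an empty queue and the waiter does not depend on when the head is removed.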