apache / drill

Apache Drill is a distributed MPP query layer for self describing data
https://drill.apache.org/
Apache License 2.0

DRILL-8480: Cleanup before finished. 0 out of 1 streams have finished #2897

Closed rymarm closed 4 months ago

rymarm commented 6 months ago

DRILL-8480: Make Nested Loop Join operator properly process empty batches and batches with new schema

Description

The Nested Loop Join operator (NestedLoopJoinBatch, NestedLoopJoin) improperly handles the batch iteration outcome OK with 0 records. Drill's batch-processing design involves 5 states.

Solution

Make the Nested Loop Join operator properly handle OK and OK_NEW_SCHEMA outcomes with 0 records and keep processing until NONE and NOT_YET outcomes are received.

Make the Nested Loop Join operator keep processing even if an OK_NEW_SCHEMA outcome is received but the schema wasn’t changed (yes, I know it sounds wild, but it’s possible and it is expected behavior).
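
To illustrate the two points above, here is a minimal, self-contained sketch of the intended loop. It is not the actual NestedLoopJoinBatch code: the `Outcome` enum, the `Batch` class, and the `drain` method are hypothetical stand-ins invented for this example, and the schema is reduced to a plain string. The sketch only shows the control flow: empty OK batches are skipped rather than treated as end of data, an OK_NEW_SCHEMA batch with an unchanged schema does not stop processing, and reading ends only on NONE or NOT_YET.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

final class RightSideDrainSketch {

  // Hypothetical stand-in for the outcomes named in this PR (not Drill's enum).
  enum Outcome { OK, OK_NEW_SCHEMA, NONE, NOT_YET }

  // Hypothetical batch: an outcome, a schema description, and zero or more rows.
  static final class Batch {
    final Outcome outcome;
    final String schema;
    final List<String> rows;

    Batch(Outcome outcome, String schema, List<String> rows) {
      this.outcome = outcome;
      this.schema = schema;
      this.rows = rows;
    }
  }

  // Collects rows from upstream until NONE or NOT_YET is seen.
  static List<String> drain(Iterator<Batch> upstream) {
    List<String> collected = new ArrayList<>();
    String currentSchema = null;
    while (upstream.hasNext()) {
      Batch batch = upstream.next();
      switch (batch.outcome) {
        case NONE:
        case NOT_YET:
          // Only these two outcomes end the loop.
          return collected;
        case OK_NEW_SCHEMA:
          // The reported "new" schema may equal the current one; either way
          // we remember it and keep going. Handling of a genuinely different
          // schema is out of scope for this sketch.
          currentSchema = batch.schema;
          // fall through: the batch may still carry rows
        case OK:
          // A batch with 0 records is valid: add nothing and continue.
          collected.addAll(batch.rows);
          break;
      }
    }
    return collected;
  }

  public static void main(String[] args) {
    List<Batch> input = Arrays.asList(
        new Batch(Outcome.OK_NEW_SCHEMA, "a:INT", Collections.<String>emptyList()),
        new Batch(Outcome.OK, "a:INT", Arrays.asList("r1")),
        new Batch(Outcome.OK, "a:INT", Collections.<String>emptyList()),
        new Batch(Outcome.OK_NEW_SCHEMA, "a:INT", Arrays.asList("r2")),
        new Batch(Outcome.NONE, "a:INT", Collections.<String>emptyList()),
        new Batch(Outcome.OK, "a:INT", Arrays.asList("ignored")));
    System.out.println(drain(input.iterator())); // prints [r1, r2]
  }
}
```

The empty OK batch and the repeated OK_NEW_SCHEMA in the middle of the input are exactly the cases described above: neither one terminates the loop, so `r2` is still collected.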

Documentation

-

Testing

Manual testing with a file from the Jira ticket DRILL-8480

cgivre commented 5 months ago

@rymarm. What is the status of this PR? Is it ready for merging?

rymarm commented 5 months ago

@cgivre Actually, there is one more thing that I would like to fix within the scope of this PR: https://github.com/apache/drill/blob/a726a4544dfbf1427f41fb916d3d976bd511189b/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/NestedLoopJoinBatch.java#L396-L399

These few lines seem to be a kludge that may not work in some very rare cases, but I forgot which issue occurs if they are removed. I would like to take a look at this, but I can do it in a separate PR, because the current issue is completely fixed by the current changes.