Closed rymarm closed 6 months ago
@rymarm. What is the status of this PR? Is it ready for merging?
@cgivre Actually, there is one more thing that I would like to fix within the scope of this PR: https://github.com/apache/drill/blob/a726a4544dfbf1427f41fb916d3d976bd511189b/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/NestedLoopJoinBatch.java#L396-L399
These few lines seem to be a kludge that may not work in some very rare cases, but I forgot what issue occurs if they are removed. I would like to take a look at this, but I can do it in a separate PR because the current issue is completely fixed by the current changes.
DRILL-8480: Make Nested Loop Join operator properly process empty batches and batches with new schema
Description
The Nested Loop Join operator (`NestedLoopJoinBatch`, `NestedLoopJoin`) improperly handles the batch iteration outcome `OK` with 0 records. Drill's batch-processing design involves 5 states:
- `NONE` (batch can have only 0 records)
- `OK` (batch can have 0+ records)
- `OK_NEW_SCHEMA` (batch can have 0+ records)
- `NOT_YET` (undefined)
- `EMIT` (batch can have 0+ records)

In some circumstances the Nested Loop Join operator could receive an `OK` outcome with 0 records and, instead of requesting the next batch, stop data processing and return a `NONE` outcome to upstream batches (operators) without freeing the resources of the underlying batches.
Solution
- Make the Nested Loop Join operator properly handle `OK` and `OK_NEW_SCHEMA` outcomes with 0 records and keep processing until a `NONE` or `NOT_YET` outcome is received.
- Make the Nested Loop Join operator keep processing even if an `OK_NEW_SCHEMA` outcome is received but the schema has not changed (yes, I know it sounds wild, but it is possible and it is expected behavior).

Documentation
-
Testing
Manual testing with a file from the Jira ticket DRILL-8480
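
For readers who want the Solution in concrete terms, here is a minimal, self-contained sketch of the "keep asking on empty batches" idea. It is not the code from this PR: `Outcome` is a simplified stand-in for Drill's iteration outcomes, and `Result`, `ScriptedUpstream` and `nextNonEmpty` are hypothetical names invented for illustration. Schema handling and resource release are deliberately left out.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

final class EmptyBatchSketch {
  // Simplified stand-in for the five outcomes listed in the Description.
  enum Outcome { NONE, OK, OK_NEW_SCHEMA, NOT_YET, EMIT }

  // Hypothetical pairing of an outcome with the record count of its batch.
  static final class Result {
    final Outcome outcome;
    final int recordCount;

    Result(Outcome outcome, int recordCount) {
      this.outcome = outcome;
      this.recordCount = recordCount;
    }
  }

  // Hypothetical upstream that replays a fixed script, then reports NONE.
  static final class ScriptedUpstream {
    private final Deque<Result> script;

    ScriptedUpstream(Result... results) {
      this.script = new ArrayDeque<>(Arrays.asList(results));
    }

    Result next() {
      return script.isEmpty() ? new Result(Outcome.NONE, 0) : script.poll();
    }
  }

  // Skips OK / OK_NEW_SCHEMA batches with 0 records instead of treating
  // them as end of data; every other result is returned to the caller.
  static Result nextNonEmpty(ScriptedUpstream upstream) {
    while (true) {
      Result r = upstream.next();
      switch (r.outcome) {
        case OK:
        case OK_NEW_SCHEMA:
          if (r.recordCount == 0) {
            continue; // empty batch: request the next one
          }
          return r;
        default:
          return r; // NONE, NOT_YET, EMIT are passed through unchanged
      }
    }
  }

  public static void main(String[] args) {
    ScriptedUpstream upstream = new ScriptedUpstream(
        new Result(Outcome.OK, 0),
        new Result(Outcome.OK_NEW_SCHEMA, 0),
        new Result(Outcome.OK, 3));

    Result first = nextNonEmpty(upstream);
    System.out.println(first.outcome + " " + first.recordCount);   // prints "OK 3"

    Result second = nextNonEmpty(upstream);
    System.out.println(second.outcome + " " + second.recordCount); // prints "NONE 0"
  }
}
```

The point of the sketch is only the control flow: two empty batches are skipped before the batch with 3 records is returned, and `NONE` is reported only once the upstream is actually exhausted.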