Open nozjkoitop opened 3 months ago
Thanks a lot for the contribution 🎉 Curious if the change in the PR solves the bug. Also, is the result correct (i.e. there are 500 rows ingested after the fix)? I think that the patch still wouldn't be sufficient because the output partitions are 2444, and the input to the limit stage must only be a single partition. Can you please add a test case as well for the same?
Funny enough, there's also https://github.com/apache/druid/pull/16643 which solves a very closely related bug, but not quite the same. I am curious if that inadvertently resolves this one as well.
It fixes a bug, although I'm not sure why the input to the limit stage should be a single partition if it merges them anyway. Thanks for pointing out the other PR. I'm not sure how it's going to work, need to test that, since we'll be writing results to multiple partitions, but at the end of the day everything will be written to one file in the durable storage.
> multiple partitions but at the end of the day all will be written to one file in the durable storage
That shouldn't happen: the output of a query like the following should produce multiple partitions in durable storage (which it isn't doing at the moment):
SELECT * FROM foo LIMIT 1000000
I was thinking about that; OffsetLimitFrameProcessorFactory would need to be updated in this case, since right now the output channel is hardcoded.
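To illustrate the point being discussed (not Druid's actual implementation): an OFFSET/LIMIT operator is only well-defined over a single, globally ordered input stream, because "skip N rows, then take M" has no meaning if rows arrive split across independently ordered partitions. A minimal plain-Java sketch of that semantics, with the class and method names being hypothetical:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class OffsetLimitSketch {
    // Applies OFFSET/LIMIT over one already-merged, ordered stream of rows.
    // This is why the limit stage's input must behave like a single partition:
    // the skip/take counters are global, not per-partition.
    static <T> List<T> applyOffsetLimit(Stream<T> orderedRows, long offset, long limit) {
        return orderedRows.skip(offset).limit(limit).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Merged, ordered input; skip 2 rows, then take 3.
        List<Integer> result = applyOffsetLimit(Stream.of(1, 2, 3, 4, 5, 6), 2, 3);
        System.out.println(result); // [3, 4, 5]
    }
}
```

If the same skip/take were applied independently to each of several partitions, each partition would drop its own first N rows, producing a different (wrong) result set.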
@nozjkoitop There was a more comprehensive change for similar issues with all the queries containing a LIMIT clause. It was merged recently https://github.com/apache/druid/pull/16643 and it should potentially fix the problem that you are seeing. Can you please verify if that patch fixes the issue?
Hi @LakshSingla, thanks for the suggestion. It does fix the problem. Although, WDYT about the scan stage, which always results in 1 partition if it hasLimitOrOffset? That could produce a pretty big output partition.
This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.
Resolved a runtime exception triggered by executing limited and ordered scan queries via the MSQ engine.
The main problem was that the system expected multiple partitions to be produced, but in reality only one partition was created by the OffsetLimitFrameProcessorFactory. During the handling triggered by ShuffleSpec == null, the expected partition count was computed from workerInputs, which were created from the slicer of the previous stage's stageTracker.
Using MixShuffleSpec at the limit stage results in just one partition for the stage's output, which is the behavior we expect.
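The mismatch described above can be sketched in a few lines of plain Java. This is a hypothetical illustration, not Druid code: the class `PartitionCountSketch` and enum `ShuffleKind` are invented here. The point is that with a global-merge ("mix") shuffle, the stage emits exactly one output partition regardless of how many input partitions the worker inputs report, so deriving the expected count from the previous stage's inputs overestimates it:

```java
public class PartitionCountSketch {
    // Hypothetical shuffle kinds: MIX merges everything into one partition,
    // HASH preserves a partition per input slice.
    enum ShuffleKind { MIX, HASH }

    // With a mix shuffle the output is always a single partition; counting
    // from the upstream worker inputs (e.g. 2444) would be wrong for MIX.
    static int expectedOutputPartitions(ShuffleKind kind, int workerInputPartitions) {
        return kind == ShuffleKind.MIX ? 1 : workerInputPartitions;
    }

    public static void main(String[] args) {
        System.out.println(expectedOutputPartitions(ShuffleKind.MIX, 2444));  // 1
        System.out.println(expectedOutputPartitions(ShuffleKind.HASH, 2444)); // 2444
    }
}
```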
This PR has: