Open rymarm opened 2 months ago
@rymarm, thanks for finding this issue. I believe your fix is correct. The prepareWrite()
call exists to fill any remaining bits with 0s (i.e. "fill empties."). The number to be filled is, indeed, valueCount - 1
. So, I think this is a good fix.
I would feel more confident if we had a unit test. That this bug exists suggests that, despite the rather excessive number of tests I wrote way back when, I somehow omitted this case. Perhaps a reason is that the bit vector is special: it uses code different than the other vectors, and so needs its own tests. Tests for the fixed-width vectors are in TestFixedWidthWriter
and TestFillEmpties
. Perhaps you can whip up a TestBitVector
that does a few tests, including for the special case you uncovered. In particular, it would be good to make sure that the following cases work:
setValueCount()
with a number that is a multiple of 8, a multiple of 8 plus one, and a multiple of 8 minus one. (three cases)This gives us 9 cases, some that will force a resize, some that should just fit in the initial allocation.
Then, read the bits out of the vector, verifying that the "empty" bits are the defaultValue
(the "default default" is 0).
Why write tests? It is less embarrassing for a test to point out a problem than to find it in a production system. I apologize that I somehow missed doing the tests in the first place.
The second concern is whether this same error was made for other writers. The answer seems to be "no": the other writers use a different structure for setValueCount()
. As the comment in BitColumnWriter
says, it is the only vector that defers to the mechanism in the vector's own mutator. The result is that the bit vector code differs from the other vectors which handle the entire write process themselves.
@paul-rogers thank you so much for the review! I found and fixed one mistake which the failed tests highlighted already and will write additional tests for the cases you described.
I agree that test-driven development is much safer and better than a simple fix without test coverage of the affected place. It is just hard to me to write tests for Drill, they looks complicated and not isolated to me. But I'll write them.
@rymarm, thanks for looking at the tests. While writing tests for Drill can be hard, we developed a bunch of tools for particular cases. In the case of the vector writers, those tests I referenced above should show you what to do.
What I did, back when I wrote this stuff, is to run the tests in my IDE. I've not had much luck in recent years running the whole unit test suite locally. But, I can still generally run individual unit tests. A handy way to proceed is to first create a new file that has the test frameworks included. Then, write one simple test that just writes to the vector and reads back the data. At that point, it should be clear how to create other cases.
It would be great to temporarily revert your change and create a case that fails, then restore the change and ensure things work.
Please let me know if creating the tests takes more than a small amount of time. I can perhaps help out by creating a basic bit vector use case (if I can remember how all that code works!)
Just an FYI, but the Github actions are not working well at the moment. The two that use Hadoop 2 seem to have some connectivity issue with Hive. @jnturton is working on it.
DRILL-8511: Overflow appeared when the batch reached rows limit
Description
The size-aware scan framework fails to end the batch.
Framework tries to reallocate the vector on batch end, due to a hidden, minor bug in
BitColumnWriter
- which in general is not notable, but in a specific case, when the initial vector allocation size limit is exceeded and a reader reaches the batch row size limit.BitColumnWriter
uses instead of a write index a value count and this causes unexpected vector reallocation (look at the changes).Documentation
No changes required.
Testing
Manual tests