Open FelixYBW opened 1 month ago
Can you print the input type and batchSize here? https://github.com/facebookincubator/velox/blob/main/velox/exec/SortBuffer.cpp#L298
batchSize: -2112458117 input type is ROW<n0_0:BIGINT,n0_1:VARCHAR,n0_2:BIGINT,n0_3:BIGINT,n0_4:VARCHAR>
Is the batchSize negative?
Yes, not sure why
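vector_size_t is int32_t, but numInputRows_, numOutputRows_ and maxOutputRows are uint32_t. std::min<vector_size_t> converts both arguments to int32_t before comparing them, so when (numInputRows_ - numOutputRows_) is greater than 0x7fffffff, the difference wraps to a negative value and is returned as batchSize.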
const vector_size_t batchSize =
std::min<vector_size_t>(numInputRows_ - numOutputRows_, maxOutputRows);
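To make the wraparound concrete, here is a minimal standalone sketch of that expression. It is not the Velox source: `computeBatchSize` is a hypothetical helper, the `vector_size_t` alias is redeclared locally on the assumption that it is `int32_t`, and the 1024 used below for `maxOutputRows` is an arbitrary illustrative value. The reported batchSize of -2112458117 is what an unsigned difference of 2182509179 becomes when reinterpreted as a 32-bit signed integer.

```cpp
#include <algorithm>
#include <cstdint>

// Assumption: same definition as the alias used in Velox.
using vector_size_t = int32_t;

// Hypothetical helper that mirrors the expression in SortBuffer.cpp.
// The counters are uint32_t here, as described in the comment above.
inline vector_size_t computeBatchSize(
    uint32_t numInputRows,
    uint32_t numOutputRows,
    uint32_t maxOutputRows) {
  // std::min<vector_size_t> takes `const int32_t&` parameters, so both
  // unsigned arguments are narrowed to int32_t *before* the comparison.
  // A difference above 0x7fffffff therefore turns negative (modular
  // conversion; guaranteed since C++20, the usual behaviour before it)
  // and wins the min().
  return std::min<vector_size_t>(numInputRows - numOutputRows, maxOutputRows);
}
```

Under these assumptions, `computeBatchSize(2182509179u, 0u, 1024u)` returns -2112458117 instead of 1024, while `computeBatchSize(5000u, 0u, 1024u)` returns 1024 as intended.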
Meanwhile, we should also note that with uint32_t counters the total number of rows in a partition can't exceed UINT_MAX. We can control the batch size but can't control the partition size, so we should use uint64_t for numInputRows_ and numOutputRows_. Not sure whether any other operators have the same limitation.
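One possible shape for that change, as a sketch only and not the actual patch: keep the row counters in uint64_t, take the minimum in 64 bits, and narrow to `vector_size_t` only after the value is known to be at most `maxOutputRows`. The helper name `nextBatchSize` is hypothetical, and the sketch assumes `numOutputRows <= numInputRows` and `maxOutputRows >= 0`.

```cpp
#include <algorithm>
#include <cstdint>

// Assumption: same definition as the alias used in Velox.
using vector_size_t = int32_t;

// Hypothetical helper: 64-bit counters, so a partition may hold more
// than UINT_MAX rows without the counters overflowing.
inline vector_size_t nextBatchSize(
    uint64_t numInputRows,
    uint64_t numOutputRows,
    vector_size_t maxOutputRows) {
  // Precondition (assumed): numOutputRows <= numInputRows.
  const uint64_t remaining = numInputRows - numOutputRows;
  // Compare in 64 bits. The result is bounded by maxOutputRows, so the
  // final narrowing cast cannot change the value.
  const uint64_t batch =
      std::min<uint64_t>(remaining, static_cast<uint64_t>(maxOutputRows));
  return static_cast<vector_size_t>(batch);
}
```

With this version, `nextBatchSize(2182509179, 0, 1024)` returns 1024, and a partition of 5000000000 rows with 4999999990 already output yields a final batch of 10.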
@jinchengchenghh can you take a look and fix this?
Sure, I will help to fix it.
Backend
VL (Velox)