Open gggrace14 opened 1 year ago
@mbasmanova @spershin @xiaoxmeng
CC: @oerling
@gggrace14 Ge, another solution is to switch to normalizedKey or hash-based aggregation if we hit the memory limit using array-based aggregation and number of unique keys is relatively small.
Masha, @oerling is looking at that now as a longer term solution.
The same issue applies to partial group-by over a single integer key when there is a small number of distinct key values spread over a large range. For example, 2 distinct keys: 460'000 and -460'000. CC: @oerling
Partial aggregation using a single group-by key may flush after adding just 2 distinct keys. Here is an example:
Notice that there are only 2 distinct values of the group-by key: "web" and "app", but partial aggregation still flushes (note flushTimes and flushRowCount above).
Group-by keys which are short strings, i.e. strings of size <= 7 bytes, are mapped to 64-bit integers using the following logic from VectorHasher.h
This maps "web" to 23,225,719 and "app" to 24,146,017. After converting string keys to numbers, VectorHasher reports that the keys are in range [23,225,719, 24,146,017]. The size of that range is 920,298. HashTable::decideHashMode then chooses kArray as hash mode. It extends the range by adding 50% padding on both ends, which increases the range size to 1,840,596. This is still within limits for kArray as it allows up to 2M entries.
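The numbers above can be reproduced with a small sketch. This is not the VectorHasher.h code; `shortStringAsNumber` is a hypothetical helper that assumes the bytes are packed with the first character in the lowest 8 bits and a marker bit set just above the last byte. Under that assumption it yields exactly the values quoted for "web" and "app".

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical helper (an assumption, not the Velox implementation):
// packs a short string (<= 7 bytes) into a 64-bit integer, first byte in the
// lowest 8 bits, then sets a marker bit just above the last byte so that
// strings of different lengths cannot collide.
constexpr int64_t shortStringAsNumber(std::string_view s) {
  uint64_t word = 0;
  for (std::size_t i = 0; i < s.size(); ++i) {
    word |= static_cast<uint64_t>(static_cast<unsigned char>(s[i])) << (8 * i);
  }
  return static_cast<int64_t>(word | (uint64_t{1} << (8 * s.size())));
}

constexpr int64_t kWeb = shortStringAsNumber("web");
constexpr int64_t kApp = shortStringAsNumber("app");

// Values and range size quoted in the issue text.
static_assert(kWeb == 23'225'719);
static_assert(kApp == 24'146'017);

constexpr int64_t kRange = kApp - kWeb;
static_assert(kRange == 920'298);

// 50% padding on both ends doubles the range.
constexpr int64_t kPaddedRange = kRange + 2 * (kRange / 2);
static_assert(kPaddedRange == 1'840'596);
```

The two keys differ only in their low 24 bits, yet they end up almost a million apart, so the value range says little about how many distinct keys there are.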
In array mode, we allocate an array of char pointers of the range size. In this case, that array uses 8 * 1,840,596 bytes = 14,724,768 bytes, i.e. close to 16MB of memory.
In addition, we use approx_percentile aggregation which allocates ~ 0.5MB for 2 accumulators. (CC: @Yuhta)
As a result, we hit the memory limit for partial aggregation (16MB) and trigger flushing.
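The memory arithmetic can be checked with a rough sketch. All names below are hypothetical, and it assumes 8-byte pointers, "2M entries" meaning 2 * 1024 * 1024, and "16MB" meaning 16 * 1024 * 1024 bytes. It covers only the pointer array, not accumulators or row storage.

```cpp
#include <cstdint>

// Assumed constants for illustration; names are hypothetical.
constexpr int64_t kAssumedPointerBytes = 8;             // 64-bit pointers
constexpr int64_t kAssumedMaxArrayEntries = 2LL << 20;  // "2M entries"
constexpr int64_t kAssumedLimitBytes = 16LL << 20;      // "16MB"

// Range size after adding 50% padding on both ends.
constexpr int64_t paddedRange(int64_t minKey, int64_t maxKey) {
  const int64_t range = maxKey - minKey;
  return range + 2 * (range / 2);
}

// Bytes needed for an array of pointers with one slot per value in the range.
constexpr int64_t arrayTableBytes(int64_t entries) {
  return entries * kAssumedPointerBytes;
}

// String keys "web" and "app" after mapping to numbers.
constexpr int64_t kStringEntries = paddedRange(23'225'719, 24'146'017);
static_assert(kStringEntries == 1'840'596);
static_assert(kStringEntries <= kAssumedMaxArrayEntries);  // array mode allowed
static_assert(arrayTableBytes(kStringEntries) == 14'724'768);

// Integer keys -460'000 and 460'000 from the comment above.
constexpr int64_t kIntEntries = paddedRange(-460'000, 460'000);
static_assert(kIntEntries == 1'840'000);
static_assert(arrayTableBytes(kIntEntries) == 14'720'000);

// Share of the assumed 16MB budget taken by the pointer array alone.
constexpr int64_t kPercentOfLimit =
    arrayTableBytes(kStringEntries) * 100 / kAssumedLimitBytes;
static_assert(kPercentOfLimit == 87);
```

With only 2 distinct keys in either case, the pointer array by itself takes most of the assumed budget, which leaves little room for accumulators before the flush threshold is reached.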
--- Additional notes
Query config kMaxPartialAggregationMemory is the memory threshold checked to decide when partial aggregation flushes. Its current default value is 16MB. https://github.com/facebookincubator/velox/blob/94484dff07180ecef5a651e3130f259af7ecefca/velox/exec/HashAggregation.cpp#L190-L193
The ability to adaptively increase kMaxPartialAggregationMemory until it reaches kExtendedMaxPartialAggregationMemory, so as to avoid future flushes, is also in place. Currently kExtendedMaxPartialAggregationMemory is also set to 16MB, which effectively disables this functionality. https://github.com/facebookincubator/velox/blob/94484dff07180ecef5a651e3130f259af7ecefca/velox/exec/HashAggregation.cpp#L245-L251
The threshold for the kArray hash mode of HashTable, kArrayHashMaxSize, is 2M entries, or 16MB. https://github.com/facebookincubator/velox/blob/a50188efa8154aa7e948a94a2db9b260585775d0/velox/exec/HashTable.h#L69-L70 So the HashTable used by partial aggregation can grow to ~16MB in kArray mode and trigger flushes repeatedly. We have seen this happen even for a partial aggregation over a short string key with only 2 distinct values.
Possible solutions