apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

Consider doing bulk processing in SV scan iterator #8634

Closed: siddharthteotia closed this issue 2 years ago

siddharthteotia commented 2 years ago
```java
public int next() {
  while (_nextDocId < _numDocs) {
    int nextDocId = _nextDocId++;
    _numEntriesScanned++;
    if (_valueMatcher.doesValueMatch(nextDocId)) {
      return nextDocId;
    }
  }
  return Constants.EOF;
}
```

https://github.com/apache/pinot/blob/master/pinot-core/src/main/java/org/apache/pinot/core/operator/dociditerators/SVScanDocIdIterator.java

A bulk API could be added so that matching runs in a tight, vectorized-style loop over a small fixed number of documents at a time, instead of making one function call to doesValueMatch per document. Maybe there is a way to change the interface so that the caller of the iterator can invoke doesValueMatch directly in a tight loop.
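To make the idea concrete, here is a rough sketch of what a batched variant might look like. Everything in it is hypothetical and not an existing Pinot API: the `BulkValueMatcher` interface, its `matchRange` method, the `IntEqualsMatcher` example, the `BATCH_SIZE` of 1024, and the local `EOF` stand-in for `Constants.EOF` are all assumptions made for illustration.

```java
/** Hypothetical sketch of a batched scan; none of these names are Pinot APIs. */
final class BulkScanSketch {
  static final int EOF = Integer.MIN_VALUE; // local stand-in for Constants.EOF
  static final int BATCH_SIZE = 1024;       // assumed batch size, not tuned

  /** Hypothetical bulk counterpart of a per-document doesValueMatch. */
  interface BulkValueMatcher {
    /**
     * Tests docIds in [startDocId, startDocId + length), writes the matching
     * docIds into matchedDocIds in ascending order, and returns the match count.
     */
    int matchRange(int startDocId, int length, int[] matchedDocIds);
  }

  /** Example matcher over an in-memory int column (illustration only). */
  static final class IntEqualsMatcher implements BulkValueMatcher {
    private final int[] _values;
    private final int _target;

    IntEqualsMatcher(int[] values, int target) {
      _values = values;
      _target = target;
    }

    @Override
    public int matchRange(int startDocId, int length, int[] matchedDocIds) {
      int numMatches = 0;
      int endDocId = startDocId + length;
      // Tight loop: no per-document interface call inside the batch.
      for (int docId = startDocId; docId < endDocId; docId++) {
        if (_values[docId] == _target) {
          matchedDocIds[numMatches++] = docId;
        }
      }
      return numMatches;
    }
  }

  /** Iterator that refills a buffer of matching docIds one batch at a time. */
  static final class BulkScanIterator {
    private final BulkValueMatcher _matcher;
    private final int _numDocs;
    private final int[] _buffer = new int[BATCH_SIZE];
    private int _nextDocId = 0;
    private int _bufferPos = 0;
    private int _bufferLen = 0;
    private long _numEntriesScanned = 0;

    BulkScanIterator(BulkValueMatcher matcher, int numDocs) {
      _matcher = matcher;
      _numDocs = numDocs;
    }

    int next() {
      // Refill until the buffer holds at least one match or docs run out.
      while (_bufferPos == _bufferLen) {
        if (_nextDocId >= _numDocs) {
          return EOF;
        }
        int length = Math.min(BATCH_SIZE, _numDocs - _nextDocId);
        _bufferLen = _matcher.matchRange(_nextDocId, length, _buffer);
        _bufferPos = 0;
        _nextDocId += length;
        _numEntriesScanned += length; // counted per batch, not per returned doc
      }
      return _buffer[_bufferPos++];
    }

    long getNumEntriesScanned() {
      return _numEntriesScanned;
    }
  }
}
```

One behavioral difference to keep in mind: in this sketch the scanned-entries counter advances a whole batch at a time, so if the caller stops iterating early it can report more entries than the per-document loop would. Whether that matters for the stats is a design decision for the actual change.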

siddharthteotia commented 2 years ago

80% of the time is spent in doesValueMatch().

[Screenshot: 2022-05-05, 9:17:12 AM]

siddharthteotia commented 2 years ago

@vvivekiyer is going to take a stab at this.