Currently, `rg` with `--multiline`, when operating on many files, can be 50x slower without `--mmap` than with `--multiline --mmap`.
More than 99% of CPU time is spent in `ReadBuf::initialize_unfilled`, which is called from `default_read_buf`, called from `default_read_to_end`, called from `read_to_end` here: https://github.com/BurntSushi/ripgrep/blob/master/crates/searcher/src/searcher/mod.rs#L911-L919
```rust
if self.config.heap_limit.is_none() {
    let mut buf = self.multi_line_buffer.borrow_mut();
    buf.clear();
    let cap =
        file.metadata().map(|m| m.len() as usize + 1).unwrap_or(0);
    buf.reserve(cap);
    read_from.read_to_end(&mut *buf).map_err(S::Error::error_io)?;
    return Ok(());
}
```
If `buf` grows large, then `initialize_unfilled` will zero-initialize the vector's entire capacity for every file, irrespective of the file's size, which in my case resulted in 300 GB of memory transfers for only 3 GB of data.
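To make the pattern concrete, here is a minimal, self-contained sketch (not ripgrep's code; the wrapper type and file contents are made up) of the access pattern described above: one reused `Vec` whose capacity was grown by a large file, read into through a wrapper that only implements `read()`, so each `read_to_end` call goes through the default path and re-zeroes the full spare capacity.

```rust
use std::io::{self, Read};

// Stand-in for DecodeReaderBytes: forwards read() but does not override
// read_to_end or read_buf, so std's default implementations are used.
struct PlainWrapper<R>(R);

impl<R: Read> Read for PlainWrapper<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.0.read(buf)
    }
}

fn main() -> io::Result<()> {
    // One shared buffer, reused for every "file" (as in the snippet above).
    let mut buf: Vec<u8> = Vec::new();
    // Pretend one large file has already grown the capacity to 1 GiB.
    buf.reserve(1 << 30);

    for _ in 0..1_000 {
        buf.clear();
        // Per the profile above, each small "file" still pays for
        // zero-initializing the entire spare capacity inside the default
        // read_to_end path, because the wrapper has no specialization.
        let mut rdr = PlainWrapper(&b"tiny file contents"[..]);
        rdr.read_to_end(&mut buf)?;
    }
    Ok(())
}
```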
If `DecodeReaderBytes` implemented `read_to_end` itself, it could avoid initializing the entire buffer and only touch the part that actually needs to be written.
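A minimal sketch of what such an override could look like, assuming the wrapper can simply forward to its inner reader when it is passing bytes through unchanged (this is not the actual encoding_rs_io code, just the shape of the idea):

```rust
use std::io::{self, Read};

// Hypothetical pass-through wrapper with a read_to_end override.
struct Passthrough<R> {
    inner: R,
}

impl<R: Read> Read for Passthrough<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.inner.read(buf)
    }

    // Forward read_to_end to the inner reader so its own (possibly
    // specialized) implementation decides how much of `buf` to touch,
    // instead of std's default path zero-initializing the whole spare
    // capacity via ReadBuf::initialize_unfilled.
    fn read_to_end(&mut self, buf: &mut Vec<u8>) -> io::Result<usize> {
        self.inner.read_to_end(buf)
    }
}
```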
(Alternatively, ripgrep could be changed to not call `read_to_end`, or to not reuse a single `Vec` for every file.)
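And a sketch of that second alternative, allocating a fresh, metadata-sized buffer per file instead of reusing one `Vec` (the helper name and the use of a bare `File` instead of the decoding wrapper are illustrative assumptions):

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

// Illustrative helper: a fresh buffer per file, sized from its metadata,
// keeps any zero-initialization in the default read_to_end path bounded by
// this file's size rather than by the largest file seen so far.
fn read_file_fresh(path: &Path) -> io::Result<Vec<u8>> {
    let mut file = File::open(path)?;
    let cap = file.metadata().map(|m| m.len() as usize + 1).unwrap_or(0);
    let mut buf = Vec::with_capacity(cap);
    file.read_to_end(&mut buf)?;
    Ok(buf)
}
```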