Currently, `rg` with `--multiline`, when operating on many files, can be 50x slower without `--mmap` than with `--multiline --mmap`.
More than 99% of CPU time is spent in `ReadBuf::initialize_unfilled`, which is called from `default_read_buf`, called from `default_read_to_end`, called from `read_to_end` here: https://github.com/BurntSushi/ripgrep/blob/master/crates/searcher/src/searcher/mod.rs#L911-L919
```rust
if self.config.heap_limit.is_none() {
    let mut buf = self.multi_line_buffer.borrow_mut();
    buf.clear();
    let cap =
        file.metadata().map(|m| m.len() as usize + 1).unwrap_or(0);
    buf.reserve(cap);
    read_from.read_to_end(&mut *buf).map_err(S::Error::error_io)?;
    return Ok(());
}
```
If `buf` grows large, then `initialize_unfilled` will zero-initialize the vector's entire capacity for every file, irrespective of the file's size, which in my case resulted in 300 GB of memory transfers for only 3 GB of data.
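To make the pattern concrete, here is a minimal, self-contained sketch (not ripgrep's code; the wrapper type and file contents are made up) of the access pattern described above: one reused `Vec` whose capacity was grown by a large file, read into through a wrapper that only implements `read()`, so each `read_to_end` call goes through the default path and re-zeroes the full spare capacity.

```rust
use std::io::{self, Read};

// Stand-in for DecodeReaderBytes: forwards read() but does not override
// read_to_end or read_buf, so std's default implementations are used.
struct PlainWrapper<R>(R);

impl<R: Read> Read for PlainWrapper<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.0.read(buf)
    }
}

fn main() -> io::Result<()> {
    // One shared buffer, reused for every "file" (as in the snippet above).
    let mut buf: Vec<u8> = Vec::new();
    // Pretend one large file has already grown the capacity to 1 GiB.
    buf.reserve(1 << 30);

    for _ in 0..1_000 {
        buf.clear();
        // Per the profile above, each small "file" still pays for
        // zero-initializing the entire spare capacity inside the default
        // read_to_end path, because the wrapper has no specialization.
        let mut rdr = PlainWrapper(&b"tiny file contents"[..]);
        rdr.read_to_end(&mut buf)?;
    }
    Ok(())
}
```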
If `DecodeReaderBytes` implemented `read_to_end` itself, it could avoid initializing the entire buffer and only touch the part that actually needs to be written.
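A minimal sketch of what such an override could look like, assuming the wrapper can simply forward to its inner reader when it is passing bytes through unchanged (this is not the actual encoding_rs_io code, just the shape of the idea):

```rust
use std::io::{self, Read};

// Hypothetical pass-through wrapper with a read_to_end override.
struct Passthrough<R> {
    inner: R,
}

impl<R: Read> Read for Passthrough<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.inner.read(buf)
    }

    // Forward read_to_end to the inner reader so its own (possibly
    // specialized) implementation decides how much of `buf` to touch,
    // instead of std's default path zero-initializing the whole spare
    // capacity via ReadBuf::initialize_unfilled.
    fn read_to_end(&mut self, buf: &mut Vec<u8>) -> io::Result<usize> {
        self.inner.read_to_end(buf)
    }
}
```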
(Alternatively, ripgrep could be changed to not call `read_to_end`, or to not reuse a single `Vec` for every file.)
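And a sketch of that second alternative, allocating a fresh, metadata-sized buffer per file instead of reusing one `Vec` (the helper name and the use of a bare `File` instead of the decoding wrapper are illustrative assumptions):

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

// Illustrative helper: a fresh buffer per file, sized from its metadata,
// keeps any zero-initialization in the default read_to_end path bounded by
// this file's size rather than by the largest file seen so far.
fn read_file_fresh(path: &Path) -> io::Result<Vec<u8>> {
    let mut file = File::open(path)?;
    let cap = file.metadata().map(|m| m.len() as usize + 1).unwrap_or(0);
    let mut buf = Vec::with_capacity(cap);
    file.read_to_end(&mut buf)?;
    Ok(buf)
}
```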