The Problem
Although minus is quite performant at routine work such as efficiently drawing the terminal and handling terminal events, one area where it isn't optimized much is processing the input data.
Currently we store two copies of the entire data. One is the original data without any formatting, which is used to recreate the formatted data. The other is the formatted data that is suitable for the terminal and includes things like line numbering and search highlights. A related aspect is that minus currently formats data eagerly, which means it immediately tries to format all the data it has received.
Although this works, there are several drawbacks to this approach:
Currently minus cannot page over data that exceeds the size of the available RAM on the machine.
It takes a really long time to format large data, and the main thread stays blocked during that interval.
Proposed Solutions
Here are two of the most plausible solutions that I have found:
Memory maps
One solution, suggested by @TornaxO7, who gave an amazing explanation in his comment, is to store the data in a memory-mapped file and read from it.
My main criticism of this idea is that it relies on memory maps, which work fine on all UNIX-style systems but kinda suck on Windows. Specifically, Windows doesn't have anonymous memory maps in the way UNIX does.
DataSource
So another solution that I want to propose here, highly inspired by the previous one, is the DataSource trait. The trait will allow applications to hook their data up to minus without loading the data for minus. This means a simple file pager can register the file with minus without loading the entire file into memory. Similarly, a network-based application can hook up a socket to minus without reading everything from the socket first. As the user scrolls through the page display, minus will automatically load the data from the source into its buffer. The trait's signature, still tentative [1], is built around the three methods described below.
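To make the "load only what is needed" idea concrete, here is a minimal sketch using only the standard library. The function name read_window and its parameters are hypothetical and not part of minus; a real source would more likely keep an index of line offsets and seek, instead of re-scanning from the start on every call.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper (not part of minus): read at most `count` lines
/// starting at zero-based line `start`.
///
/// Only the requested window is kept in memory. Lines before `start`
/// are still read from the reader, but they are dropped immediately.
pub fn read_window<R: BufRead>(
    reader: R,
    start: usize,
    count: usize,
) -> io::Result<Vec<String>> {
    reader
        .lines()      // lazily yields io::Result<String>, one line at a time
        .skip(start)  // discard everything above the window
        .take(count)  // stop reading once the window is full
        .collect()    // the first I/O error inside the window aborts the read
}
```

With a file, the reader would typically be a std::io::BufReader wrapped around a std::fs::File, so the pager never holds more than one window of the file at a time.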
The reads_forward() and reads_backward() methods will ask the source to read a couple of lines in the forward and backward direction, respectively. The s in their names indicates that they will be called when lines need to be read sequentially, that is, relative to the upper_mark.
The readr_lines() method will ask the source to fetch the line at a given index and also some lines adjacent to that index in both directions. In the returned tuple, the first element is the actual line at index, the second element is a couple of lines after index, and the third element is a couple of lines before index.
The exact number of lines to read is still undecided. Once we finalize this, the Vecs will be replaced by fixed-size arrays.
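The method descriptions above could be sketched roughly as follows. Only the names DataSource, reads_forward, reads_backward and readr_lines come from this proposal. The parameter and return types, the Option wrapper for out-of-range indices, the idea that the source tracks its own window, and the InMemorySource type with its chunk field are all assumptions made for illustration.

```rust
/// Hypothetical shape of the proposed trait; all types are assumptions.
pub trait DataSource {
    /// Read the next few lines after the ones already handed out.
    fn reads_forward(&mut self) -> Vec<String>;
    /// Read the few lines just before the ones already handed out.
    fn reads_backward(&mut self) -> Vec<String>;
    /// Fetch the line at `index` plus neighbours: (line, after, before).
    /// Returns `None` if `index` is out of range (an assumption).
    fn readr_lines(&mut self, index: usize) -> Option<(String, Vec<String>, Vec<String>)>;
}

/// Toy source backed by a Vec, used only to exercise the trait.
pub struct InMemorySource {
    lines: Vec<String>,
    start: usize, // first line already handed to the pager
    end: usize,   // one past the last line handed to the pager
    chunk: usize, // how many lines to read per call (undecided in the proposal)
}

impl InMemorySource {
    pub fn new(lines: Vec<String>, chunk: usize) -> Self {
        Self { lines, start: 0, end: 0, chunk }
    }
}

impl DataSource for InMemorySource {
    fn reads_forward(&mut self) -> Vec<String> {
        let stop = (self.end + self.chunk).min(self.lines.len());
        let out = self.lines[self.end..stop].to_vec();
        self.end = stop;
        out
    }

    fn reads_backward(&mut self) -> Vec<String> {
        let begin = self.start.saturating_sub(self.chunk);
        let out = self.lines[begin..self.start].to_vec();
        self.start = begin;
        out
    }

    fn readr_lines(&mut self, index: usize) -> Option<(String, Vec<String>, Vec<String>)> {
        let line = self.lines.get(index)?.clone();
        let begin = index.saturating_sub(self.chunk);
        let stop = (index + 1 + self.chunk).min(self.lines.len());
        let before = self.lines[begin..index].to_vec();
        let after = self.lines[index + 1..stop].to_vec();
        // A random read resets the window that sequential reads continue from.
        self.start = begin;
        self.end = stop;
        Some((line, after, before))
    }
}
```

In this sketch the source remembers which window it has already served, so the sequential methods need no arguments. Passing the upper_mark in explicitly would be an equally reasonable design.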
[1]: The signature is not final and may change down the road.
Related to: #106 cc: @TornaxO7, @FlipB