Implement backpressure - Githubissues

FlipB commented 11 months ago

Is your feature request related to a problem? Please describe. Minus becomes unresponsive when attempting to page big files (gigabytes).

Describe the solution you'd like It seems like the pager's buffer is unbounded and keeps growing to attempt to fit the entire input. Instead the pager ought to limit the buffer to eg. 1000 lines, and block write_str when full. Scrolling the output consumes the buffer, allowing new lines to be written by write_str.

Additional context It spends virtually all time in PagerState::append_str, pegging the CPU at 100%.

AMythicDev commented 11 months ago

Thanks for filing the issue. Now talking about it

Scrolling the output consumes the buffer, allowing new lines to be written by write_str.

I don't think this could be done. If we "consume" the buffer as the user scrolls down the page it would be impossible to show the same data again if the user decides to go back up.

I knew this problem would surely arise someday and I have yet to figure out a the correct solution to this. Here are some of those which I had thought:-

Applications should buffer the data themselves and also override the default key bindings the so that whenever the user presses the down arrow or page down or G or any key that requires more data to be fetched.
Get #95 working. Although this too will require applications to buffer the data themselves but it does not require overriding every key binding that will require more data to be fetched.
Format PagerState::rows lines of data initially and then use another thread to format rest of the data so that the UI always remains responsive.

I also believe that the solution isn't just one of them but instead a mixture of bits and pieces from all of these ideas.

Anyways If you could share me more on how exactly you are using minus that would be quite helpful.

If you want to contribute you are more than welcome to open a PR. Comment down below any suggestions if you have.

EDIT: Give numbers to each point

FlipB commented 11 months ago

I don't think this could be done. If we "consume" the buffer as the user scrolls down the page it would be impossible to show the same data again if the user decides to go back up.

You are correct that you cannot scroll back up to the start if we limit the buffer of the pager. However if the input file is big enough, that's the only reasonable thing we can do, no? I've looked into how less works and it has an option to limit the buffer to 64KiB (so limited scrollback possibilities) but by default it also uses unbounded buffers when reading from a pipe (see -B in less man page).

I tried less (via pager crate) with the same file that minus had problems with and it worked fine - it ended up using 1.8GiB of memory. So in this case, maybe the problem wasn't the memory usage but rather performance related.

However I still think this feature is a good idea for my use case. I'm parsing log files and want to provide a pager for the output (similar to how journalctl works), these files can surely get bigger and I'm not too worried about having to provide a complete scrollback for insanely large files (user can just manually restart in that scenario).

Format PagerState::rows lines of data initially and then use another thread to format rest of the data so that the UI always remains responsive.

As I mentioned above, less worked fine even though it allocated memory for the entire input, so the problem in minus could very well primarily have been unresponsive UI.

Applications should buffer the data themselves and also override the default key bindings the so that whenever the user presses the down arrow or page down or G or any key that requires more data to be fetched.

Get https://github.com/arijit79/minus/issues/95 working. Although this too will require applications to buffer the data themselves but it does not require overriding every key binding that will require more data to be fetched.

I think this could work, but minus has a rather simple interface today and these API additions would have to be designed carefully to not be too difficult to use.

AMythicDev commented 11 months ago

I spent the last night giving thought to each of these ideas that's why I am replying quite late today.

I know the 64KB approach of less but the problem with implementing it is that many applications depending on minus right now might be benefiting from the current unbounded buffer. For example many applications use a pager to show real-time data on the screen such as htop and it might happen that the application pushes a lot of data to the pager at once (more than 64KB) then minus would break the data at the 64KB and it might cause a broken or non updated display at all. When applications use less for showing real-time data they pipe the data but minus doesn't care about data source as it is the job of applications depending on it. Also introducing this may cause broken displays on certain applications. I know we could add a function to control the buffer size but that will still require applications to do changes to their codebases which which in turn requires a major release.

(3) is another solutions for applications but I fear that it might require more knowledge of the data. I want minus to know about the data only quantitatively.

(1) can make the application code quite repetitive like you have to call whatever your fetch function is in each of key/mouse binding that move the screen down.

I think the best option would be to implement (2) and have a hook like EOF. Applications first send a certain amount of data to the pager and attach their data fetching function to this hook. Whenever this hook is activated due to a user event like going downward, the data fetching function is called and more data is taken.

TornaxO7 commented 10 months ago

I've maybe got an idea for this:

What is if minus internally uses a temporary file if the buffer of minus reaches a specifique limit. Then minus will only read some blocks of the temporary file and not the whole temporary file.

AMythicDev commented 10 months ago

Could you describe a bit more? What I am currently understanding from your explanation is that we take the data from the application and write it into a temporary file and also keep some part of the data in memory. As the user goes down and we need more data, we will read more data from the temp file and save it into the memory buffer. If this is exactly what you were saying then I don't think this is a good idea since we have to take all the data from the application and write into the temp file which would already take a large space on memory to pass from the application side to minus's side. I also don't want to use stuff like memory maps if you're talking about that because anonymous memory maps basically don't exist in Windows.

AMythicDev commented 8 months ago

I pushed some commits, specifically c0475fd and 4a015a8 which greatly improve text append throughout. I will get some concrete benchmarks posted tomorrow.

EDIT:- Here are benchmarks after the changes:- test state::bench::bench_append_str ... bench: 2,778,838,569 ns/iter (+/- 329,823,431)

test result: ok. 0 passed; 0 failed; 63 ignored; 1 measured; 0 filtered out; finished in 842.37s

I wasn't able to get benchmarks before these changes as the benchmark process just didn't finish even in 4hrs.

It still takes about 2.7 secs so there's should be a lot improve upon.

I have created #127 to track performance improvements all over the project.

AMythicDev commented 8 months ago

@FlipB can you do this test again against the latest main and post the results?

TornaxO7 commented 7 months ago

@arijit79 Sorry for the late reply, I've forgot to respond on this.

Could you describe a bit more? What I am currently understanding from your explanation is that we take the data from the application and write it into a temporary file and also keep some part of the data in memory. As the user goes down and we need more data, we will read more data from the temp file and save it into the memory buffer. If this is exactly what you were saying then I don't think this is a good idea since we have to take all the data from the application and write into the temp file which would already take a large space on memory to pass from the application side to minus's side. I also don't want to use stuff like memory maps if you're talking about that because memory maps don't exist in Windows.

I think it's clearer if we use an image here:

My idea is the following: We are splitting up the whole document into smaller chunks. Some chunks are either loaded in the memory, for quick access while the rest resides in file buffers (which is either in memory or in the disc). The idea here is that we just need to store, for example, 3 chunks ahead and behind the user's chunk (where the user is currently reading). If the user enters the next chunk, while reading, we can discard the topmost memory chunk, since it's unlikely that the user will get fast to the topmost chunk again and we're loading the next chunk instead into the memory.

AMythicDev commented 7 months ago

@arijit79 Sorry for the late reply, I've forgot to respond on this.

Its alright.

@TornaxO7 I really appreciate your efforts. I have created a branch called faster-append with some benchmarking code which create a ~1GB buffer in minus. This will be our starting ground for further development.

AMythicDev / minus

Implement backpressure #106