Hi @SilverMira,
Thanks for the detailed example and benchmark. This is a known parameter issue that causes slowdowns on slower or highly occupied machines. The pattern you describe above was implemented in order to continuously buffer the process output: users may request fewer characters than the process has produced, and if the process stops they still want access to the remaining output regardless of whether the process is running, hence the need to cache the process output.
Right now the reading process is I/O blocking. Since a cache is required, the process output has to be read continuously and stored into the buffered channel, which is then read by the cache thread and relayed to the end read() call, caching any extra characters if the call requested fewer symbols than are available. The main issue here is the continuous part, which implies an infinite loop.
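A rough sketch of that pattern, assuming nothing about the actual winpty-rs internals (the names spawn_reader, spawn_cache, and Request are illustrative): one thread blocks on the process output and forwards chunks over a channel, while a cache thread accumulates them and serves partial reads, keeping whatever the caller did not consume.

```rust
// Illustrative sketch only, not the actual winpty-rs code.
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

enum Request {
    // A read call asking for up to `length` characters, answered on `reply`.
    Read { length: usize, reply: mpsc::Sender<String> },
    Stop,
}

// Blocks on the process output and forwards each chunk to the cache thread.
fn spawn_reader(mut pipe: impl std::io::Read + Send + 'static, chunks: mpsc::Sender<String>) {
    thread::spawn(move || {
        let mut buf = [0u8; 4096];
        loop {
            match pipe.read(&mut buf) {
                Ok(0) | Err(_) => break, // EOF or error: stop buffering
                Ok(n) => {
                    let _ = chunks.send(String::from_utf8_lossy(&buf[..n]).into_owned());
                }
            }
        }
    });
}

// Accumulates output and serves partial reads, caching what is not consumed.
fn spawn_cache(chunks: mpsc::Receiver<String>, requests: mpsc::Receiver<Request>) {
    thread::spawn(move || {
        let mut buffer = String::new();
        loop {
            match requests.recv() {
                Ok(Request::Read { length, reply }) => {
                    // Pull in whatever the reader has produced so far.
                    while let Ok(chunk) = chunks.try_recv() {
                        buffer.push_str(&chunk);
                    }
                    let n = length.min(buffer.len());
                    let out: String = buffer.drain(..n).collect(); // assumes ASCII output
                    let _ = reply.send(out);
                }
                Ok(Request::Stop) | Err(_) => break,
            }
        }
    });
}

fn main() {
    let (chunk_tx, chunk_rx) = mpsc::channel();
    let (req_tx, req_rx) = mpsc::channel();
    // A Cursor stands in for the child process output pipe.
    spawn_reader(std::io::Cursor::new(b"hello from the process".to_vec()), chunk_tx);
    spawn_cache(chunk_rx, req_rx);

    thread::sleep(Duration::from_millis(50)); // let the reader buffer some output
    let (reply_tx, reply_rx) = mpsc::channel();
    req_tx.send(Request::Read { length: 5, reply: reply_tx }).unwrap();
    println!("{:?}", reply_rx.recv().unwrap()); // "hello"; the rest stays cached
    req_tx.send(Request::Stop).unwrap();
}
```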
Stopping the cache could be improved by distinguishing read requests from stop messages. However, the main issue lies in the main reading thread, which can get I/O blocked on the process output, and blocking on the stop message leads to starvation, since the continuous read cannot take place while the thread is waiting for the stop signal.
Your proposed solution consists of spawning a channel per call, which is then relayed to the process as a single point of communication. This makes sense for stateless client-server applications, since in a concurrent scenario each request would get its own communication channel. It does not hold in this case, because the PTY is not a stateless "object".
I think this issue is more related to the architecture of the system than to its implementation. On https://github.com/andfoy/winpty-rs/issues/72 another user discussed the possibility of using WaitForSingleObject to accurately wait for a process to reach EOF or to be terminated, but checking for that implies either a timeout or blocking on the check, which again causes the starvation issue.
The main problem here is related to the caching thread and the number of characters a user could request. Under the current implementation we cannot guarantee both performance and consistency of the output: consistency implies a continuous read, while performance implies reading characters straight from the process, with the possibility of losing output if the process finishes before all characters are read.
An option would be to use an Arc-wrapped atomic boolean. I still need to come up with an example, but it could work: when the read operation returns EOF, the thread would check the status of the atomic variable and then stop.
Something along the lines of https://users.rust-lang.org/t/using-arc-to-terminate-a-thread/81533/2
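A minimal sketch of that idea, along the lines of the forum post above: the flag is shared through an Arc<AtomicBool>, the worker re-checks it each time its blocking read returns, and dropping the PTY would flip it.

```rust
// Minimal sketch of the shared-flag idea; `alive` plays the role the
// stop/alive channels play today. Not the actual winpty-rs code.
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

fn main() {
    let alive = Arc::new(AtomicBool::new(true));
    let alive_for_thread = Arc::clone(&alive);

    let worker = thread::spawn(move || {
        while alive_for_thread.load(Ordering::Relaxed) {
            // A blocking read of the process output would go here; when it
            // returns (data or EOF), the flag is re-checked before looping.
            thread::sleep(Duration::from_millis(10)); // placeholder for the read
        }
    });

    // Dropping the PTY would do the equivalent of this:
    alive.store(false, Ordering::Relaxed);
    worker.join().unwrap();
}
```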
> cause slowdowns on slower machines/high occupancy ones
There is latency regardless of machine performance as long as read() is being called frequently: https://github.com/andfoy/winpty-rs/blob/dbbefcd68fbcbdd559ccad652f57462bf95076b4/src/pty/base.rs#L381-L382
Since nothing is sent through cache_alive_rx while the PTY is still in scope, each iteration of the while loop has an inherent 100ms wait before the next iteration. If the user leaves a long enough delay before a .read() call, there won't be any observable latency, since the cache thread will already be waiting for a read request at L382. But once the user has called read(), we will be waiting at L381 for 100ms again before the next read request can be served.
I understand the need for buffering output from the PTY, but I don't think it really relates to this issue. In fact, I'm fairly sure that just changing L381 to use try_recv() has no downsides and will resolve the latency on subsequent read() calls, though in my testing the first read() call still has a 100ms latency.
The Drop logic already sends messages to both the cache_req and cache_alive channels, so even if we don't wait on cache_alive at L381 and the thread is idling for a read request, it will still exit as expected when the PTY is dropped.
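A hedged sketch of the change being discussed, not the actual base.rs code (channel payload types are made up here): the alive channel is checked with try_recv, and the loop then blocks only on the read-request channel, so a read is served as soon as it arrives.

```rust
// Sketch of the suggested change, not the actual base.rs code; channel
// payload types here are made up for illustration.
use std::sync::mpsc::{self, Receiver, TryRecvError};
use std::thread;

fn cache_loop(cache_alive_rx: Receiver<()>, cache_req_rx: Receiver<usize>) {
    loop {
        // Before: cache_alive_rx.recv_timeout(100ms) made every iteration,
        // and therefore every read(), wait up to 100ms.
        // After: a non-blocking check of the shutdown signal.
        match cache_alive_rx.try_recv() {
            Ok(()) | Err(TryRecvError::Disconnected) => break, // PTY dropped
            Err(TryRecvError::Empty) => {}
        }
        // Block only here, waiting for the next read request. Drop sends on
        // both channels, so this recv() unblocks and the check above exits.
        match cache_req_rx.recv() {
            Ok(_length) => { /* serve the request from the cached buffer */ }
            Err(_) => break,
        }
    }
}

fn main() {
    let (alive_tx, alive_rx) = mpsc::channel::<()>();
    let (req_tx, req_rx) = mpsc::channel::<usize>();
    let cache = thread::spawn(move || cache_loop(alive_rx, req_rx));

    req_tx.send(1).unwrap();   // served immediately, no 100ms pause
    let _ = alive_tx.send(()); // what Drop does on cache_alive
    let _ = req_tx.send(0);    // what Drop does on cache_req
    cache.join().unwrap();
}
```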
Similarly, the reader_thread doesn't seem to need recv_timeout either and can also use try_recv: https://github.com/andfoy/winpty-rs/blob/dbbefcd68fbcbdd559ccad652f57462bf95076b4/src/pty/base.rs#L362-L371
Since reader_thread calls read() blockingly, if there is no output from the PTY it will block on the read() line waiting for more bytes and avoid pinning any CPU. Once it has finished a read, it will immediately start the next read attempt without waiting 100ms at L362.
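The same idea sketched for the reader side, again with illustrative names and a Cursor standing in for the conout pipe: the blocking read paces the loop, so only a non-blocking alive check is needed.

```rust
// Hedged sketch applied to the reader thread; `pipe` stands in for the
// conout handle. Not the actual base.rs code.
use std::io::Read;
use std::sync::mpsc::{self, TryRecvError};
use std::thread;

fn main() {
    let (alive_tx, alive_rx) = mpsc::channel::<()>();
    let (out_tx, out_rx) = mpsc::channel::<String>();
    // Stand-in for the process output pipe.
    let mut pipe = std::io::Cursor::new(b"process output".to_vec());

    let reader = thread::spawn(move || {
        let mut buf = [0u8; 4096];
        loop {
            // Non-blocking shutdown check instead of recv_timeout(100ms).
            match alive_rx.try_recv() {
                Ok(()) | Err(TryRecvError::Disconnected) => break,
                Err(TryRecvError::Empty) => {}
            }
            // A real conout read would block here until output is available,
            // so the loop does not spin and no CPU is pinned.
            match pipe.read(&mut buf) {
                Ok(0) | Err(_) => break, // EOF
                Ok(n) => {
                    let _ = out_tx.send(String::from_utf8_lossy(&buf[..n]).into_owned());
                }
            }
        }
    });

    println!("{:?}", out_rx.recv().unwrap());
    let _ = alive_tx.send(()); // what Drop would do
    reader.join().unwrap();
}
```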
@SilverMira, thanks for the helpful discussion; your analysis is correct. Since you have already experimented with try_recv, would you prefer to open a PR with the changes?
@SilverMira, sorry for the long delay, there's now a new release available with your changes!
Encountered this issue while trying to use the crate for an interactive shell scenario; it appears that each .read() call takes a minimum of 100ms.
Below is a minimal reproduction; even reading 1 character seems to hit this bottleneck.
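A rough sketch of that kind of benchmark, assuming the winptyrs API roughly as shown in the crate README (PTYArgs, PTY::new, spawn, read); exact field names, argument types, and return types may differ.

```rust
// Assumed API sketch based on the winptyrs README; details may differ.
use std::ffi::OsString;
use std::time::Instant;
use winptyrs::{AgentConfig, MouseMode, PTY, PTYArgs};

fn main() {
    let args = PTYArgs {
        cols: 80,
        rows: 25,
        mouse_mode: MouseMode::WINPTY_MOUSE_MODE_NONE,
        timeout: 10000,
        agent_config: AgentConfig::WINPTY_FLAG_COLOR_ESCAPES,
    };
    let mut pty = PTY::new(&args).unwrap();
    pty.spawn(OsString::from("c:\\windows\\system32\\cmd.exe"), None, None, None)
        .unwrap();

    // Each read of a single character still takes ~100ms, because the cache
    // thread only wakes up after its recv_timeout expires.
    for _ in 0..10 {
        let start = Instant::now();
        let _out = pty.read(1, false).unwrap();
        println!("read(1) took {:?}", start.elapsed());
    }
}
```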
It appears to be coming from .recv_timeout found in https://github.com/andfoy/winpty-rs/blob/dbbefcd68fbcbdd559ccad652f57462bf95076b4/src/pty/base.rs#L362
https://github.com/andfoy/winpty-rs/blob/dbbefcd68fbcbdd559ccad652f57462bf95076b4/src/pty/base.rs#L381
https://github.com/andfoy/winpty-rs/blob/dbbefcd68fbcbdd559ccad652f57462bf95076b4/src/pty/base.rs#L402
Correct me if I'm wrong, but this seems to be some sort of clean-up signal used when the PTY is dropped; however, this implementation is suboptimal, as per the latency observed above. I tested using .try_recv and the observed latency immediately went away, but .try_recv in a tight loop would mean the CPU is always pinned.
With my limited time looking at the code, it looks like some sort of worker-thread pattern is being implemented for reading the PTY. Do you think something like the following would work better? It uses the oneshot crate for single-use channels to pass results back to the main thread.
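A hedged sketch of what that per-call channel pattern could look like (illustrative names only, assuming the oneshot crate is added as a dependency): each read() creates a one-shot channel, hands the sender to the worker that owns the PTY state, and blocks on the receiver for just that one result.

```rust
// Illustrative sketch of the per-call channel pattern (names made up),
// assuming the `oneshot` crate as a dependency.
use std::sync::mpsc;
use std::thread;

struct ReadRequest {
    length: usize,
    reply: oneshot::Sender<String>,
}

// The worker owns the buffered PTY state; it only wakes up per request,
// so there is no fixed polling interval between read() calls.
fn spawn_worker(requests: mpsc::Receiver<ReadRequest>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        let mut buffer = String::from("pretend this is buffered PTY output");
        while let Ok(req) = requests.recv() {
            let n = req.length.min(buffer.len());
            let chunk: String = buffer.drain(..n).collect();
            let _ = req.reply.send(chunk);
        }
    })
}

// Each call gets its own oneshot channel and blocks only for its own result.
fn read(requests: &mpsc::Sender<ReadRequest>, length: usize) -> Option<String> {
    let (tx, rx) = oneshot::channel();
    requests.send(ReadRequest { length, reply: tx }).ok()?;
    rx.recv().ok()
}

fn main() {
    let (req_tx, req_rx) = mpsc::channel();
    let worker = spawn_worker(req_rx);
    println!("{:?}", read(&req_tx, 7));
    drop(req_tx); // closing the request channel lets the worker exit
    worker.join().unwrap();
}
```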