console-rs / indicatif

A command line progress reporting library for Rust
MIT License
4.23k stars 240 forks source link

ProgressBarIter and finding stream length via seeking breaks rate estimation #480

Closed rlee287 closed 1 year ago

rlee287 commented 1 year ago

MCVE:

use std::io::{Read, Seek, SeekFrom};
use std::io::Cursor;

use std::thread::sleep;
use std::time::Duration;

use indicatif::{ProgressBar, ProgressStyle};

pub fn seek_len(seekable: &mut dyn Seek) -> u64 {
    let old_pos = seekable.stream_position().unwrap();
    let len = seekable.seek(SeekFrom::End(0)).unwrap();
    if old_pos != len {
        seekable.seek(SeekFrom::Start(old_pos)).unwrap();
    }
    // return
    len
}

fn main() {
    let mut dummy_buf: [u8; 4] = [0x00; 4];
    let reader = Cursor::new([0xff; 4*100]);

    let pb_style = ProgressStyle::default_bar()
            .template("{wide_bar} | {pos}/{len} | {per_sec}")
            .unwrap();

    let pb = ProgressBar::new(100*4);
    pb.set_style(pb_style);
    let mut pb_reader = pb.wrap_read(reader);
    seek_len(&mut pb_reader);
    for _i in 0..100 {
        pb_reader.read(&mut dummy_buf).unwrap();
        sleep(Duration::from_millis(40));
    }
    pb.finish();
}

Expected behavior: rate hovers slightly below 100/s

Actual behavior: rate is stuck at an absurdly high constant while progress bar counts upwards and drops to slightly below 100/s when the progress bar completes

This behavior could occur when a ProgressBarIter is passed to a generic function taking a Read + Seek object.

This is a regression from v0.16.

djc commented 1 year ago

I think this is a duplicate of the discussion in #394. It would be great if someone can investigate this in more detail and propose a fix (potentially to revert some Estimator changes to the 0.16 state).

rlee287 commented 1 year ago

After some investigation, I've managed to track down the root cause of my issue. While #394 is a discussion of computing ETA, I do not believe it is closely related as that issue does not discuss problems in rate estimation that arise with seeking.

I've linked to the relevant lines of code in the Estimator struct below for reference:

https://github.com/console-rs/indicatif/blob/791068cfb9d4666efcc786c1ef9724a3f12da56c/src/state.rs#L385-L405

A typical approach of finding the length of a Seek object is through a function like the one below (copied from my MCVE):

pub fn seek_len(seekable: &mut dyn Seek) -> u64 {
    let old_pos = seekable.stream_position().unwrap();
    let len = seekable.seek(SeekFrom::End(0)).unwrap();
    if old_pos != len {
        seekable.seek(SeekFrom::Start(old_pos)).unwrap();
    }
    // return
    len
}

This involves 2-3 seek operations:

  1. A seek to find the current position (which may not be necessary depending on how stream_position() is actually implemented)
  2. A seek to the end to find the length
  3. A seek back to the initial position to reset the seek position

Unfortunately this seek pattern has unfortunate interactions with the rate estimator:

I can think of two possible solutions:

  1. Do not record a step in the Estimator if the value received is equal to the length of the underlying object. This does not resolve a similar bug that would arise if users did a massive seek forwards and then back that did not line up with the end of the underlying object. This is also not possible for iterators that do not have a predetermined length. Having different behavior depending on whether the underlying object is Seek or not might also require some form of specialization.
  2. Reset the Estimator whenever a backwards seek/step is detected.

I am leaning towards the second solution, but I would like to hear other people's opinions before submitting a PR for this.

djc commented 1 year ago

Yeah, (2) definitely seems cleaner than (1). Of course it's still kind of a crappy heuristic, but I guess it's unlikely to do much harm.