apache / opendal

Apache OpenDAL: access data freely.
https://opendal.apache.org
Apache License 2.0
3.27k stars 455 forks source link

`FuturesAsyncWriter` return incorrect written size #4810

Closed nooberfsh closed 3 months ago

nooberfsh commented 3 months ago
#[tokio::main]
async fn main() -> Result<()> {
    let mut w = make_writer().await?;

    let data = "hello, opendal";
    for _ in 0..1_000_000 {
        w.write_all(data.as_bytes()).await?
    }

    w.shutdown().await?;
    let wsize = w.written_size.load(std::sync::atomic::Ordering::SeqCst);
    println!("written size: {wsize}");

    Ok(())
}

async fn make_writer() -> Result<TrackWriter> {
    let mut builder = Fs::default();
    builder.root(".");
    let op: Operator = Operator::new(builder)?.finish();

    let w = op
        .writer("test.txt")
        .await
        .unwrap()
        .into_futures_async_write();
    Ok(TrackWriter::new(w))
}

// Copied from https://github.com/icelake-io/icelake/blob/main/icelake/src/io/parquet/track_writer.rs
/// `TrackWriter` is used to track the written size.
pub struct TrackWriter {
    writer: FuturesAsyncWriter,
    written_size: Arc<AtomicU64>,
}

impl tokio::io::AsyncWrite for TrackWriter { ... }

Full code can be found in this repo

The above code write a simple string literal "hello, opendal" to local file test.txt 1_000_000 times using FuturesAsyncWriter. The expected written_size should be 14000000, but the above code print 14000326 on my machine which does not match the file size.

opendal version: 0.47.1 rustc version: rustc 1.79.0 (129f3b996 2024-06-10)

Xuanwo commented 3 months ago

Hi, please double check the code here:

https://github.com/nooberfsh/opendal_incorrect_write_szie/blob/6f1780a3ca192c0f49f34254245f43ca54634390/src/main.rs#L68-L71

There is no guarantee that write will always write the entire buffer. Therefore, we should calculate the returned n instead.

nooberfsh commented 3 months ago

That piece of code is from icelake, It was originally use the returned n, this commit changed it to buf.len(). I didn't aware it was changed, sorry for the bother. I will open an issue in icelake.