fstpackage / fst

Lightning Fast Serialization of Data Frames for R
http://www.fstpackage.org/fst/
GNU Affero General Public License v3.0

Progress bar when read/write #261

Open matthewgson opened 3 years ago

matthewgson commented 3 years ago

Thank you for creating this awesome package; it has been my go-to whenever I save big files to disk. I would love to see a progress bar when reading or writing a big file. Is there a plan to implement a simple progress bar option for reading and writing fst files in the future?

MarcusKlik commented 1 year ago

Hi @matthewgson, thanks for your request!

A progress bar would be very nice, but the actual call to the fstlib C++ library doesn't return until the complete file has been read. We could create a hook that fstlib calls to update a progress bar, but that seems like overkill (and would add more dependencies to the fst package).

If you want feedback when reading very large files, you could read the file in chunks and update a progress bar after each chunk. Would that work for you?

library(dplyr)
library(fst)
library(progress)

# function to read and show progress
read_fst_progress <- function(path, columns = NULL) {

  nr_of_rows <- metadata_fst(path)$nrOfRows

  # determine chunks: aim for about 100, but never more than are needed
  chunk_size <- 1 + (nr_of_rows - 1) %/% 100  # take partial chunks into account
  nr_of_chunks <- ceiling(nr_of_rows / chunk_size)

  pb <- progress_bar$new(total = nr_of_chunks)

  lapply(1:nr_of_chunks, function(chunk) {

    pb$tick()
    Sys.sleep(0.1)  # demo only, slows the loop so the bar is visible; remove this line in real use

    y <- read_fst(
      path,
      columns = columns,
      from = 1 + (chunk - 1) * chunk_size,
      to = min(chunk * chunk_size, nr_of_rows)
    )
  }) %>%
    bind_rows
}

# write sample fst file
tmp_file <- tempfile(fileext = ".fst")
nr_of_rows <- 1e6
data.frame(
  X = sample(1:100, nr_of_rows, replace = TRUE),
  Y = LETTERS[sample(1:26, nr_of_rows, replace = TRUE)]
) %>%
  write_fst(tmp_file)

y <- read_fst_progress(tmp_file)

#> [===========================================================>------------------------------------------]  59%
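
The chunk arithmetic above rounds the chunk size up, so a file may need fewer ranges than the requested number of chunks. As a rough sketch of that calculation on its own, the block below computes the `from`/`to` ranges first and then reads them with base R's `txtProgressBar`, which avoids the extra `progress` and `dplyr` dependencies. The names `chunk_ranges`, `read_chunks_progress`, `read_chunk` and `max_chunks` are made up for this illustration and are not part of fst, and an in-memory data frame stands in for the fst file so the example runs without one.

```r
# Split rows 1..nr_of_rows into at most max_chunks contiguous ranges.
# chunk_ranges() is a hypothetical helper, not an fst function.
chunk_ranges <- function(nr_of_rows, max_chunks = 100) {
  stopifnot(nr_of_rows >= 1, max_chunks >= 1)
  # integer form of ceiling(nr_of_rows / max_chunks)
  chunk_size <- 1 + (nr_of_rows - 1) %/% max_chunks
  from <- seq(1, nr_of_rows, by = chunk_size)
  data.frame(from = from, to = pmin(from + chunk_size - 1, nr_of_rows))
}

# Read every range through a caller-supplied function(from, to) and
# update a base R text progress bar after each chunk has been read.
read_chunks_progress <- function(nr_of_rows, read_chunk, max_chunks = 100) {
  ranges <- chunk_ranges(nr_of_rows, max_chunks)
  pb <- utils::txtProgressBar(min = 0, max = nrow(ranges), style = 3)
  on.exit(close(pb))

  chunks <- lapply(seq_len(nrow(ranges)), function(i) {
    chunk <- read_chunk(ranges$from[i], ranges$to[i])
    utils::setTxtProgressBar(pb, i)
    chunk
  })
  do.call(rbind, chunks)
}

# Demo: an in-memory data frame stands in for the file on disk
df <- data.frame(X = 1:150, Y = rep(letters[1:3], 50))
res <- read_chunks_progress(
  nrow(df),
  function(from, to) df[from:to, , drop = FALSE]
)
nrow(res)
```

To use this with a real file, the reader function could be something like `function(from, to) read_fst(path, columns = columns, from = from, to = to)`, with the row count taken from `metadata_fst(path)$nrOfRows` as in the example above.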