ambiorix-web / ambiorix

🖥️ Web framework for R
http://ambiorix.dev
GNU General Public License v3.0

`parse_multipart()` is very slow for file uploads #65

Closed kennedymwavu closed 6 months ago

kennedymwavu commented 6 months ago

I have 2 CSV files: sample.csv and flights.csv.

flights.csv can be reproduced by running this:

data.table::fwrite(x = nycflights13::flights, file = "flights.csv")

When I upload sample.csv, parsing is almost instantaneous. But when I upload flights.csv, parsing the form data takes almost a minute!

#' Handle POST on '/upload'
#'
#' @param req Request object.
#' @param res Response object.
#' @export
file_upload <- \(req, res) {
  total_time <- system.time({
    body <- parse_multipart(req)
  })

  print(total_time)

  html <- tags$p(
    class = "text-success",
    "File uploaded successfully!"
  )
  res$send(html)
}

Could I be missing something here? I'd be happy to provide more details.
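
To check whether the slowdown scales with upload size, independent of the web server, something like the following base R sketch could help. It wraps a synthetic payload in a single-file multipart body and times an arbitrary parser on it. `build_multipart_body()`, `time_parser()`, `noop_parser()` and the boundary string are hypothetical names made up for this illustration, not part of ambiorix or Rook; swap `noop_parser` for the parser under test.

```r
# Hypothetical helper: wrap a raw payload in a single-file
# multipart/form-data body (boundary and field names are assumptions).
build_multipart_body <- function(payload, boundary = "XBOUNDARYX",
                                 field = "file", filename = "upload.csv") {
  header <- paste0(
    "--", boundary, "\r\n",
    "Content-Disposition: form-data; name=\"", field,
    "\"; filename=\"", filename, "\"\r\n",
    "Content-Type: text/csv\r\n\r\n"
  )
  footer <- paste0("\r\n--", boundary, "--\r\n")
  c(charToRaw(header), payload, charToRaw(footer))
}

# Hypothetical helper: time any parser(body, content_type)
# on a payload of n_bytes bytes and return the elapsed seconds.
time_parser <- function(parser, n_bytes, boundary = "XBOUNDARYX") {
  payload <- rep(charToRaw("a"), n_bytes)
  body <- build_multipart_body(payload, boundary)
  content_type <- paste0("multipart/form-data; boundary=", boundary)
  system.time(parser(body, content_type))[["elapsed"]]
}

# Stand-in parser so the sketch runs as-is; replace with the real one.
noop_parser <- function(body, content_type) length(body)

sizes <- c(1e3, 1e5, 1e6)
timings <- vapply(sizes, function(n) time_parser(noop_parser, n), numeric(1))
print(data.frame(bytes = sizes, elapsed_sec = timings))
```

If the elapsed time grows much faster than the payload size for a given parser, the parser itself is the likely bottleneck rather than the upload.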

kennedymwavu commented 6 months ago

I've also tried Rook::Multipart$parse(req), and it is just as slow.

#' Handle POST on '/upload'
#'
#' @param req Request object.
#' @param res Response object.
#' @export
file_upload <- \(req, res) {
  total_time <- system.time({
    body <- Rook::Multipart$parse(req)
  })

  print(total_time)

  html <- tags$p(
    class = "text-success",
    "File uploaded successfully!"
  )
  res$send(html)
}

I'm not sure anymore if this issue belongs here.

kennedymwavu commented 6 months ago

The webutils::parse_*() functions are exactly what I was looking for! They offer the parsing speed you'd expect. Leaving this POC here for posterity:

#' Handle POST on '/upload'
#'
#' @param req Request object.
#' @param res Response object.
#' @export
file_upload <- \(req, res) {
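  # content type header (carries the multipart boundary) and raw request body:
  content_type <- req$CONTENT_TYPE
  body <- req$rook.input$read()

  # parse the body with webutils instead of parse_multipart():
  postdata <- webutils::parse_http(body, content_type)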

  # print the parsed data:
  utils::str(postdata)

  # write the uploaded file locally (POC):
  file_path <- file.path(
    getwd(),
    basename(postdata[["file"]][["filename"]])
  )

  writeBin(object = postdata$file$value, con = file_path)

  html <- tags$p(
    class = "text-success",
    "File uploaded successfully!"
  )
  res$send(html)
}
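
The POC above can be tried without a running server. Here is a minimal sketch that feeds a hand-built multipart body to `webutils::parse_http()`. The boundary string, field name and file contents are made up for illustration, and the parse step is skipped if webutils is not installed.

```r
# Hand-built multipart/form-data body with one file field named "file".
# The boundary and the CSV contents are assumptions for this sketch.
boundary <- "XBOUNDARYX"
body <- charToRaw(paste0(
  "--", boundary, "\r\n",
  "Content-Disposition: form-data; name=\"file\"; filename=\"sample.csv\"\r\n",
  "Content-Type: text/csv\r\n",
  "\r\n",
  "a,b\r\n1,2",
  "\r\n--", boundary, "--\r\n"
))
content_type <- paste0("multipart/form-data; boundary=", boundary)

if (requireNamespace("webutils", quietly = TRUE)) {
  postdata <- webutils::parse_http(body, content_type)
  # As in the POC, the file field exposes `filename` and the raw `value`.
  print(postdata$file$filename)
  cat(rawToChar(postdata$file$value), "\n")
} else {
  message("webutils is not installed; skipping the parse step")
}
```

In a real handler, `body` and `content_type` would come from `req$rook.input$read()` and `req$CONTENT_TYPE`, as shown in the POC.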

JohnCoene commented 6 months ago

Maybe I should switch that in ambiorix and/or document this.

Thanks for looking into this!

kennedymwavu commented 6 months ago

Ideally, having both options would be great, though IMO documenting carries more weight, since it would demonstrate that one can use a different parser (or even build one) that suits their needs.

Welcome! I enjoyed every bit of the search.
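
As a rough illustration of the "use a different parser (or even build one)" idea, here is a base R sketch of a handler that accepts any parser function. `make_upload_handler()`, `size_only_parser()` and the mock request/response objects are hypothetical and exist only so the sketch runs without a server; this is not how ambiorix itself is implemented.

```r
# Hypothetical helper: build an upload handler around any parser function.
# `parser` is assumed to accept (body, content_type).
make_upload_handler <- function(parser) {
  function(req, res) {
    body <- req$rook.input$read()            # raw request body
    parsed <- parser(body, req$CONTENT_TYPE) # delegate to the chosen parser
    res$send(parsed)
  }
}

# Toy parser that only reports the payload size; a stand-in for a real
# parser such as webutils::parse_http().
size_only_parser <- function(body, content_type) {
  list(content_type = content_type, bytes = length(body))
}

# Mock request and response objects (assumptions, for illustration only).
mock_req <- list(
  CONTENT_TYPE = "text/plain",
  rook.input = list(read = function() charToRaw("hello"))
)
mock_res <- list(send = function(x) x)

handler <- make_upload_handler(size_only_parser)
result <- handler(mock_req, mock_res)
str(result)
```

Passing the parser in as an argument keeps the handler unchanged when the parsing strategy is swapped.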