Result field - propagation

JohnCoene commented 2 days ago

Sharing some worries I have regarding the performances again, apologies.

A quick recap: we're going to have to handle 100+ stacks, probably far more than that. What is set is that each stack represents an output from the {composer} package by @karmatarap and @MikeJohnPage via {blockr.composer}.

One thing we don't have yet but will do soon (next iteration/sprint) is the concept of "globals" (we may need a better term) where each output/stack depends on one parent stack: this parent stack will allow having one place to store all the "global" data (data shared by many outputs, e.g.: footnotes).

Now to go back 1-2 weeks, in the app I originally rendered all stacks on load on a single tab (rendered hidden with d-none class from Bootstrap which apply display: none !important), then have a select input to toggle the visibility of the outputs.

However, this performed very poorly as all the tacks take quite some time to fire up with our examples of 11 stacks it took ~4-5 seconds for all of the blockr stacks to render and return control to the session.

So I refactored that entirely. Instead of rendering all stacks on load we render on demand as the user selects the stack from the dropdown and that fixed the issue.

However, this is with our current version of the outputs: these don't have the "globals"/parent stack, and I'm questioning whether this will work when we have that because all these updates are reactive that happen only in the shiny server, therefore all stacks server have to be running for the update to the parent stack to propagate.

The simplest example I could come up with is below, it's a bit much for a reprex but it simulates quite well what we do/will have in the application.

The first stack in the app is the "parent stack" on which all other stacks depend, select a number of stacks in the select input and see them being rendered, then change the dataset in the parent stack at the top and hit serialise: you will see that the result of the stacks have only been updated for the rendered stacks.

library(blockr)
library(shiny)

# our result block
new_results2_block <- function(value = character(), ...) {
  fields <- list(
    stack = new_result_field(value = iris)
  )

  new_block(
    fields = fields,
    expr = quote(.(stack)),
    ...,
    class = c("result_block", "data_block")
  )
}

# our parent stack that all stacks depend on
data_stack <- new_stack(
  data = new_dataset_block(selected = "iris"),
  name = "data-source",
  title = "Parent stack"
)

# simulate 10 stacks
N <- 1:10

# we create the stacks, these come from out database
# where they are individually serialised.
stacks <- N |>
  lapply(\(index) {
    new_stack(
      data = new_results2_block(value = "data-source"),
      title = paste("Stack", index),
      name = paste0("stack", index)
    )
  })

names(stacks) <- paste0("stack", N)

ui <- fluidPage(
  theme = bslib::bs_theme(5L),
  p("All stacks depend on the parent stack below"),
  p("Select a few inputs"),
  p("Do that for a few outputs, then change dataset in parent stack and serialise"),
  generate_ui(data_stack),
  hr(),
  div(
    class = "d-flex",
    div(
      class = "flex-grow-1",
      selectizeInput(
        "select",
        "select a stack",
        choices = paste0("stack", N),
        multiple = TRUE
      ),
    ),
    div(
      class = "flex-shrink-1",
      actionButton("render", "Render")
    ),
    div(
      class = "flex-shrink-1",
      actionButton("serialise", "Serialise")
    )
  ),
  div(id = "stacks")
)

server <- function(input, output, session) {
  stacks |>
    lapply(\(stack) {
      name <- attr(stack, "name")
      blockr::add_workspace_stack(name, stack)
    })

  generate_server(data_stack)

  rendered <- c()
  observeEvent(input$select, {
    sapply(input$select, \(stack_name) {
      # stack is already rendered, we return early
      if (stack_name %in% rendered) return()

      rendered <<- c(rendered, stack_name)

      stack <- stacks[[stack_name]]

      on.exit({
        generate_server(stack)
      })

      insertUI(
        "#stacks",
        "beforeEnd",
        generate_ui(stack)
      )
    })
  })

  observeEvent(input$serialise, {
    x <- get_workspace() |>
      ls() |>
      lapply(\(stack_name) {
        stack <- get(stack_name, envir = get_workspace())
        result <- attr(stack, "result")

        is_still_iris_dataset <- identical(result, iris)
        cat(
          "stack",
          stack_name,
          "is identical to iris dataset?",
          is_still_iris_dataset,
          "\n"
        )
      })
  })
}

shinyApp(ui, server)

In brief, either 1) I am missing something here and there's no issue, or 2) we need an efficient way to render hundreds of stacks, or 3) we need a way to propagate changes from parent stack in a different manner, 4) there is another solution.

JohnCoene commented 2 days ago

To complete this, here's an example rendering all stacks on load.

library(shiny)
library(blockr)

stacks <- lapply(1:100, \(i) {
  new_stack(
    data = new_dataset_block(selected = "iris"),
    selected = new_select_block(columns = sample(names(iris), 1))
  )
})

ui <- fluidPage(
  theme = bslib::bs_theme(5L),
  div(id = "stacks")
)

server <- function(input, output, session) {
  observe({
    lapply(stacks, \(stack) {
      on.exit({
        generate_server(stack)
      })

      print(stack)

      insertUI(
        "#stacks",
        "beforeEnd",
        generate_ui(stack)
      )
    })
  })
}

shinyApp(ui, server)

DivadNojnarg commented 2 days ago

@JohnCoene this issue is expected at this scale. If you call 100 times a module server function that takes even only 0.5 sec to complete on the main thread, there are performance issues. I don't imagine when it's deployed on a server with multiple users.

It would make sense to start thinking about using ExtendedTask or some other async solution (coupled with mirai?) at the block level, so computations in 1 block don't freeze other independent elements. But this won't solve issues with coupled elements, that is, if you update a block and there are 25 blocks downstream, you'll have to wait to get the result.

We briefly talked about it with @nbenn.

DivadNojnarg commented 2 days ago

@JohnCoene @nbenn Little example with a fast stack on the left and slow on the right using ExtendedTask and mirai. They are running async so I can still do things on the first stack when the second one is running.

https://github.com/user-attachments/assets/c5aac3b8-267a-4a1c-9f52-809aa874c17b

BristolMyersSquibb / blockr

Result field - propagation #429