duckdb / duckdb-r

The duckdb R package
https://r.duckdb.org/

Construction of deep relational trees #101

Open krlmlr opened 6 months ago

krlmlr commented 6 months ago

This is a toy example, but the problem is relevant for some CRAN packages under the default setting of max_expression_depth. The symptoms are the same as when evaluating rel7.

Ideally, we would already see an error when constructing rel5. However, the system lets me construct rel5 and even rel6; only the construction of rel7 fails, with a depth-limit error like the one raised when evaluating rel5. Is this an off-by-two error, or something more serious?

duckplyr can fall back to dplyr if the error happens at construction, but not if it happens at evaluation, which is too late. An error on construction of rel5 or perhaps even rel4 would fix the downstream problem. How can this be achieved?

duckdb <- asNamespace("duckdb")
con <- DBI::dbConnect(duckdb::duckdb())
experimental <- FALSE
df1 <- tibble::tibble(id = 1L)

DBI::dbExecute(con, "SET max_expression_depth TO 5")
#> [1] 0

rel1 <- duckdb$rel_from_df(con, df1, experimental = experimental)
rel2 <- duckdb$rel_project(
  rel1,
  list({
    tmp_expr <- duckdb$expr_reference("id")
    duckdb$expr_set_alias(tmp_expr, "id")
    tmp_expr
  })
)
rel3 <- duckdb$rel_project(
  rel2,
  list({
    tmp_expr <- duckdb$expr_reference("id")
    duckdb$expr_set_alias(tmp_expr, "id")
    tmp_expr
  })
)
rel4 <- duckdb$rel_project(
  rel3,
  list({
    tmp_expr <- duckdb$expr_reference("id")
    duckdb$expr_set_alias(tmp_expr, "id")
    tmp_expr
  })
)
rel4
#> DuckDB Relation: 
#> ---------------------
#> --- Relation Tree ---
#> ---------------------
#> Projection [id as id]
#>   Projection [id as id]
#>     Projection [id as id]
#>       r_dataframe_scan(0x11cca4278)
#> 
#> ---------------------
#> -- Result Columns  --
#> ---------------------
#> - id (INTEGER)
rel5 <- duckdb$rel_project(
  rel4,
  list({
    tmp_expr <- duckdb$expr_reference("id")
    duckdb$expr_set_alias(tmp_expr, "id")
    tmp_expr
  })
)
rel5
#> DuckDB Relation: 
#> ---------------------
#> --- Relation Tree ---
#> ---------------------
#> Projection [id as id]
#>   Projection [id as id]
#>     Projection [id as id]
#>       Projection [id as id]
#>         r_dataframe_scan(0x11cca4278)
#> 
#> ---------------------
#> -- Result Columns  --
#> ---------------------
#> - id (INTEGER)
rel6 <- duckdb$rel_project(
  rel5,
  list({
    tmp_expr <- duckdb$expr_reference("id")
    duckdb$expr_set_alias(tmp_expr, "id")
    tmp_expr
  })
)
rel6
#> DuckDB Relation: 
#> ---------------------
#> --- Relation Tree ---
#> ---------------------
#> Projection [id as id]
#>   Projection [id as id]
#>     Projection [id as id]
#>       Projection [id as id]
#>         Projection [id as id]
#>           r_dataframe_scan(0x11cca4278)
#> 
#> ---------------------
#> -- Result Columns  --
#> ---------------------
#> - id (INTEGER)
rel7 <- duckdb$rel_project(
  rel6,
  list({
    tmp_expr <- duckdb$expr_reference("id")
    duckdb$expr_set_alias(tmp_expr, "id")
    tmp_expr
  })
)
#> Error: {"exception_type":"Binder","exception_message":"Max expression depth limit of 5 exceeded. Use \"SET max_expression_depth TO x\" to increase the maximum expression depth."}
rel7
#> Error in eval(expr, envir, enclos): object 'rel7' not found
duckdb$rel_to_altrep(rel6)
#> Error: Error evaluating duckdb query: Parser Error: Maximum tree depth of 5 exceeded in logical planner
duckdb$rel_to_altrep(rel5)
#> Error: Error evaluating duckdb query: Parser Error: Maximum tree depth of 5 exceeded in logical planner
duckdb$rel_to_altrep(rel4)
#>   id
#> 1  1

Created on 2024-03-10 with reprex v2.1.0
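
To illustrate why an error at construction time is useful, here is a minimal sketch of the fallback pattern described above. The helper with_fallback() and the stand-in functions are hypothetical and are not part of duckdb or duckplyr; the sketch only assumes that construction signals an ordinary R error, as the rel7 call does in the reprex.

```r
# Hypothetical helper, not part of duckdb or duckplyr:
# try the relational construction first and fall back to a plain R
# implementation if construction signals an error.
with_fallback <- function(build_rel, fallback) {
  tryCatch(
    build_rel(),
    error = function(e) {
      message("Falling back: ", conditionMessage(e))
      fallback()
    }
  )
}

# Stand-ins for illustration only; no DuckDB connection is needed here.
failing_build <- function() stop("Max expression depth limit of 5 exceeded.")
plain_r <- function() data.frame(id = 1L)

res <- with_fallback(failing_build, plain_r)
```

This pattern cannot help when the error only surfaces at evaluation (as with rel5 and rel6), because by then the relational result has already been handed to the caller.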

krlmlr commented 6 months ago

See https://github.com/duckdblabs/duckplyr/commit/ffa7e96ac50db7a4d3d0d7f73ef0930337af97df for my workaround.

krlmlr commented 6 months ago

The necessary margin seems to be larger than 2, and even larger than 10. The workaround helped with at least one reverse dependency; we'll see.
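
One way to think about such a margin is a guard that counts tree nodes during construction and refuses to grow the tree once it comes within a chosen distance of the limit. The following is only a sketch under stated assumptions: tracked_rel() and tracked_project() are hypothetical names, the depth is tracked in R rather than queried from DuckDB, and this is not necessarily what the linked commit does.

```r
# Hypothetical wrapper: a relation plus the number of nodes in its tree.
tracked_rel <- function(rel, depth = 1L) {
  list(rel = rel, depth = depth)
}

# Add one node, refusing early if the tree would come within `margin`
# of `max_depth`. `project` stands in for a call such as rel_project().
tracked_project <- function(x, project, max_depth, margin) {
  new_depth <- x$depth + 1L
  if (new_depth > max_depth - margin) {
    stop("Relation tree depth ", new_depth, " exceeds ", max_depth, " - ", margin)
  }
  tracked_rel(project(x$rel), new_depth)
}

# Toy run mirroring the reprex: limit 5, margin 1, identity() as the projection.
rel <- tracked_rel("scan")
for (i in 1:3) {
  rel <- tracked_project(rel, identity, max_depth = 5L, margin = 1L)
}

# A fourth projection would create a five-node tree and is rejected at construction.
attempt <- try(tracked_project(rel, identity, max_depth = 5L, margin = 1L), silent = TRUE)
failed <- inherits(attempt, "try-error")
```

With a margin of 1 the guard rejects the equivalent of rel5 in the toy example, which is the first relation that fails to evaluate there. As noted above, real pipelines appear to need a much larger margin, so the value would have to be tuned rather than derived from this example.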