ddotta / tablexlsx

R package to export data frames from R to xlsx workbook
https://ddotta.github.io/tablexlsx/
Other
16 stars 1 forks source link

List of dataframes bound to a name produces empty worksheet #37

Open teorems opened 5 months ago

teorems commented 5 months ago

Kudos for this package. Handy idea, mostly for the automatic sheet creation as it avoids to use something like map or lapply and a custom function with various openxlsx function calls ...

However I encounter this issue :

Exporting a list() of dataframes, like list(iris, cars etc.) is fine.

  library(tablexlsx)
#this works
mypath <- tempdir()
list(iris, cars, mtcars) |> toxlsx(path = mypath, tosheet = list("iris"  ="iris", "cars" = "cars" , "mtcars" = "mtcars"))
#> ✔ Your Excel file 'Export.xlsx' is available in the folder 'C:\Users\***\AppData\Local\Temp\RtmpY94Zz2'
file.path(mypath, "Export.xlsx") |> fs::file_show()

But when the list is bound to a name like datasets <- list(iris, cars, mtcars) the resulting workbook is created but empty (argument tosheet = specified or not).

# this produces an empty worksheet
datasets  <- list(iris, cars, mtcars)
datasets |> toxlsx(path = mypath, tosheet = list("iris"  ="iris", "cars" = "cars" , "mtcars" = "mtcars"))
#> Warning: Workbook does not contain any worksheets. A worksheet will be added.
#> ✔ Your Excel file 'Export.xlsx' is available in the folder 'C:\Users\***\AppData\Local\Temp\RtmpY94Zz2'
# empty file
file.path(mypath, "Export.xlsx") |> fs::file_show()

Created on 2024-06-06 with reprex v2.1.0

Session info ``` r sessioninfo::session_info() #> ─ Session info ─────────────────────────────────────────────────────────────── #> setting value #> version R version 4.4.0 Patched (2024-05-26 r86634 ucrt) #> os Windows 10 x64 (build 19045) #> system x86_64, mingw32 #> ui RTerm #> language (EN) #> collate French_France.utf8 #> ctype French_France.utf8 #> tz Europe/Paris #> date 2024-06-06 #> pandoc 3.1.11 @ C:/***/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown) #> #> ─ Packages ─────────────────────────────────────────────────────────────────── #> package * version date (UTC) lib source #> cli 3.6.2 2023-12-11 [1] CRAN (R 4.3.2) #> digest 0.6.35 2024-03-11 [1] CRAN (R 4.3.3) #> evaluate 0.23 2023-11-01 [1] CRAN (R 4.3.2) #> fastmap 1.2.0 2024-05-15 [1] CRAN (R 4.4.0) #> fs 1.6.4 2024-04-25 [1] CRAN (R 4.3.3) #> glue 1.7.0 2024-01-09 [1] CRAN (R 4.3.2) #> htmltools 0.5.8.1 2024-04-04 [1] CRAN (R 4.3.3) #> knitr 1.47 2024-05-29 [1] CRAN (R 4.4.0) #> lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.3.2) #> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.2) #> openxlsx 4.2.5.2 2023-02-06 [1] CRAN (R 4.3.2) #> purrr 1.0.2 2023-08-10 [1] CRAN (R 4.3.2) #> R.cache 0.16.0 2022-07-21 [1] CRAN (R 4.3.3) #> R.methodsS3 1.8.2 2022-06-13 [1] CRAN (R 4.3.1) #> R.oo 1.26.0 2024-01-24 [1] CRAN (R 4.3.2) #> R.utils 2.12.3 2023-11-18 [1] CRAN (R 4.3.3) #> Rcpp 1.0.12 2024-01-09 [1] CRAN (R 4.3.2) #> reprex 2.1.0 2024-01-11 [1] CRAN (R 4.3.2) #> rlang 1.1.3 2024-01-10 [1] CRAN (R 4.4.0) #> rmarkdown 2.27 2024-05-17 [1] CRAN (R 4.4.0) #> rstudioapi 0.16.0 2024-03-24 [1] CRAN (R 4.3.3) #> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.3) #> stringi 1.8.4 2024-05-06 [1] CRAN (R 4.4.0) #> styler 1.10.3 2024-04-07 [1] CRAN (R 4.3.3) #> tablexlsx * 1.0.0 2024-06-06 [1] CRAN (R 4.4.0) #> vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.3.2) #> withr 3.0.0 2024-01-16 [1] CRAN (R 4.3.2) #> xfun 0.44 2024-05-15 [1] CRAN (R 4.4.0) #> yaml 2.3.8 2023-12-11 [1] CRAN (R 4.3.2) #> zip 2.3.1 2024-01-27 [1] CRAN (R 4.3.2) #> #> [1] C:/***/R/library #> #> ────────────────────────────────────────────────────────────────────────────── ```

Cheers.

JulienBlasco commented 5 months ago

Hi @teorems, thank you for the feedback. Seems related to #18 and the way toxlsx handles its object argument. I will look into it.

ddotta commented 5 months ago

Thanks @JulienBlasco
However, I think there's still a problem in the use case described by @teorems

> datasets |> toxlsx(path = mypath, tosheet = list("iris"  ="iris", "cars" = "cars" , "mtcars" = "mtcars"))
Error in listsplitted[[tosheet[[df_name]]]] : 
  attempt to select less than one element in get1index
JulienBlasco commented 5 months ago

Yes indeed, @teorems' example still doesn't work. But the problem is that once the datasets object is created, it is an unnamed list that has lost all trace of the original names of the datasets. See:

datasets <- list(iris, cars, mtcars)
datasets
#> [[1]]
#>     Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
#> 1            5.1         3.5          1.4         0.2     setosa
#> 2            4.9         3.0          1.4         0.2     setosa
#> 3            4.7         3.2          1.3         0.2     setosa
#> 4            4.6         3.1          1.5         0.2     setosa
#> 5            5.0         3.6          1.4         0.2     setosa
#> 6            5.4         3.9          1.7         0.4     setosa
#> ...
#> 
#> [[2]]
#>    speed dist
#> 1      4    2
#> 2      4   10
#> 3      7    4
#> 4      7   22
#> 5      8   16
#> 6      9   10
#> ...
#> 
#> [[3]]
#>                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
#> Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
#> Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
#> Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
#> Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
#> Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
#> ...

Therefore, when datasets is passed to toxlsx(), it has no way of knowing that the element named "iris" in tosheet refers to the first data.frame of datasets, that its element named "cars" refers to the second data.frame of datasets, etc.

Two possible solutions are now provided by the latest version of the package (which is that of commit https://github.com/ddotta/tablexlsx/commit/89511db83a4fa158e832ec2a3e633e009ee9eab0 as I write these lines)

1. Name the elements of datasets

datasets <- list(Iris = iris, Cars = cars, Mtcars = mtcars)
datasets |> toxlsx(path = tempdir(), tosheet = list(Iris = "iris sheet", 
                                                    Cars = "cars sheet" , 
                                                    Mtcars = "mtcars sheet"))
#> ✔ Your Excel file 'Export.xlsx' is available in the folder '/tmp/RtmpgBSzXx'

2. Create the list on the fly and use calling names in other arguments

list(iris, as.data.frame(cars), head(mtcars)) |> toxlsx(
  path = tempdir(), 
  tosheet = list(iris = "iris sheet", 
                 "as.data.frame(cars)" = "cars sheet" , 
                 "head(mtcars)" = "mtcars sheet")
  )
#> ✔ Your Excel file 'Export.xlsx' is available in the folder '/tmp/RtmpjqqK5P'

I don't see how we could handle such cases better. Maybe match items by position when lists are not named? Error messages should definitely be more informative, though.

I hope this is a satisfactory answer @teorems!

ddotta commented 5 months ago

Thanks @JulienBlasco ! Indeed, both solutions work. I personally find that forcing the naming of list items is a good practice.

While I'm waiting for @teorems' answer, I'll keep this issue in mind and possibly think about documenting it in the vignette...