zip_file <- tempfile(fileext = ".xlsx.zip")
rio::export(head(iris), zip_file)
raw_file <- utils::unzip(zip_file, list = TRUE)$Name[1]
rio::import(zip_file)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3.0 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5.0 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
## this is fine-ish, I guess?
rio::import(zip_file, which = "aaaa.xlsx")
#> Warning in extract_func(file, files = file_list[grep(which2, file_list)[1]], :
#> requested file not found in the zip file
#> Error: `path` does not exist: '/tmp/RtmpH9K6ta/file831fb50f53589/aaaa.xlsx'
rio::import(zip_file, which = raw_file)
#> Error: Sheet 'file831fb5a3e85e.xlsx' not found
## a more illustrative example
zip_file2 <- tempfile(fileext = ".xlsx.zip")
rio::export(list(first_sheet = head(iris), second_sheet = tail(iris)), zip_file2)
xlsx_file <- tempfile(fileext = ".xlsx")
rio::export(list(first_sheet = head(iris), second_sheet = tail(iris)), xlsx_file)
rio::import(zip_file2, which = 1)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3.0 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5.0 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
rio::import(zip_file2, which = 2)
#> Warning in extract_func(file, files = file_list[which], exdir = d): requested
#> file not found in the zip file
#> Error: 'file' has no extension
rio::import(xlsx_file, which = 1)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3.0 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5.0 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
rio::import(xlsx_file, which = 2)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 6.7 3.3 5.7 2.5 virginica
#> 2 6.7 3.0 5.2 2.3 virginica
#> 3 6.3 2.5 5.0 1.9 virginica
#> 4 6.5 3.0 5.2 2.0 virginica
#> 5 6.2 3.4 5.4 2.3 virginica
#> 6 5.9 3.0 5.1 1.8 virginica
Of course, one can argue why anyone would use compressed formats with multiple sheets in the first place, e.g. `xlsx.zip`. But a bug is a bug. The issue is that the `which` parameter of `import()` is used twice: first for selecting a file in the archive, and second for selecting a sheet.
https://github.com/gesistsa/rio/blob/c86db70174bb9da81b7c4b6ee3f22dd9cbdb1c1e/R/import.R#L131
https://github.com/gesistsa/rio/blob/c86db70174bb9da81b7c4b6ee3f22dd9cbdb1c1e/R/import.R#L156
To avoid making things more complicated (for example, by introducing new parameters for such an edge case), my suggestion is simply to define some precedence rules.
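To make the idea concrete, here is a rough sketch of what such precedence rules could look like. The helper `resolve_which()` and the specific rules are hypothetical and only for illustration; nothing like this exists in rio, and the actual order of precedence is open for discussion.

```r
# Hypothetical helper, NOT part of rio: decide whether `which` refers to
# a file inside the archive or to a sheet inside the extracted file.
# `file_list` is assumed to hold the archive's file names,
# e.g. utils::unzip(zip_file, list = TRUE)$Name.
resolve_which <- function(which = NULL, file_list) {
  stopifnot(length(file_list) >= 1L,
            is.null(which) || length(which) == 1L)
  # Rule 1: no `which` -> first file in the archive, default sheet.
  if (is.null(which)) {
    return(list(file = file_list[1], sheet = NULL))
  }
  # Rule 2: a character `which` that exactly names a file in the archive
  # selects that file.
  if (is.character(which) && which %in% file_list) {
    return(list(file = which, sheet = NULL))
  }
  # Rule 3: a single-file archive leaves no file to choose,
  # so `which` is passed on as the sheet selector.
  if (length(file_list) == 1L) {
    return(list(file = file_list, sheet = which))
  }
  # Rule 4: multi-file archive and an in-range numeric `which` -> file index.
  if (is.numeric(which) && which >= 1 && which <= length(file_list)) {
    return(list(file = file_list[which], sheet = NULL))
  }
  stop("`which` does not match any file in the archive", call. = FALSE)
}

# Single-file archive: `which = 2` is treated as the second sheet.
str(resolve_which(2, "data.xlsx"))
# An exact file name still selects the file, not a sheet.
str(resolve_which("data.xlsx", "data.xlsx"))
```

Under these assumed rules, `rio::import(zip_file2, which = 2)` would read the second sheet, and `rio::import(zip_file, which = raw_file)` would select the file rather than look for a sheet of that name.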
Created on 2024-05-14 with reprex v2.1.0