awalker89 / openxlsx

R package for .xlsx file reading and writing.
Other
364 stars 79 forks source link

read.xlsx with skipEmptyCols=FALSE always returns all columns starting at 1 #399

Open kainhofer opened 6 years ago

kainhofer commented 6 years ago

Expected Behavior

When reading a certain region (either a named region or via the rows and cols argument of read.xlsx) with skipEmptyCols=FALSE, I expect the read.xlsx function to return exactly the requested columns, even if they are empty.

read.xlsx("2018-05-08_openxlsx_named_region_bug.xlsx", rows = 3, cols = 3:8, skipEmptyCols = FALSE, colNames = FALSE) expected output: X3 X4 X5 X6 X7 X8 1 C3 D3 E3 F3 G3 H3

Actual Behavior

read.xlsx always returns all columns starting at 1, even if the cols argument does not include the first few columns.

read.xlsx("2018-05-08_openxlsx_named_region_bug.xlsx", rows = 3, cols = 3:8, skipEmptyCols = FALSE, colNames = FALSE) actual output: X1 X2 X3 X4 X5 X6 X7 X8 1 NA NA C3 D3 E3 F3 G3 H3

The first two entries of the result are never requested and should not be there.

BTW, the skipEmptyCols has to be FALSE for my actual use, because there are some empty colums (within the requested region) that I need to preserve.

Steps to Reproduce the Problem (please attach an example xlsx file if possible)

In the attached file 2018-05-08_openxlsx_named_region_bug.xlsx, there is a named region "Test" for cells C3:H3.

  1. read in the named region "Test" via: read.xlsx("2018-05-08_openxlsx_named_region_bug.xlsx", namedRegion = "Test", skipEmptyCols = FALSE, colNames = FALSE)

  2. Alternatively, read in columsns 3-8 of row 3; read.xlsx("2018-05-08_openxlsx_named_region_bug.xlsx", rows = 3, cols = 3:8, skipEmptyCols = FALSE, colNames = FALSE) X1 X2 X3 X4 X5 X6 X7 X8 1 NA NA C3 D3 E3 F3 G3 H3

sessionInfo()

kainhofer commented 6 years ago

BTW, the problem is inside the C++ function read_workbook, as the following call already returns the first two columns erroneously:

read_workbook( cols_in = 3:8, rows_in = rep(3, 6), v = c("C3", "D3", "E3", "F3", "G3", "H3"), string_inds = 1:6, is_date = NA, hasColNames = FALSE, skipEmptyRows = TRUE, skipEmptyCols = FALSE, nRows = 1, clean_names = clean_names)