awalker89 / openxlsx

R package for .xlsx file reading and writing.

Characters replaced by HTML codes when read from calculated cell #393

Open debarros opened 6 years ago

debarros commented 6 years ago

Expected Behavior

Reading the same text from a cell with that text stored as a value and from a cell where the text is the result of a calculation should produce the same value (the text itself).

Actual Behavior

If the text contains &, <, or >, those characters are replaced by &amp;, &lt;, and &gt;, respectively, when read from a cell whose value is the result of a formula.
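As a stopgap, the escaped values can be unescaped after import. The following is a minimal sketch in base R, not a fix for the package itself: `unescape_xml` and `unescape_df` are hypothetical helper names (not openxlsx functions), and it assumes that only the three entities listed above are affected.

```r
# Hypothetical helper (not part of openxlsx): reverse the three XML entities.
unescape_xml = function(x) {
  x = gsub("&lt;", "<", x, fixed = TRUE)
  x = gsub("&gt;", ">", x, fixed = TRUE)
  # "&amp;" is handled last so that "&amp;lt;" becomes "&lt;" rather than "<"
  gsub("&amp;", "&", x, fixed = TRUE)
}

# Hypothetical helper: apply unescape_xml to every character column.
unescape_df = function(df) {
  for (nm in names(df)) {
    if (is.character(df[[nm]])) df[[nm]] = unescape_xml(df[[nm]])
  }
  df
}

# Toy data standing in for an imported sheet (assumed values, not from the attachment)
toy = data.frame(txt = c("A &amp; B", "x &lt; y", "p &gt; q"),
                 n = 1:3, stringsAsFactors = FALSE)
fixed = unescape_df(toy)
fixed$txt
```

This only masks the symptom: a cell that legitimately contains the literal text `&amp;` as a stored value would also be changed, so it is probably best applied only to columns known to come from formulas.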

Steps to Reproduce the Problem

(please attach an example xlsx file if possible)

  1. Read the first two sheets in the attached workbook testingCharacters.xlsx

  2. Each of the 8 columns (4 in the first sheet and 4 in the second) should contain the exact same values. However, for rows 38, 60, and 62 in the imported data.frame, the value in the first column of the first sheet is different from all of the others.

  3. This code shows the issue:

    library(openxlsx)
    path = file.choose()                                 # select the attached file once
    sh1 = read.xlsx(xlsxFile = path, sheet = 1)          # read the first sheet
    sh2 = read.xlsx(xlsxFile = path, sheet = 2)          # read the second sheet
    sh3 = cbind.data.frame(sh1, sh2)                     # bind the two data frames
    sh3$Issue = FALSE                                    # create a column to indicate issues
    for(i in seq_len(nrow(sh3))){                        # find the issues
      sh3$Issue[i] = !(all(sh3[i,1:8] == sh3[i,1]))      # TRUE if any column differs from the first
    }
    sh3[sh3$Issue,]                                      # display the issues
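The same row-wise check can also be written without a loop. This is a sketch on a small made-up data frame rather than the attached workbook; `find_mismatches` is a hypothetical name, and it assumes all columns are character and that missing values should not count as mismatches.

```r
# Hypothetical helper: flag rows where any column differs from the first column.
find_mismatches = function(df) {
  m = as.matrix(df)                 # character matrix, one column per data frame column
  # df[[1]] has one entry per row, so it recycles down each column of m
  unname(rowSums(m != df[[1]], na.rm = TRUE) > 0)
}

# Toy data (assumed values): the first column mimics the escaped formula result
toy = data.frame(from_formula = c("A &amp; B", "plain text"),
                 from_value   = c("A & B",     "plain text"),
                 stringsAsFactors = FALSE)
mismatch = find_mismatches(toy)
toy[mismatch, ]
```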

sessionInfo()

  • Version of openxlsx: 4.0.17 and 4.1.0
  • Version of R: 3.4.3