JanMarvin / openxlsx2

openxlsx2 - read, write and modify xlsx files
https://janmarvin.github.io/openxlsx2/
Other
121 stars 11 forks source link

[Feature Request] Extract targets of hyperlinks stored in workbook relationships #1135

Closed hongyuanjia closed 1 month ago

hongyuanjia commented 1 month ago

The current openxlsx2::wb_load() already parses hyperlinks and stores them in wbWorkbook$worksheets[[i]]$hyperlinks. Instead of showing the strings stored in the cells, I wonder if it is possible to provide a parameter in wb_to_df() to extract hyperlink targets and use them as the values of the corresponding cells? Thanks!

hongyuanjia commented 1 month ago

FYI, I implemented a function to do the trick using inlineStr:

read_excel <- function(file, sheet, show_hyperlinks = TRUE, ...) {
    # load workbook
    wb <- openxlsx2::wb_load(file, sheet, data_only = FALSE)
    sheet <- wb$validate_sheet(sheet)

    hl <- wb$worksheets[[sheet]]$hyperlinks

    # in case of no hyperlinks, directly read the data
    if (!show_hyperlinks || is.null(hl)) {
        return(openxlsx2::wb_to_df(wb, sheet, ...))
    }

    # get the hyperlink refs and targets
    hl_ref <- vapply(hl, .subset2, character(1), "ref")
    hl_tar <- vapply(hl, .subset2, character(1), "target")

    # get the raw cell data
    cc <- wb$worksheets[[sheet]]$sheet_data$cc

    # match the cells
    m <- match(hl_ref, cc$r)

    # in case of invalid refs?
    if (anyNA(m)) {
        hl_ref <- hl_ref[!is.na(m)]
        hl_tar <- hl_tar[!is.na(m)]
        m <- m[!is.na(m)]
    }

    if (length(m)) {
        # change types to inline string
        cc$c_t[m] <- "inlineStr"
        # remove all other values
        cc$v[m] <- ""
        cc$f[m] <- ""
        cc$is[m] <- paste0("<is><t>", hl_tar, "</t></is>")
        wb$worksheets[[sheet]]$sheet_data$cc <- cc
    }
    openxlsx2::wb_to_df(wb, sheet, ...)
}
JanMarvin commented 1 month ago

Hi @hongyuanjia , thanks for the suggestion. I've added a PR in #1136 that should provide what you want. This returns spreadsheet hyperlinks location or targets when using show_hyperlinks = TRUE. For our own hyperlinks openxlsx2 uses a workaround and adds formulas =hyperlink() instead of using native hyperlinks. Changing this is on my to-do list for quite some time ...

Could you give it a try if this solves your issue?

remotes::install_github("JanMarvin/openxlsx2#1136")
hongyuanjia commented 1 month ago

@JanMarvin I have tested it, and it works like a charm. Thanks!

JanMarvin commented 1 month ago

Thanks!