gesistsa / rio

🐟 A Swiss-Army Knife for Data I/O
http://gesistsa.github.io/rio/

Export errors unnecessarily if passed x is a list of length one. #385

cha-petersumm closed this issue 8 months ago

cha-petersumm commented 8 months ago

It would be neater if the export function worked when passed a list containing exactly one data frame (length = 1), regardless of the export format.

An error would still occur if length > 1 and the export format didn't support it.

This would avoid the need for this sort of code:

if (length(data_list) > 1 || info$export_function == "writexl::write_xlsx") {
    export(data_list, output_filename)
} else {
    export(data_list[[1]], output_filename)
}

cha-petersumm commented 8 months ago

I haven't tested this, but I think it can be done in export.R by replacing

if (!is.data.frame(x) && !format %in% c("xlsx", "html", "rdata", "rds", "json", "qs", "fods", "ods")) {
    stop("'x' is not a data.frame or matrix", call. = FALSE)
}

with

if (!is.data.frame(x) && !format %in% c("xlsx", "html", "rdata", "rds", "json", "qs", "fods", "ods")) {
    if (is.list(x) && (length(x) == 1) && is.data.frame(x[[1]])) {
        x <- x[[1]]
    } else {
        stop("'x' is not a data.frame, matrix or list of exactly one data.frame", call. = FALSE)
    }
}

chainsawriot commented 8 months ago

@cha-petersumm Thank you for reporting and suggesting a solution.

There will be an update. But I will put

if (is.list(x) && (length(x) == 1) && is.data.frame(x[[1]])) {
    x <- x[[1]]
}

in an outer loop.
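
For readers who want to try the check in isolation, here is a minimal base-R sketch of it. The helper name `unwrap_single_df()` is hypothetical and is not part of rio; the actual change in the package may be structured differently.

```r
# Hypothetical helper (not part of rio): unwrap a list that holds
# exactly one data frame, and return anything else unchanged.
unwrap_single_df <- function(x) {
  # A data frame is itself a list, so it is excluded explicitly
  # before the list checks run.
  if (!is.data.frame(x) && is.list(x) && length(x) == 1 && is.data.frame(x[[1]])) {
    return(x[[1]])
  }
  x
}

df <- data.frame(id = 1:3, value = c("a", "b", "c"))

single <- unwrap_single_df(list(df))      # list of one: unwrapped
multi  <- unwrap_single_df(list(df, df))  # list of two: left as a list
plain  <- unwrap_single_df(df)            # already a data frame: unchanged
```

With this in place, a list of two or more data frames still reaches whatever format check follows, so formats that cannot take a list would still raise an error.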

cha-petersumm commented 8 months ago

No problem.

You want to avoid doing this for file formats where it's not necessary. In particular, for Excel files, exporting a list preserves the sheet names whereas converting a list of length = 1 to the first element and then exporting will lose them.
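
A small base-R sketch of why the unwrapping should depend on the format. The sheet name `Patients`, the helper `prepare_for_export()`, and the vector `list_formats` are made up for illustration; the format vector is copied from the condition quoted earlier in this thread, on the assumption that those are the formats that can take a list.

```r
# A named list of one data frame, similar to what import_list() might
# return for a one-sheet workbook. "Patients" is an invented sheet name.
data_list <- list(Patients = data.frame(MRN = c("001", "002")))

names(data_list)              # the sheet name lives on the list
unwrapped <- data_list[[1]]
names(unwrapped)              # only the column names remain

# Assumption: formats taken from the condition quoted in this thread.
list_formats <- c("xlsx", "html", "rdata", "rds", "json", "qs", "fods", "ods")

# Hypothetical helper (not part of rio): unwrap a one-element list only
# when the target format is not in list_formats.
prepare_for_export <- function(x, format) {
  if (!format %in% list_formats &&
      !is.data.frame(x) && is.list(x) && length(x) == 1 && is.data.frame(x[[1]])) {
    return(x[[1]])
  }
  x
}

for_xlsx <- prepare_for_export(data_list, "xlsx")  # stays a named list
for_csv  <- prepare_for_export(data_list, "csv")   # becomes a plain data frame
```

In this sketch the Excel case keeps the name on the list, while the CSV case gets a plain data frame, where a sheet name would have no meaning anyway.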

chainsawriot commented 8 months ago

Please install the latest version

install.packages("rio", repos = "https://gesistsa.r-universe.dev")

And there are tests for preserving sheet names when length(x) == 1.

cha-petersumm commented 8 months ago

Great work! Thanks.

cha-petersumm commented 8 months ago

Also, in case it's ever useful to you, this is my example of where it was used. It's written to de-identify files containing patient data:

---
title: "deidentify patient data"
author: "Peter Summers"
date: "`r Sys.Date()`"
output: html_document
---

knitr::opts_chunk$set(echo = TRUE)

library(tidyverse)
library(rio)
library(tools)
library(openssl)

Deidentify the chosen files.


password <- "Put a project-specific password here."

input_filename_list <- choose.files()

for (input_filename in input_filename_list) {

  output_filename <- paste0(file_path_sans_ext(input_filename)," - deidentified.", file_ext(input_filename))

  info <- get_info(input_filename)

  data_list <- import_list(input_filename, format=info$format)

  # Process multiple data frames if there is more than one, e.g. from a multi-sheet Excel file.
  for (dataset_number in seq_along(data_list)) {
    for (column_name in c("MRN","PAT_ID","CSN","LOG_ID","ORDER_ID")) {
      if (!is.null(data_list[[dataset_number]][[column_name]])) 
        data_list[[dataset_number]][[column_name]] <- md5(as.character(data_list[[dataset_number]][[column_name]]), key=password)
    }
  }

  # Uncomment the three commented lines below if not using a version of rio with fix #385.
  # if (length(data_list) > 1 || info$export_function == "writexl::write_xlsx")
    export(data_list, output_filename, format=info$format)
  # else
  #   export(data_list[[1]], output_filename, format=info$format)
}