gesistsa / rio

🐟 A Swiss-Army Knife for Data I/O
http://gesistsa.github.io/rio/

Standard for import methods #132

Closed chainsawriot closed 1 year ago

chainsawriot commented 8 years ago

I think it would be a good idea to specify a standard for import methods in CONTRIBUTING.md. By "standard" I mean argument names such as path, which and header, because there are two "standards" for these arguments:

  1. base functions use file, which and header
  2. readxl uses path, sheet and col_names

From reading the code, rio uses file, which and header, but I think it would be good to be explicit. As far as I know, some packages (such as googlesheets and readODS) are trying to emulate the interface of readxl.
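To make the two naming conventions concrete, here is a small sketch that inspects the argument names of one reader from each family. It assumes utils::read.csv as a representative base reader and readxl::read_excel as the readxl reader; the readxl part runs only if that package happens to be installed.

```r
# Argument names of a base R reader; the first two are the file and header arguments.
base_args <- names(formals(utils::read.csv))
print(base_args[1:2])

# readxl is optional for this sketch, so only inspect it when it is available.
if (requireNamespace("readxl", quietly = TRUE)) {
  readxl_args <- names(formals(readxl::read_excel))
  # Show which of the readxl-style names the reader actually declares.
  print(intersect(c("path", "sheet", "col_names"), readxl_args))
}
```

Any wrapper that wants one consistent interface has to map between these two sets of names, which is the gap a written standard would close.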

leeper commented 8 years ago

Good points. Need to consider how to standardize this.

leeper commented 7 years ago

I've been thinking about this. The first step is really a cross-walk to show how attribute names vary across file formats. Something like:

| Attribute | Stata | SPSS | SAS |
| --- | --- | --- | --- |
| Variable name | ... | ... | ... |
| Variable description | ... | ... | ... |
| Variable class/type | ... | ... | ... |
| Value labels | ... | ... | ... |

bokov commented 5 years ago

> I've been thinking about this. The first step is really a cross-walk to show how attribute names vary across file formats. Something like:
>
> | Attribute | Stata | SPSS | SAS |
> | --- | --- | --- | --- |
> | Variable name | ... | ... | ... |
> | Variable description | ... | ... | ... |
> | Variable class/type | ... | ... | ... |
> | Value labels | ... | ... | ... |

Could the crosswalk spreadsheet I posted in #228 be germane to this?

bokov commented 3 years ago

Proposed standard for import methods:

When writing new methods, where practical, the following format is recommended:

.import.rio_SUFFIX <- function(file, ARG1 = VAL1, ARG2 = VAL2, ...) {
    requireNamespace("PARENT_PACKAGE")
    arg_reconcile(PARENT_PACKAGE::READER_FUNCTION, FILEARG = file, ARG1 = VAL1, ARG2 = VAL2, ..., .docall = TRUE)
}

In the above template, SUFFIX is replaced by the file extension for that format, without the leading dot (xlsx, csv, dta, etc.). You can optionally define any number of additional default arguments that the reader function recognizes (represented by ARG1, ARG2, etc.) and their respective default values (VAL1, VAL2, etc.). The ellipsis ... is required in your function definition and should be passed to arg_reconcile to avoid errors when the same code is used to import files with different formats. .docall=TRUE should always be used unless you intend to capture the normalized argument list and process it further before passing it to the reader function.

PARENT_PACKAGE is the package that provides the reader function (e.g. for read_xlsx it would be "readxl", so PARENT_PACKAGE::READER_FUNCTION would be readxl::read_xlsx). FILEARG is the name of the argument that the reader function uses to represent the file (e.g. for read_xlsx it is path, so the argument mapping would begin with path = file). For more details, please see ?rio:::arg_reconcile
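To illustrate why passing ... through a reconciling step avoids errors, here is a minimal, self-contained sketch in base R. The helper reconcile_sketch is a hypothetical, much simplified stand-in written for this example; it is not rio's arg_reconcile and ignores most of what the real function does. The method name .import.rio_csv_sketch and the choice of utils::read.table as the reader are likewise assumptions made for illustration.

```r
# Hypothetical, simplified stand-in for rio:::arg_reconcile (NOT rio's code):
# keep only the named arguments that `fun` declares, then optionally call it.
reconcile_sketch <- function(fun, ..., .docall = FALSE) {
  args <- list(...)
  accepted <- names(formals(fun))
  # If the reader itself takes `...`, nothing can safely be dropped.
  if (!("..." %in% accepted)) {
    args <- args[names(args) %in% accepted]
  }
  if (.docall) do.call(fun, args) else args
}

# Instance of the template for CSV files. utils::read.table names its
# file argument `file`, so FILEARG = file becomes file = file.
.import.rio_csv_sketch <- function(file, header = TRUE, sep = ",", ...) {
  reconcile_sketch(utils::read.table, file = file, header = header,
                   sep = sep, ..., .docall = TRUE)
}

# Write a small CSV file to import.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:3, y = c("a", "b", "c")), tmp, row.names = FALSE)

# `sheet` means nothing to read.table; the reconciling step drops it
# instead of letting the call fail with an "unused argument" error.
dat <- .import.rio_csv_sketch(tmp, sheet = 1)

# With .docall = FALSE the normalized argument list is returned instead.
kept <- reconcile_sketch(utils::read.table, file = tmp, sheet = 1)
```

The same call with sheet = 1 could therefore be reused across spreadsheet and delimited formats, which is the situation the required ellipsis is meant to handle.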

bokov commented 3 years ago

@leeper, should I submit a PR on CONTRIBUTING.md?

leeper commented 3 years ago

@bokov Please do!