Closed chainsawriot closed 1 year ago
Good points. Need to consider how to standardize this.
I've been thinking about this. The first step is really a cross-walk to show how attribute names vary across file formats. Something like:
Attribute | Stata | SPSS | SAS |
---|---|---|---|
Variable name | ... | ... | ... |
Variable description | ... | ... | ... |
Variable class/type | ... | ... | ... |
Value labels | ... | ... | ... |
Could the crosswalk spreadsheet I posted in #228 be germane to this?
Proposed standard for import methods:
When writing new methods, where practical, the following format is recommended:
    .import.rio_SUFFIX <- function(file, ARG1 = VAL1, ARG2 = VAL2, ...) {
        requireNamespace("PARENT_PACKAGE")
        arg_reconcile(PARENT_PACKAGE::READER_FUNCTION, FILEARG = file, ARG1 = VAL1, ARG2 = VAL2, ..., .docall = TRUE)
    }
In the above template, `SUFFIX` is replaced by the file extension for that format (.xlsx, .csv, .dta, etc.). You can optionally define any number of additional default arguments the reader function recognizes (represented by `ARG1`, `ARG2`, etc.) and their respective default values (`VAL1`, `VAL2`, etc.). The ellipsis `...` is required in your function definition and should be passed to `arg_reconcile` to avoid errors when the same code is used to import files with different formats. `.docall = TRUE` should always be used unless you intend to capture the normalized argument list and process it further before passing it to the reader function. `PARENT_PACKAGE` is the package that provides the reader function (e.g. for `read_xlsx` it would be `"readxl"`, so `PARENT_PACKAGE::READER_FUNCTION` would be `readxl::read_xlsx`). `FILEARG` is the name of the argument that the reader function uses to represent the file (e.g. for `read_xlsx` it's `path`, so the argument mapping would begin with `path = file`). For more details, please see `?rio:::arg_reconcile`.
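To make the role of the ellipsis and `.docall` concrete, here is a minimal base-R sketch. It does not use rio's code: `reconcile_sketch`, `toy_reader`, and `.import.sketch_toy` are hypothetical names invented for illustration, and the helper only imitates the behaviour described above (dropping arguments the reader does not accept) under that assumption.

```r
# Hypothetical, simplified stand-in for rio:::arg_reconcile (NOT rio's code):
# keep only the arguments that `fun` accepts, then optionally call it.
reconcile_sketch <- function(fun, ..., .docall = FALSE) {
  args <- list(...)
  accepted <- names(formals(fun))
  # If the reader itself takes `...`, pass everything through unchanged.
  if (!"..." %in% accepted) {
    args <- args[names(args) %in% accepted]
  }
  if (.docall) do.call(fun, args) else args
}

# Toy reader with a readxl-like signature; it only echoes its arguments.
toy_reader <- function(path, col_names = TRUE) {
  list(path = path, col_names = col_names)
}

# Import method following the template: FILEARG is `path`, ARG1 is `col_names`.
.import.sketch_toy <- function(file, header = TRUE, ...) {
  reconcile_sketch(toy_reader, path = file, col_names = header, ..., .docall = TRUE)
}

# `sheet` is not an argument of toy_reader; it is dropped instead of
# triggering an "unused argument" error.
res <- .import.sketch_toy("x.toy", sheet = 2)
```

With `.docall = FALSE` the same helper returns the filtered argument list, which is the case where you want to inspect or adjust the arguments before calling the reader yourself.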
@leeper, should I submit a PR on `CONTRIBUTING.md`?
@bokov Please do!
I think it would be a good idea to specify a standard for `import` methods in `CONTRIBUTING.md`. By standard I mean argument names such as `path`, `which` and `header`, because there are two "standards" for these arguments:

- `file`, `which` and `header`
- `readxl` uses `path`, `sheet` and `col_names`

From reading the code, `rio` is using `file`, `which` and `header`, but I think it would be great to be explicit. As far as I know, some packages (such as `googlesheets` and `readODS`) are trying to emulate the interface of `readxl`.
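As a rough illustration of how the two naming conventions line up, the sketch below renames rio-style arguments to their readxl-style counterparts. The pairing (`file` to `path`, `which` to `sheet`, `header` to `col_names`) is an assumption read off the order of the two lists above, and `translate_args` is a hypothetical helper, not part of rio or readxl.

```r
# Assumed correspondence between the two conventions (illustrative only).
rio_to_readxl <- c(file = "path", which = "sheet", header = "col_names")

# Hypothetical helper: rename rio-style arguments to readxl-style ones,
# leaving any name without a known counterpart untouched.
translate_args <- function(args, map = rio_to_readxl) {
  known <- names(args) %in% names(map)
  names(args)[known] <- unname(map[names(args)[known]])
  args
}

translated <- translate_args(
  list(file = "data.xlsx", which = 2, header = FALSE, skip = 1)
)
names(translated)
```

Writing the mapping down in one place like this is one way to be explicit about which convention a method exposes and which one it forwards to the underlying reader.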