Closed chainsawriot closed 1 year ago
Additional information to this #248 and also this question on Stackoverflow. Sometimes ...
must be evaluated in the correct call stack aka context.
This is one more reason rio
should not mess with ...
.
Another option, which is less breaking, is just to remap the following three
No less, no more.
It also means header
should be an official argument.
All of these three remapping must be documented. Also, we need to make sure all underlying functions with these three parameters must get remapped also. And both remapped and non-remapped executions should work, i.e.
import("whatever.ods", which = "Sheet1") # work
import("whatever.ods", sheet = "Sheet1") # work
And then all ...
is just ...
.
Even without a concensus on this, the remapping should be documented. Maybe as a vignette.
My opinion on this should probably not matter too much but I really like the remapping approach for a fixed set of important parameters and let ...
be ...
. I always feel like ...
should be something for people who "know what they are doing" and not control potential standard behavior.
Although breaking changes are acceptable for a 1.0, I don’t think we should remove the remapping and cause those breaking changes to code. However, we should just pass …
without remapping. IMO, it is within scope of rio
to smooth out common differences between various ways to read data. That’s kind of the point. So standardizing some arguments such as header
, skip
, file
makes sense to me for serving the core purpose— I don’t actually have to know the underlying functions used to read or write this data very well, because I have one consistent interface to it. …
is the escape valve so that there’s little or no reason to stop using rio
if you start there.
@schochastics @jsonbecker Thank you very much for the comments. I compiled the following table. There are good news (other than I did not get insane after compiling this). If we are going to target only file
, which
and header
(let's call this "the base convention"), we covered almost all of the them except the read.DIF
odd case and fst
(the parameter columns
functions differently).
format | function | file | which | header |
---|---|---|---|---|
csv, dat, txt, csv, csvy, psv, clipboard | data.table::fread |
file | which | header |
fwf | readr::read_fwf |
file | - | 1 |
fwf | utils::read.table |
file | - | header |
r | dget |
file | - | - |
dump | source |
file | 2 | - |
rds | readRDS |
file | 3 | - |
rdata, rda | load |
file | 2 | - |
feather | feather::read_feather |
path | 3 | - |
fst | fst::read.fst |
path | 3 | 4 |
matlab | rmatio::read.mat |
filename | 3 | - |
dta (Stata) | haven::read_dta |
file | - | - |
dta (Stata) | foreign::read.dta |
file | - | - |
dbf (xbase) | foreign::read.dbf |
file | - | - |
dif | utils::read.DIF |
file | - | 5 |
sav, zsav (SPSS) | haven::read_sav |
file | - | - |
sav | foreign::read.spss |
file | - | - |
spss | haven::read_por |
file | - | - |
sas7bdat | haven::read_sas |
file | - | - |
xpt | haven::read_xpt |
file | - | - |
mtp | foreign::read.mtp |
file | - | - |
syd | foreign::read.systat |
file | - | - |
json | jsonlite::fromJSON |
txt | - | - |
rec (Epiinfo) | foreign::read.epiinfo |
file | - | - |
arff | foreign::read.arff |
file | - | - |
xls | readxl::read_xls |
path | sheet | col_names |
xlsx | readxl::read_xlsx |
path | sheet | col_names |
openxlsx::read.xlsx |
sheet | |||
fortran (see read.fwf) | utils::read.fortran |
file | - | - |
ods | readODS::read_ods |
path | sheet | col_names |
xml | xml2::read_xml |
x | 3 | - |
html | xml2::read_xml |
x | 2 | - |
yaml | yaml::read_yaml |
file | - | - |
eviews | hexView::readEViews |
file | - | - |
pzfx | pzfx::read_pzfx |
path | table | - |
parquet | arrow::read_parquet |
file | - | - |
col_names
has different meaningswhich
to select object from the multi-item object, e.g. envionment, tables in an html filewhich
in the method, but doesn't get passed. The underlying function does not have this conceptcolumns
Actually, we only need to deploy arg_reconcile or a simplified version of it in the following cases: xlsx
, xls
, ods
, and pzfx
.
I wanted to go through arg_reconcile for #334 but I will leave it for fixing it with this issue
So many issues related to ...
, arg_reconcile and friends.
Short-term solution
which
a standard argument for all import methods.tempdtazip <- tempfile(fileext = ".dta.zip")
rio::export(mtcars, tempdtazip, format = "dta")
rio::import(tempdtazip)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#> 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#> 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#> 4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#> 5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
#> 6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
#> 7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
#> 8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
#> 9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
#> 10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
#> 11 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
#> 12 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
#> 13 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
#> 14 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
#> 15 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
#> 16 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
#> 17 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
#> 18 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
#> 19 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
#> 20 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
#> 21 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
#> 22 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
#> 23 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
#> 24 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
#> 25 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
#> 26 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
#> 27 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
#> 28 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
#> 29 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
#> 30 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
#> 31 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
#> 32 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
rio::import_list(tempdtazip) ## the warning message, hello arg_concile
#> Warning in arg_reconcile(haven::read_dta, file = file, ..., .docall = TRUE, : The following arguments were ignored for haven::read_dta:
#> which
#> $file1fe5255d64f8fb.dta
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#> 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#> 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#> 4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#> 5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
#> 6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
#> 7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
#> 8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
#> 9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
#> 10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
#> 11 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
#> 12 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
#> 13 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
#> 14 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
#> 15 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
#> 16 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
#> 17 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
#> 18 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
#> 19 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
#> 20 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
#> 21 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
#> 22 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
#> 23 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
#> 24 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
#> 25 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
#> 26 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
#> 27 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
#> 28 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
#> 29 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
#> 30 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
#> 31 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
#> 32 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
Created on 2023-09-11 with reprex v2.0.2
This is a meta discussion about the
...
behavior forimport()
andexport()
. It is related to the following issues312
280
231
221
206
132
I closed them all so that we can focus the discussion here.
The
...
and the undocumented remapping behaviors have been a major source of confusion in the past; and I think we need to rectify this in the upcoming v1.0.0, the major release where the API should be considered "stable" and probably the only chance in the near future to introduce breaking changes.As I argue in #280 , in my opinion passing unused arguments to the underlying reading/writing function should be an error, rather than a warning. But I really want to know how you all (users and developers) think about this.
Also, we also have two choices regarding the undocumented remapping behaviors (mostly for xlsx, ods, and csv): (1) document them all in v1.0.0; or (2) abandon them all and let
...
be...
that gets passed to the underlying function.My initial think was to abandon them all to reduce the complexity of
rio
. But I don't have a grip on how many of you wrote code in the past like this:And certainly, I don't want to break your existing code. If you do have code that was written like that, please let me know. It would be taken into account when I think about the question of these undocumented remapping behaviors.