WinVector / wrapr

Wrap R for Sweet R Code
https://winvector.github.io/wrapr/

unexpected behaviour: qc() removes leading zeroes #15

Closed emilBeBri closed 2 years ago

emilBeBri commented 2 years ago

Hi,

qc(), which is a favorite of mine and an integral part of my workflow, produces one unexpected result, given that the name of the function is "quoted concatenate":

# produces '0' as expected
wrapr::qc(0)
# produces '1' as expected
wrapr::qc(1)
# produces '1' - not as expected
wrapr::qc(01)

The background is I have some identification numbers that are numbers-only, but some contain leading zeroes. I can't use qc() to select them :)

> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
 [1] LC_CTYPE=en_DK.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_DK.UTF-8        LC_COLLATE=en_DK.UTF-8    
 [5] LC_MONETARY=en_DK.UTF-8    LC_MESSAGES=en_DK.UTF-8   
 [7] LC_PAPER=en_DK.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_DK.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] wrapr_2.0.8

loaded via a namespace (and not attached):
[1] compiler_4.1.1

JohnMount commented 2 years ago

Thank you very much for the issue report. And also an apology: since you ran into this while working, it likely caused trouble.

I may be wrong, but I suspect it cannot be fixed using base-R methods. Perhaps rlang can do it, but I do not know. I suspect the issue is that R parses the arguments before invoking the function qc(). The arguments are passed in unevaluated, but they have already been parsed and are no longer in source code form. Some examples around this include:

Notice the following errors out even though the argument is never used (whereas the traditional example f(stop()) does not).

f <- function(...) {}
f(0z1)
# Error: unexpected symbol in "f(0z1"
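The contrast between a lazily evaluated argument and a parse-time failure can be reproduced in a script. This is a minimal sketch: the names lazy_result and parse_result are illustrative, and parse(text = ...) stands in for what the R parser does when it reads the call from the console.

```r
f <- function(...) {}

# The argument is a valid expression that is never evaluated, so no error occurs.
lazy_result <- f(stop("never evaluated"))

# The text "f(0z1)" is not valid R syntax, so it fails at parse time,
# before any function could be called.
parse_result <- tryCatch(
  parse(text = "f(0z1)"),
  error = function(e) "parse error"
)

print(lazy_result)
print(parse_result)
```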

Now we look at the traditional capture methods, substitute() and match.call().

f <- function(...) {substitute(...)}
f(01)
# [1] 1
f <- function(...) {match.call()}
f(01)
# f(1)

In both cases the argument has already been converted from the source text 01 to the number 1 before we can capture its input form.
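The same effect can be seen without defining a function at all, using quote() to capture an unevaluated call. This is only an illustration in base R; the variable names are arbitrary.

```r
# Capture the call without evaluating it.
captured <- quote(f(01))

# The first argument is already a numeric constant, not the text "01".
arg <- captured[[2]]
arg_is_numeric <- is.numeric(arg)
arg_equals_one <- identical(arg, 1)

# Converting the call back to text therefore cannot recover the leading zero.
call_text <- deparse(captured)

print(arg_is_numeric)
print(arg_equals_one)
print(call_text)
```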

I am documenting this issue in the qc() man page here: https://winvector.github.io/wrapr/reference/qc.html.

If you think I am wrong or have any potential solutions, please do re-open the issue.

emilBeBri commented 2 years ago

No need for an apology, I'm just happy that it's available - and I was just eyeballing data when it happened, so no damage done.

I see your point and agree. I'm not an rlang user, base-R/data.table is more my thing. So I wouldn't know if there is a solution there. I might investigate it, but then again it's such an edge case even for most of my uses that learning rlang just for that seems like a bit much. (I tried a couple of different Google searches this morning, but it's hard to find something useful, because if it's there, it is lost in the sea of people talking about how to add leading zeroes in general. I think you have to be quite proficient in rlang to know whether it's possible in that framework or not.)

JohnMount commented 2 years ago

My guess is researching rlang would not help, as it is likely to run into the same issues (arguments being parsed before the function is called, even in lazy eval). The main hope would be: is there a source code reference available for the arguments? I'm going to ask some of the R-core experts I know.
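As a sketch of what a source code reference can preserve: when code is parsed from text with keep.source = TRUE, utils::getParseData() reports the original token text. This only shows that the parser can retain the literal when it is asked to; it is an assumption-laden illustration and not a way for qc() to see its callers' source.

```r
# Parse a call from text, asking the parser to keep source references.
src <- "f(01)"
exprs <- parse(text = src, keep.source = TRUE)

# The parse data records each token together with its original text.
pd <- utils::getParseData(exprs)
literal_text <- pd[pd$token == "NUM_CONST", "text"]

# The parsed expression itself still holds only the number 1.
parsed_value <- exprs[[1]][[2]]

print(literal_text)
print(parsed_value)
```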

JohnMount commented 2 years ago

Emil, thank you for your time and positive comments. I think I now have something fairly close to your use case. I would love for you to try it out. What I have done is added a new function called sx() that doesn't attempt type conversion away from strings. What made me think of this is that wrapr already has a function called bc() (also inspired by interacting with you!) that was a little closer to your use case than qc() is. The difference is that bc() and sx() expect their argument to be a single string, so you put a single set of quotes around everything.

With the dev version (2.0.9, not on CRAN; can install with remotes::install_github("WinVector/wrapr")) we have the following:

wrapr::sx('01 02 03')
[1] "01" "02" "03"
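For readers without the dev version installed, the idea of splitting one quoted string can be approximated in base R. This is a rough sketch only, not wrapr's implementation of sx(); the helper name split_ids is hypothetical, and it handles whitespace-separated tokens only.

```r
# Hypothetical helper: split a single string on runs of whitespace,
# keeping every token as a character string (so leading zeroes survive).
split_ids <- function(s) {
  strsplit(trimws(s), "[[:space:]]+")[[1]]
}

ids <- split_ids("01 02 03")
print(ids)
```

Because the identifiers never pass through the R parser as numbers, nothing strips the leading zeroes.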

If you have the time I would love your feedback. I find I use qc() and bc() a lot in my own work. Often I then neaten the code by replacing them with their output (so I use them more than people tend to know). sx() may in fact turn out to be a simpler and more common use case than bc(). They now cross-link all of their help pages (they did not do that before).

emilBeBri commented 2 years ago

Great, thank you for making that! I will test it out and will reply with feedback if I have any.

Incidentally, I'm the one who suggested the bc() function a while ago. I'm using it a lot as well, and am happy you had the technical prowess to make it (and also found good use for it yourself).