WinVector / wrapr

Wrap R for Sweet R Code
https://winvector.github.io/wrapr/

suggestion for qc function #12

Closed emilBeBri closed 3 years ago

emilBeBri commented 3 years ago

Hi, I love the qc() function, thank you for that. I was wondering if you wanted to add another qc-like function for the following use case:

You're copy-pasting something into R, where the spaces are the markers between different elements in the vector, like so:

259 289 287

You'd like to quickly turn it into this:

vector_x <- c(259, 289, 287)

By doing something like this: vector_x <- bc(259 289 287)

where bc() (blank concatenate) is a function that understands that the spaces mark different elements.

or perhaps the best that could be done would be to put the whole vector in quotes and then let the function convert the spaces into commas and send it to c() (not sure how to approach this problem in programming)

vector_x <- bc("259 289 2872")

I'm aware that it would be bad form to put something like that into stable code, of course. But when you're working with something you're just trying out, it would actually save a lot of time in the long run.

The elements could also be separated by tabs or line breaks. The common pattern is that something like:

323                           9813                          3  
           234

should be translated into standard c() arguments like this:

vector_y <- c(323, 9813, 3, 234)

While I have made my first package a while ago, I don't have the expertise to do this, otherwise I would.

Hope you find the idea useful, otherwise just disregard it!

JohnMount commented 3 years ago

I like the idea. I think I will definitely try a few variations and ask you for some feedback. Thank you.

Essentially the trick is R functions can look at their arguments un-evaluated but not un-parsed. So we have a lot of freedom in taking in arguments, but some constraints.
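The distinction can be illustrated with a small sketch. The helper name `show_expr` is made up for this example and is not part of wrapr; it only uses base R's `substitute()`, `deparse()`, and `parse()`.

```r
# A function can capture its argument as an un-evaluated expression.
# show_expr is a hypothetical helper, not a wrapr function.
show_expr <- function(x) deparse(substitute(x))

# a and b are never defined, yet this works because x is never evaluated.
show_expr(a + b)  # "a + b"

# The call itself still has to parse. `259 289 287` is not valid R syntax,
# so parsing fails before any function gets a chance to see the argument.
res <- tryCatch(
  parse(text = "f(259 289 287)"),
  error = function(e) "parse error"
)
res  # "parse error"
```

This is why a space-separated input has to arrive as a quoted string: inside quotes it is a single valid token, and the function is free to interpret its contents.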

emilBeBri commented 3 years ago

Thank you, I'm glad you like the idea! It's the kind of low-level annoyance (copy-pasting text intended for use as elements into a script, and then having to clean it up with commas) that, if "smartened up", would clear out some friction in an exploratory process.

I'd be happy to try it out and return feedback.

I don't understand the difference between un-evaluated and un-parsed, but I just started Advanced R by Mr. Wickham (the first edition, i.e. the non-tidyverse-centered version), so perhaps some day soon I will.

JohnMount commented 3 years ago

I am experimenting with a bc() implementation right now. If you want to try it you can use the remotes package to install the development version of wrapr with the command remotes::install_github('WinVector/wrapr').

This works on your examples, but does require quotes.

library(wrapr)
packageVersion('wrapr')
# [1] ‘2.0.7’

bc('259 289 287')

and

bc('
323                           9813                          3  
           234
')

One hang-up is it currently defines the alphabet as [A-Za-z] which is essentially English only.

emilBeBri commented 3 years ago

That's very quick, thanks for doing that. Looking forward to checking the function's code and seeing how you did it. I'll play around with it in the coming week and let you know how it is going. I use Danish myself; we use æ (ae), ø (oe) and å (aa) quite a lot. Perhaps I'll be able to add those letters (and perhaps the most commonly used ones in German and Spanish as well, while I'm at it) to the code myself and make a pull request.

JohnMount commented 3 years ago

Thanks for your ideas and help. I really appreciate it.

The code is a regular expression nightmare, but what it does is cut the input string into segments using a regular expression to define what are values and what are separators.

I am looking into a variation on the match strategy that would not need a definition of the alphabet. That may be a bit more portable. I think I have it working: bc('ø11 a') == c('ø11', 'a').
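As a rough illustration of why an alphabet-free strategy is more portable, here is a minimal sketch that defines only the separators and treats everything else as a value. `bc_sketch` is a hypothetical name and this is not wrapr's bc() source; in particular, the numeric conversion rule is an assumption made for the example.

```r
# Hypothetical illustration only; this is not the wrapr implementation.
bc_sketch <- function(s) {
  # Treat any run of whitespace and/or commas as one separator.
  parts <- strsplit(s, "[[:space:],]+")[[1]]
  # Drop the empty piece that a leading separator produces.
  parts <- parts[nzchar(parts)]
  # Assumption: return numbers when every piece parses as one,
  # otherwise keep the pieces as strings.
  nums <- suppressWarnings(as.numeric(parts))
  if (length(parts) > 0 && !anyNA(nums)) nums else parts
}

bc_sketch('259 289 287')   # numeric vector: 259 289 287
bc_sketch("\u00f811 a")    # "\u00f8" is the letter ø; no alphabet is needed
```

Because only separators are described, letters such as æ, ø and å pass through untouched. The trade-off is that this sketch does not handle quoted values that themselves contain spaces or commas.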

I am starting to collect tests here: https://github.com/WinVector/wrapr/blob/main/inst/tinytest/test_bc.R . If you have any examples you would like to "force to always work" I could collect them from you.

If you do run into problems: apologies, this is new code. Also, please share any bugs with me, and in addition to fixing them we can impound the examples as tests going forward.

emilBeBri commented 3 years ago

it's working, very cool! this is very convenient. I'll keep it in mind with the feedback and testing. I just started my PhD yesterday, so I might not be giving feedback as fast as you have been making the function, other matters than coding are more imminent, starting up this thing. But I'll be sure to respond when / if there's a reason to it. Cheers!

JohnMount commented 3 years ago

Submitted to CRAN as 2.0.7

emilBeBri commented 3 years ago

Hi again,

So, bc is working great so far in the admittedly not that frequent situations where I need it, but in those cases it's really nice to have and saves a lot of tedious editing.

Here are some tests that would be nice to have for bc(). I have no experience in using git, so right now I'll have to write it like this:


library(wrapr)
library(tinytest)
# test of lowercase non-english letters (Danish: æ, ø and å)
expect_equal(
  bc('person_id, geography, danish_letter_æ, danish_letter_ø, danish_letter_å'),
  c("person_id", "geography", "danish_letter_æ", "danish_letter_ø", "danish_letter_å")
)

# test of mix of upcase non-english letters (Danish: Æ, Ø and Å)
expect_equal(
  bc('person_id, geography, danish_letter_Æ, danish_letter_æ, danish_letter_Ø, danish_letter_Å'),
  c("person_id", "geography", "danish_letter_Æ", "danish_letter_æ", "danish_letter_Ø", "danish_letter_Å")
)
JohnMount commented 3 years ago

Thank you for the test. I am happy to enter them myself.

(Sent by email in reply to Emil Bellamy Begtrup-Bright's comment of May 27, 2021, 7:48 AM.)