firasdib / Regex101

This repository is currently only used for issue tracking for www.regex101.com

Code Generator: R flavour (Rstats) #1582

Open aourednik opened 3 years ago

aourednik commented 3 years ago

Flavor Request

An R flavor would be especially nice because, unlike code in some other languages, R code needs an extra \ to escape special characters in a regex:

Example:

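txt <- "a test of capitalizing"
# Uppercase the first letter of each word and lowercase the rest
gsub("(\\w)(\\w*)", "\\U\\1\\L\\2", txt, perl=TRUE)  # "A Test Of Capitalizing"
# Uppercase only the first letter after each word boundary
gsub("\\b(\\w)",    "\\U\\1",       txt, perl=TRUE)  # "A Test Of Capitalizing"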

R is popular among data scientists and has many packages for text analysis.

Another challenge in R is that no function offers straightforward access to the first, second, etc. matched group ($1, $2, etc.); array-style extraction (result[1], result[2]) is used instead.
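A minimal base-R sketch of that array-style extraction, pairing regexec() with regmatches(); the input string and date pattern are made up for illustration. Element 1 of the result is the whole match, so the group other languages call $1 sits at index 2.

```r
# Sketch: pull out captured groups ($1, $2, ...) with base R only.
x <- "released 2023-07-15"

# regexec() records where the whole match and each group start;
# regmatches() turns those positions into the matched substrings.
m <- regmatches(x, regexec("(\\d{4})-(\\d{2})-(\\d{2})", x, perl = TRUE))[[1]]

m      # "2023-07-15" "2023" "07" "15"
m[1]   # whole match (what other languages call $0)
m[2]   # first group, i.e. $1: "2023"
m[3]   # second group, i.e. $2: "07"
```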

The base functions are grep, grepl, regexpr, gregexpr, regexec and gregexec: https://stat.ethz.ch/R-manual/R-devel/library/base/html/grep.html
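For orientation, a short sketch of what the first four of these return on a toy vector (the vector and patterns are illustrative). regexec and gregexec follow the same idea but also report capture-group positions; gregexec requires R 4.1.0 or later.

```r
x <- c("cat", "dog", "catalog")

grep("cat", x)                 # indices of matching elements: 1 3
grep("cat", x, value = TRUE)   # the matching elements: "cat" "catalog"
grepl("cat", x)                # logical vector: TRUE FALSE TRUE

# regexpr(): position of the first match in each element (-1 = no match);
# match lengths are stored in the "match.length" attribute
first <- regexpr("a", x)
as.vector(first)               # 2 -1 2

# gregexpr(): positions of every match, one list element per input
all_pos <- gregexpr("a", x)
as.vector(all_pos[[3]])        # 2 4 (both "a"s in "catalog")

# regmatches() extracts the matched text from regexpr()/gregexpr() results
regmatches(x, gregexpr("a.", x))  # "at"; character(0); "at" "al"
```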

There are also the stringi and stringr packages.

working-name commented 2 years ago

@aourednik Are we maybe looking at a code generator request rather than flavor support? R itself supports Perl-like regex (PCRE, via perl = TRUE), and stringi/stringr use the ICU regex engine, which is close to the Java flavor that is already supported.

Have you bumped into any unsupported regex tokens?

aourednik commented 2 years ago

@working-name Thanks for the reply. Yes, you are right; sorry for the mislabeling. I am suggesting an R code generator.

On the flavor side, I've bumped into stringi/stringr not correctly resolving positive and negative lookahead and lookbehind constructs, but that might be fixed by now.
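The stringi behavior isn't verified here, but as a point of comparison, a sketch of the same lookaround constructs in base R's PCRE engine (perl = TRUE). The sample strings are made up for illustration.

```r
x <- "price: $42, cost: $7, qty: 3"

# Positive lookbehind: digit runs preceded by "$" (the "$" itself is not matched)
after_dollar <- regmatches(x, gregexpr("(?<=\\$)\\d+", x, perl = TRUE))[[1]]
after_dollar       # "42" "7"

# Negative lookbehind: digit runs starting a word that are NOT preceded by "$"
not_after_dollar <- regmatches(x, gregexpr("(?<!\\$)\\b\\d+", x, perl = TRUE))[[1]]
not_after_dollar   # "3"

y <- "foobar foobaz"
# Positive lookahead: "foo" only when followed by "bar"
pos_ahead <- as.vector(regexpr("foo(?=bar)", y, perl = TRUE))      # 1
# Negative lookahead: "foo" only when NOT followed by "bar"
pos_not_ahead <- as.vector(regexpr("foo(?!bar)", y, perl = TRUE))  # 8
```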

ottothecow commented 2 years ago

I actually think an R flavor might be beneficial.

The key issue is that R requires double backslashes, since it uses the backslash as the escape character in string literals.

I personally never use the code generators, but when I am working in R I sometimes find it tough to get all of the backslashes right, especially when you need a regex that itself looks for backslashes. E.g., say you want to use \\\d.*\\keyword in R: it's not immediately obvious how many backslashes you need to add to make it work.
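Reading that example as the regex \\\d.*\\keyword (a literal backslash, a digit, anything, a literal backslash, then "keyword"), here is a sketch of how it has to be written in R. The raw-string form assumes R 4.0.0 or later; the sample text is illustrative.

```r
# The regex as it would be typed into regex101:
#   \\\d.*\\keyword
# In an ordinary R string every regex backslash must itself be escaped,
# so each "\" in the regex becomes "\\" in the source code:
pattern <- "\\\\\\d.*\\\\keyword"

# Since R 4.0.0, raw strings r"(...)" take the regex exactly as written:
pattern_raw <- r"(\\\d.*\\keyword)"

identical(pattern, pattern_raw)   # TRUE
writeLines(pattern)               # prints the regex itself: \\\d.*\\keyword

# Test text containing real backslashes (again doubled in the literal)
txt <- "C:\\5 then \\keyword"     # the actual text is: C:\5 then \keyword
grepl(pattern, txt, perl = TRUE)  # TRUE
```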

It's not the end of the world, but it's nice to be able to work in regex101 with the same syntax you are coding with and to copy and paste between them.

working-name commented 2 years ago

Hi @ottothecow, sorry, I should have amended the title. You're probably better off starting a new feature request if you want your request to gain visibility.

I have not worked with R before, but the purpose of the code generator is to do just that: escape the regex for you, for use in code.
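To illustrate what such a generator could do for R, a minimal sketch: r_literal() and r_snippet() are hypothetical helpers, not part of regex101 or any package. They lean on base R's deparse(), which renders a string as an R literal with backslashes and double quotes escaped.

```r
# Hypothetical helper: render a regex (as typed into regex101) as an R string
# literal. deparse() doubles backslashes and escapes double quotes for us.
r_literal <- function(regex) deparse(regex)

# Hypothetical generator: emit R code that extracts every match using PCRE.
r_snippet <- function(regex) {
  sprintf("regmatches(x, gregexpr(%s, x, perl = TRUE))", r_literal(regex))
}

re <- "\\d+\\.\\d+"   # the regex \d+\.\d+ (a decimal number)
code <- r_snippet(re)
cat(code, "\n")
# regmatches(x, gregexpr("\\d+\\.\\d+", x, perl = TRUE))

# Round trip: run the generated code against some sample text
x <- "pi is 3.14, e is 2.72"
eval(parse(text = code))[[1]]   # "3.14" "2.72"
```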

As far as the site is concerned, regex input is just regex. If you try to handle every variation of escaping needs (sometimes two or three within the same language), it'll get even harder to follow what's going on. That's just my opinion, however.