Tazinho / snakecase

🐍🐍🐍 A systematic approach to parse strings and automate the conversion to snake_case, UpperCamelCase or any other case.
https://tazinho.github.io/snakecase/
GNU General Public License v3.0

Can't escape regex special character in transliterations #187

Closed: aornugent closed this issue 4 years ago

aornugent commented 4 years ago

Hia, love the package.

I can't quite get the transliterations to do what I want:

library(tidyverse)
library(snakecase)

x <- tibble(`Off farm contracts ($)` = NA)

rename_all(x, str_replace, pattern = "$", replacement = "AUD") %>%
  colnames(.)
#> [1] "Off farm contracts ($)AUD"

rename_all(x, str_replace, pattern = "\\$", replacement = "AUD") %>%
  colnames(.)
#> [1] "Off farm contracts (AUD)"
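
For context, an unescaped `$` is the end-of-string anchor in a regular expression, which is why the first call appends "AUD" rather than replacing the dollar sign. Below is a minimal base R sketch of the same behaviour. It uses `sub()` instead of `str_replace()` and a hypothetical variable `nm`, and it adds literal matching via `fixed = TRUE` as an alternative to escaping:

```r
# Hypothetical variable; the string is the column name from the example above.
nm <- "Off farm contracts ($)"

# Unescaped "$" matches the (empty) end of the string, so "AUD" is appended.
anchored <- sub("$", "AUD", nm)

# Escaped "\\$" matches a literal dollar sign.
escaped <- sub("\\$", "AUD", nm)

# fixed = TRUE switches off regex interpretation, so "$" is matched literally.
literal <- sub("$", "AUD", nm, fixed = TRUE)

print(c(anchored, escaped, literal))
```

In stringr, wrapping the pattern in `fixed("$")` plays the same role as `fixed = TRUE` does here.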

Escaping the $ with \\$ gives me the correct behaviour. Now I'd like to leverage snakecase to also format the remaining text:

rename_all(x, to_any_case, transliterations = c("$" = "AUD")) %>%
  colnames(.)
#> [1] "offaud_farmaud_contractsaud"

rename_all(x, to_any_case, transliterations = c("\\$" = "AUD")) %>%
  colnames(.)
#> [1] "off_farm_contracts"

Is there something special needed to treat the $ character as a transliteration?

aornugent commented 4 years ago

Ah! The brackets need to be specified in sep_in:

rename_all(x, to_any_case, sep_in = "\\(|\\)", transliterations = c("\\$" = "AUD")) %>%
  colnames(.)

#> [1] "off_farm_contracts_aud"

Tazinho commented 4 years ago

Hi @aornugent, you could also use:

library(tidyverse)
library(snakecase)

x <- tibble(`Off farm contracts ($)` = NA)

rename_all(x, to_any_case, sep_in = "[^[:alnum:]|\\$]", 
           transliterations = c("\\$" = "AUD")) %>%
  colnames(.)
#> [1] "off_farm_contracts_aud"

Created on 2020-03-23 by the reprex package (v0.3.0)

The default (sep_in = "[^[:alnum:]]") treats every non-alphanumeric character, including "$", as a separator. Therefore, you can't match "$" with transliterations without changing sep_in.

Your solution sep_in = "\\(|\\)" treats only "(" and ")" (plus " " and "_", which are always added internally) as separators. Therefore, "$" is kept in the name and you can match it via transliterations.

sep_in = "[^[:alnum:]|\\$]", as shown above, treats every non-alphanumeric character except "$" as a separator. I believe this is a bit more robust, as other special characters like ".", ",", "-" etc. cannot appear in the output.
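
The three `sep_in` patterns discussed in this thread can be compared outside snakecase. The sketch below is only an illustration under stated assumptions: it uses base R's `grepl()` (snakecase's internal matching may use a different regex engine), a hypothetical helper `is_sep()`, and a hand-picked set of test characters. It checks which single characters each pattern would match as a separator:

```r
# Characters to test; chosen for illustration only.
chars <- c("$", "(", ")", ".", "-", "|", "a")

# Hypothetical helper: TRUE where the pattern matches the single character.
is_sep <- function(pattern) setNames(grepl(pattern, chars), chars)

default_sep  <- is_sep("[^[:alnum:]]")      # every non-alphanumeric character
brackets_sep <- is_sep("\\(|\\)")           # only "(" and ")"
keep_dollar  <- is_sep("[^[:alnum:]|\\$]")  # non-alphanumeric, minus "$" and "|"

# One row per pattern, one column per character.
print(rbind(default_sep, brackets_sep, keep_dollar))
```

Note that `|` inside a bracket expression is a literal character, not an alternation, so the last pattern also stops treating "|" as a separator. If that is not wanted, dropping the `|` (for example `"[^[:alnum:]\\$]"`) would be the natural adjustment; treat this as an assumption rather than verified snakecase behaviour.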