epiverse-trace / cleanepi

R package to clean and standardize epidemiological data
https://epiverse-trace.github.io/cleanepi/

convert_to_numeric() in a dataset of 500,000+ rows took 2.5 minutes #163

Open avallecam opened 4 months ago

avallecam commented 4 months ago

Running cleanepi::convert_to_numeric() on a dataset of 500,000+ rows took 2.5 minutes.

I am wondering whether this is expected behaviour, and whether it might be worth refactoring at an appropriate time to use data.table or dtplyr.

library(rio)
library(cleanepi)
library(tidyverse)
library(tictoc)

covid <- rio::import(
  "https://raw.githubusercontent.com/Joskerus/Enlaces-provisionales/main/data_limpieza.zip",
  which = "datos_covid_LA.RDS"
) %>% 
  cleanepi::standardize_column_names()

tictoc::tic()
covid %>% 
  dplyr::select(numero_de_hospitalizaciones_recientes) %>% 
  cleanepi::convert_to_numeric(
    target_columns = "numero_de_hospitalizaciones_recientes",
    lang = "es")
#> # A tibble: 502,010 × 1
#>    numero_de_hospitalizaciones_recientes
#>                                    <dbl>
#>  1                                     0
#>  2                                     0
#>  3                                     0
#>  4                                     0
#>  5                                     0
#>  6                                     0
#>  7                                     0
#>  8                                     0
#>  9                                    NA
#> 10                                     0
#> # ℹ 502,000 more rows
#> # ℹ Use `print(n = ...)` to see more rows
tictoc::toc()
#> 150.42 sec elapsed
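
In case it helps while the package is being profiled: a possible interim workaround (a sketch, not part of cleanepi's API, and `fast_convert` is a hypothetical helper name) is to run the expensive conversion only on the unique values of the column and map the results back with `match()`. In columns like hospitalization counts, where 500,000+ rows hold only a handful of distinct values, this should cut the work dramatically.

```r
library(cleanepi)

# Hypothetical helper: convert a single column to numeric by converting
# only its unique values once, then mapping each row back to its result.
fast_convert <- function(data, column, lang = "en") {
  values <- data[[column]]
  uniq <- unique(values)
  # Run cleanepi::convert_to_numeric() on the unique values only
  converted <- cleanepi::convert_to_numeric(
    data.frame(x = uniq),
    target_columns = "x",
    lang = lang
  )$x
  # Look up each original value's converted counterpart
  data[[column]] <- converted[match(values, uniq)]
  data
}
```

This assumes `convert_to_numeric()` is a pure per-value transformation (the same input always yields the same output), which appears to hold for the count column in the reprex above.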

cc: @Joskerus @lgbermeo

Bisaloo commented 4 months ago

Could you give https://github.com/epiverse-trace/numberize/pull/14 a go please? If the performance is still not sufficient, I have a couple of other ideas.