epiverse-trace / cleanepi

R package to clean and standardize epidemiological data
https://epiverse-trace.github.io/cleanepi/

add argument `na.rm` from `janitor::remove_constant()` to `remove_constants()` #178

Closed avallecam closed 1 month ago

avallecam commented 2 months ago

Is your feature request related to a problem? Please describe.

I ran into a case where I need to call `remove_constants()` repeatedly to keep only the rows and columns that vary.

# load library (NA_Date_ comes from lubridate, which tidyverse >= 2.0.0 attaches)
library(tidyverse)

#create dataset
df <- tibble(
  x = c(1,2),
  y = c(1,3)
) %>% 
  mutate(invariant = rep("a",nrow(.))) %>% 
  mutate(invariant2 = rep("b",nrow(.))) %>% 
  mutate(empty1 = rep(NA_Date_, nrow(.))) %>% 
  mutate(empty2 = rep(NA_Date_, nrow(.))) %>% 
  mutate(empty3 = rep(NA_Date_, nrow(.))) %>% 
  add_row(x = NA_integer_, invariant = "a") %>% 
  add_row(x = NA_integer_, invariant = "a") %>%
  add_row(x = NA_integer_, invariant = "a") %>% 
  add_row(x = NA_integer_)

df
#> # A tibble: 6 × 7
#>       x     y invariant invariant2 empty1 empty2 empty3
#>   <dbl> <dbl> <chr>     <chr>      <date> <date> <date>
#> 1     1     1 a         b          NA     NA     NA    
#> 2     2     3 a         b          NA     NA     NA    
#> 3    NA    NA a         <NA>       NA     NA     NA    
#> 4    NA    NA a         <NA>       NA     NA     NA    
#> 5    NA    NA a         <NA>       NA     NA     NA    
#> 6    NA    NA <NA>      <NA>       NA     NA     NA

df %>% 
  cleanepi::remove_constants()
#> # A tibble: 5 × 3
#>       x     y invariant2
#>   <dbl> <dbl> <chr>     
#> 1     1     1 b         
#> 2     2     3 b         
#> 3    NA    NA <NA>      
#> 4    NA    NA <NA>      
#> 5    NA    NA <NA>

df %>% 
  cleanepi::remove_constants() %>% 
  cleanepi::remove_constants()
#> # A tibble: 2 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1     1     1
#> 2     2     3

df %>% 
  janitor::remove_constant(na.rm = TRUE) %>% 
  janitor::remove_empty(which = "rows")
#> # A tibble: 2 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1     1     1
#> 2     2     3

df %>% 
  janitor::remove_empty(which = "rows") %>% 
  janitor::remove_constant(na.rm = TRUE) %>% 
  janitor::remove_empty(which = "rows")
#> # A tibble: 2 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1     1     1
#> 2     2     3

df %>% 
  janitor::remove_constant(na.rm = FALSE) %>% 
  janitor::remove_empty(which = "rows") %>% 
  janitor::remove_constant(na.rm = FALSE) %>% 
  janitor::remove_empty(which = "rows") %>% 
  janitor::remove_constant(na.rm = FALSE)
#> # A tibble: 2 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1     1     1
#> 2     2     3

packageVersion("cleanepi")
#> [1] '1.0.2'

Created on 2024-09-30 with reprex v2.1.0

Describe the solution you'd like

Possibly, exposing an `na.rm` argument like the one in `janitor::remove_constant()` would let us get the appropriate result in one line of code, if that is the intended aim of `cleanepi::remove_constants()`.
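To make the request concrete, here is a minimal base-R sketch of the behaviour I have in mind. It is not how cleanepi or janitor are implemented. `is_constant()` and `remove_constants_until_stable()` are hypothetical names used only for illustration, and the `na.rm` handling is my assumption about what "constant" should mean (a column with at most one distinct non-missing value).

```r
# Hypothetical helper, not part of cleanepi or janitor.
# A column counts as constant when it has at most one distinct value;
# with na.rm = TRUE, missing values are ignored before counting.
is_constant <- function(col, na.rm = TRUE) {
  if (na.rm) col <- col[!is.na(col)]
  length(unique(col)) <= 1
}

# Hypothetical helper: drop all-NA rows and constant columns repeatedly
# until the dimensions of the data stop changing.
remove_constants_until_stable <- function(data, na.rm = TRUE) {
  repeat {
    dims_before <- dim(data)
    # keep rows that have at least one non-missing value
    keep_rows <- rowSums(!is.na(data)) > 0
    data <- data[keep_rows, , drop = FALSE]
    # keep columns that are not constant
    keep_cols <- !vapply(data, is_constant, logical(1), na.rm = na.rm)
    data <- data[, keep_cols, drop = FALSE]
    if (identical(dim(data), dims_before)) break
  }
  data
}

# Base-R version of the example data from the reprex (one empty column
# is enough to illustrate the point).
df <- data.frame(
  x = c(1, 2, NA, NA, NA, NA),
  y = c(1, 3, NA, NA, NA, NA),
  invariant = c("a", "a", "a", "a", "a", NA),
  invariant2 = c("b", "b", NA, NA, NA, NA),
  empty1 = as.Date(NA)
)

result <- remove_constants_until_stable(df)
result
```

On this data the single call should leave only the first two rows and the columns `x` and `y`, the same result the chained calls in the reprex arrive at.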

Additional context

This issue is related to #177 and may provide examples to help decide on #171.