duttashi / learnr


Extracting multiple variables from multiple dataframes? #50

Closed duttashi closed 5 years ago

duttashi commented 5 years ago

This question was originally asked on Stack Overflow.

Question: Suppose there are n data frames (in this case 3), each with a 'Variable' column. How can we extract the values of 'Variable' that appear in all n data frames?

Dataset

df1 <- structure(list(Variable = c("a", "g", "e"), Val = c(0.9, 0.3, 
0.1)), class = "data.frame", row.names = c(NA, -3L))

df2 <- structure(list(Variable = c("h", "a", "e"), Val = c(0.2, 0.7, 
0.9)), class = "data.frame", row.names = c(NA, -3L))

df3 <- structure(list(Variable = c("z", "a", "e"), Val = c(0.5, 0.7, 
0.9)), class = "data.frame", row.names = c(NA, -3L))

duttashi commented 5 years ago

Solution: 1

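library(tidyverse)  # provides %>%, map(), reduce() and pull()

# Collect df1, df2 and df3 into a list, pull the 'Variable' column from each,
# then fold intersect() over the resulting vectors.
mget(paste0("df", 1:3)) %>%
      map(~ .x %>%
               pull(Variable)) %>%
      reduce(intersect)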
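
For readers who would rather avoid the tidyverse dependency, the same idea can be sketched in base R. This is only an illustration, not part of the original answer: the names dfs, vars and common are placeholders chosen here, and the toy data is rebuilt with data.frame() so the snippet runs on its own.

```r
# Same toy data as in the question, rebuilt with data.frame() for brevity.
# stringsAsFactors = FALSE keeps 'Variable' as character on R versions before 4.0.
df1 <- data.frame(Variable = c("a", "g", "e"), Val = c(0.9, 0.3, 0.1),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Variable = c("h", "a", "e"), Val = c(0.2, 0.7, 0.9),
                  stringsAsFactors = FALSE)
df3 <- data.frame(Variable = c("z", "a", "e"), Val = c(0.5, 0.7, 0.9),
                  stringsAsFactors = FALSE)

# Hypothetical helper names, chosen for this sketch only.
dfs <- list(df1, df2, df3)

# Extract the 'Variable' column from each data frame...
vars <- lapply(dfs, function(d) d$Variable)

# ...and fold intersect() over the list to keep values present in all of them.
common <- Reduce(intersect, vars)
common
```

With the data above, common should contain "a" and "e", the two values present in every data frame.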

Solution: 2

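Bind the data frames into a single one, with a 'grp' column recording which data frame each row came from. Then group by 'Variable', keep only the groups whose number of distinct 'grp' values equals 3 (the number of data frames), and extract the unique values of 'Variable'.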

bind_rows(df1, df2, df3, .id = 'grp') %>%
     group_by(Variable) %>%
     filter(n_distinct(grp) == 3) %>%
     distinct(Variable) %>% 
     pull(Variable)
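
The hard-coded 3 ties this approach to exactly three data frames. One possible generalisation, sketched here as an assumption rather than as part of the original answer, is to keep the data frames in a list and compare against its length. The names dfs and common are placeholders, and the toy data is repeated so the snippet is self-contained.

```r
library(dplyr)

# Same toy data as in the question.
df1 <- data.frame(Variable = c("a", "g", "e"), Val = c(0.9, 0.3, 0.1),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Variable = c("h", "a", "e"), Val = c(0.2, 0.7, 0.9),
                  stringsAsFactors = FALSE)
df3 <- data.frame(Variable = c("z", "a", "e"), Val = c(0.5, 0.7, 0.9),
                  stringsAsFactors = FALSE)

# Hypothetical list holding any number of data frames.
dfs <- list(df1, df2, df3)

common <- bind_rows(dfs, .id = "grp") %>%      # 'grp' marks the source data frame
  group_by(Variable) %>%
  filter(n_distinct(grp) == length(dfs)) %>%   # present in every data frame
  ungroup() %>%
  distinct(Variable) %>%
  pull(Variable)

common
```

Using n_distinct(grp) rather than a plain row count means a value repeated inside a single data frame is not mistaken for one that appears in all of them.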