dgrtwo / fuzzyjoin

Join tables together on inexact matching

regex_semi_join doesn't pass back in joined variables #42

Closed: azadag closed this issue 5 years ago

azadag commented 6 years ago

The semi-join appears to work, but it doesn't add the joined variables. I'm working from an example I saved from Twitter a while back.

library(tidyverse)
library(fuzzyjoin)

regexes <- read_csv(
"regex, type
Windows, Windows-ish
Red Hat|CentOS|Fedora, Fedora-ish
Ubuntu|Debian,Debian-ish
CoreOS|Amazon,Amazon-ish")

# One OS string per line; drop missing entries and keep a plain
# one-column data.frame (not a tibble)
os <- read_lines("https://rud.is/dl/os.txt", na = "NA")
os <- data.frame(os = os[!is.na(os)])

# Left join adds the regex and type columns; unmatched rows get NA
os_left <- os %>%
  regex_left_join(regexes, c(os = "regex")) %>%
  replace_na(list(regex = "Unknown"))

# Semi join keeps only the rows of os that match some regex
os_semi <- os %>%
  regex_semi_join(regexes, c(os = "regex"))

os_left returns a data frame with 3 variables; os_semi returns a single variable (a factor).
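For context: a semi-join is a filtering join. By design it returns the rows of the left table that have a match, with only the left table's columns, so a semi-join result here should contain just `os`, while a left join appends `regex` and `type`. The base-R sketch below illustrates those semantics with `grepl()` on a few made-up OS strings (not the contents of os.txt); it does not use or reproduce fuzzyjoin's own implementation, and the names `hits`, `semi`, and `left` are hypothetical.

```r
# Hypothetical sample strings; the report reads the real ones from os.txt
os <- data.frame(os = c("Windows 10", "Ubuntu 18.04", "CentOS 7", "Solaris 11"),
                 stringsAsFactors = FALSE)
regexes <- data.frame(
  regex = c("Windows", "Red Hat|CentOS|Fedora", "Ubuntu|Debian", "CoreOS|Amazon"),
  type  = c("Windows-ish", "Fedora-ish", "Debian-ish", "Amazon-ish"),
  stringsAsFactors = FALSE
)

# hits[i, j] is TRUE when regexes$regex[j] matches os$os[i]
hits <- vapply(regexes$regex, grepl, logical(nrow(os)), x = os$os)

# Semi-join: rows of os matching any pattern, with os's columns only
semi <- os[rowSums(hits) > 0, , drop = FALSE]

# Left join: one row per (os, regex) match; unmatched os rows get NA
pairs <- which(hits, arr.ind = TRUE)
idx <- merge(data.frame(os_row = seq_len(nrow(os))),
             data.frame(os_row = pairs[, "row"], regex_row = pairs[, "col"]),
             by = "os_row", all.x = TRUE)
left <- data.frame(os    = os$os[idx$os_row],
                   regex = regexes$regex[idx$regex_row],
                   type  = regexes$type[idx$regex_row],
                   stringsAsFactors = FALSE)

print(dim(semi))  # 3 1
print(dim(left))  # 4 3
```

In this sketch, a string matching several patterns would appear once per match in `left` but only once in `semi`, which is the usual difference between a mutating and a filtering join.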

azadag commented 6 years ago

Sorry. This appears to have been a dplyr issue.

dgrtwo commented 5 years ago

Late reply, but this was indeed a bug in how fuzzyjoin handled one-column data.frames! It likely took a while to notice because most users work with tbl_dfs.
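The thread doesn't say what the fix was, so the following is only an illustration of a likely pitfall (an assumption, not confirmed here): in base R, subsetting rows of a one-column data.frame with `df[rows, ]` uses the default `drop = TRUE` and returns a bare vector, which is a factor when strings were converted to factors (the default before R 4.0). Tibbles never drop this way, and passing `drop = FALSE` keeps a data.frame. That would be consistent with only the semi-join result, which keeps just the single `os` column, coming back as a factor. The names `df`, `keep`, `dropped`, and `kept` are hypothetical.

```r
# A plain one-column data.frame; stringsAsFactors = TRUE mimics the
# pre-R-4.0 default that turned the strings into a factor
df <- data.frame(os = c("Windows 10", "Ubuntu 18.04", "Solaris 11"),
                 stringsAsFactors = TRUE)
keep <- c(TRUE, TRUE, FALSE)

dropped <- df[keep, ]                # default drop = TRUE: collapses to a factor
kept    <- df[keep, , drop = FALSE]  # stays a one-column data.frame

print(class(dropped))  # "factor"
print(class(kept))     # "data.frame"
```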

Now fixed, thanks a lot for this report!

azadag commented 5 years ago

!!! great!