dgrtwo / fuzzyjoin

Join tables together on inexact matching

join operation incorrectly drops a dimension #17

Closed mgacc0 closed 8 years ago

mgacc0 commented 8 years ago

I was trying to regex join two vectors:

    library(fuzzyjoin)
    library(dplyr)  # for %>% and select() used further down

    sentences <- c("qwertyuiop", "asdfgfh", "zxcvbn")
    patterns <- c("wer", "asd")
    regex_right_join(sentences, patterns)
    # Error in UseMethod("groups") : 
    #   no applicable method for 'groups' applied to an object of class "character"

Maybe it works if I convert them to data.frames first?

    regex_right_join(data.frame(sentences),
                     data.frame(patterns),
                     by = c(sentences = "patterns"))
    # Error: cannot convert object to a data frame

It seems that a dimension is being dropped somewhere (a one-column data.frame becomes a vector)...

So, finally, I tried:

    regex_right_join(data.frame(sentences, 0),
                     data.frame(patterns, 0),
                     by = c(sentences = "patterns")) %>%
      select(sentences, patterns)
    #    sentences patterns
    #1 qwertyuiop      wer
    #2    asdfgfh      asd

Could you check where it is dropping the dimension (from data.frame to vector)? And would you consider adding support for joining vectors directly?

dgrtwo commented 8 years ago

A. This was fixed in #13, but the fix is not yet on CRAN. I plan to submit to CRAN today or tomorrow, since this looks like a common issue (#16 too). Until then, please either use the GitHub version (devtools::install_github("dgrtwo/fuzzyjoin")) or build the inputs with dplyr's data_frame (tbl_dfs work, just not data.frames).
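
For anyone landing here later, the second workaround might look roughly like the sketch below. It assumes the fuzzyjoin and tibble packages are installed, and it uses tibble::tibble(), the current equivalent of data_frame; the variable name matched is only for illustration.

```r
library(fuzzyjoin)
library(tibble)

sentences <- c("qwertyuiop", "asdfgfh", "zxcvbn")
patterns <- c("wer", "asd")

# Wrap each vector in a one-column tibble; the column names are what
# the `by` argument refers to.
matched <- regex_right_join(tibble(sentences = sentences),
                            tibble(patterns = patterns),
                            by = c(sentences = "patterns"))
print(matched)
```

Because it is a right join, every pattern is kept, paired with each sentence it matches.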

B. I don't plan on supporting joining vectors since these are modeled on dplyr's _join operations, and since there are a lot of choices to make when joining vectors. (Should the output be a named vector or a data frame? If a data frame, what do you call the columns?) I'd rather encourage working with data frames the whole way through.
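
To make those choices concrete, here is a minimal base-R sketch of a vector-in, data-frame-out helper. It is not part of fuzzyjoin: the function regex_pairs and its arguments string_col and pattern_col are hypothetical, and it only covers inner-join behaviour (unmatched strings and patterns are dropped). The caller has to supply the column names that a vector does not carry.

```r
# Hypothetical helper, not part of fuzzyjoin: pair each pattern with the
# strings it matches and return the pairs as a data frame.
regex_pairs <- function(strings, patterns,
                        string_col = "string", pattern_col = "pattern") {
  # One logical result per (string, pattern) combination.
  hits <- vapply(patterns, function(p) grepl(p, strings),
                 logical(length(strings)))
  # Force a strings-by-patterns matrix, even for length-1 inputs.
  hits <- matrix(hits, nrow = length(strings))
  idx <- which(hits, arr.ind = TRUE)
  out <- data.frame(strings[idx[, "row"]], patterns[idx[, "col"]],
                    stringsAsFactors = FALSE)
  # Vectors have no column names, so the caller must choose them.
  names(out) <- c(string_col, pattern_col)
  out
}

sentences <- c("qwertyuiop", "asdfgfh", "zxcvbn")
patterns <- c("wer", "asd")
pairs <- regex_pairs(sentences, patterns,
                     string_col = "sentences", pattern_col = "patterns")
print(pairs)
```

Even this small version has to pick an output type, a join type, and a naming scheme, which are the decisions a data frame input already settles.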