elizagrames / litsearchr

litsearchr is an R package to partially automate search term selection for systematic reviews using keyword co-occurrence networks. In addition to identifying search terms, it can write Boolean searches and translate them into over 50 languages.
https://elizagrames.github.io/litsearchr

remove_duplicates - error #30

Closed evezeyl closed 4 years ago

evezeyl commented 4 years ago

Hi, I am exploring your package, and following your example: https://elizagrames.github.io/litsearchr/litsearchr_vignette_v030.html with my data.

I encountered a problem when using litsearchr::remove_duplicates(import_search, "title", "exact")

so I tested a bit and slightly modified the function (I am not as advanced in R as you are), but this seemed to work as intended:

remove_duplicates2 <- function(df, field, method = c("stringdist", "fuzzdist", "exact")) {
    # flag duplicate entries in the chosen field
    dups <- synthesisr::find_duplicates(df[, field], match_function = method,
                                        to_lower = TRUE, rm_punctuation = TRUE)
    # keep only one reference per set of matched duplicates
    df <- synthesisr::extract_unique_references(df, matches = dups)
    return(df)
}

# called as:
remove_duplicates2(import_search, "title", "stringdist")

I actually also got a more direct result with:

synthesisr::deduplicate(import_search, match_by = "title", match_function = "stringdist", to_lower = TRUE, rm_punctuation = TRUE)
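For intuition, the effect of `to_lower = TRUE` and `rm_punctuation = TRUE` with exact matching can be sketched in base R alone (this is only an illustration of the matching idea, not synthesisr's actual internals, and the titles below are made up):

```r
# Toy titles: the first two differ only in case and punctuation
titles <- c("A Review of Forest Birds.",
            "a review of forest birds",
            "Dispersal in fragmented landscapes")

# Normalize the same way the to_lower / rm_punctuation options suggest
normalize <- function(x) {
  x <- tolower(x)                 # to_lower = TRUE
  gsub("[[:punct:]]", "", x)      # rm_punctuation = TRUE
}

# Keep the first occurrence of each normalized title
keep <- !duplicated(normalize(titles))
titles[keep]
# first and third titles remain; the second is dropped as a duplicate
```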

but you may intend for your function to do something more elaborate.

Anyhow, I just wanted to let you know that at least something was not working when I tried it with my data. Thank you so much for developing this package; it will be really helpful. All the best, Eve

elizagrames commented 4 years ago

Yeah, we slightly reworked the dedupe functions in synthesisr and I forgot to update litsearchr to reflect the new variable names. Your workaround is exactly the fix, which is now in the current master branch of litsearchr.