elizagrames / litsearchr

litsearchr is an R package to partially automate search term selection for systematic reviews using keyword co-occurrence networks. In addition to identifying search terms, it can write Boolean searches and translate them into over 50 languages.
https://elizagrames.github.io/litsearchr

"extract_terms" function not finding the right terms. #59

Closed kgpCBS closed 1 year ago

kgpCBS commented 1 year ago

Hi

First of all, thank you for your tremendous work :)

I have been testing out your workflow on a small test set with naive_search = "Motivational factors in dark tourism".

When I run the extract_terms function, it returns only very few potential keywords, and "dark tourism" is not one of them.

I have checked that "dark" is not a part of the stopwords.

I implemented a rake function with input similar to your function's, and it returns a lot more potential keywords, including "dark tourism".

I have a hard time understanding why your function does not return more words and, specifically, why it does not include "dark tourism".

I hope you can help clarify this concern.

I have attached a .zip with the .Rmd and .RData files.

Kind regards

Kristoffer

bug.zip

elizagrames commented 1 year ago

By default, the get_ngrams function that is called within fakerake does not return 4-letter words, because stemming them leads to search specificity issues. Since "dark" has four letters, "dark tourism" is dropped. You would have to manually modify get_ngrams to retrieve 4-letter words.
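
To make the effect concrete, here is a minimal sketch of a length filter of this kind. It is an illustration only, written in Python rather than R, and it is not litsearchr's actual code: the function name `filter_ngrams`, the `min_chars` parameter, and the rule "drop any n-gram containing a word shorter than `min_chars`" are all assumptions made for the example.

```python
def filter_ngrams(ngrams, min_chars=5):
    """Keep only n-grams in which every word has at least min_chars characters.

    Hypothetical helper for illustration; litsearchr's get_ngrams may apply
    its length rule differently.
    """
    return [g for g in ngrams if all(len(w) >= min_chars for w in g.split())]


# Candidate phrases such as a RAKE-style extractor might propose (made up here).
candidates = [
    "dark tourism",
    "motivational factors",
    "heritage tourism",
    "visitor motivation",
]

# With the assumed default, "dark" (4 letters) disqualifies "dark tourism".
default_kept = filter_ngrams(candidates)

# Lowering the threshold by one character lets 4-letter words through.
relaxed_kept = filter_ngrams(candidates, min_chars=4)

print(default_kept)
print(relaxed_kept)
```

Under these assumptions, the default call drops "dark tourism" while the relaxed call keeps it, which matches the behaviour reported above.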

kgpCBS commented 11 months ago

Thanks for looking into it. :) Maybe you could consider making this constraint an option, with 4 being a strong recommendation.

It may scare people away if it is not transparent why four-letter terms are excluded.

Thanks for the great work :)