kgpCBS closed this issue 1 year ago
By default, the get_ngrams function called within fakerake does not return 4-letter words because of stemming, which leads to search specificity issues. To retrieve 4-letter words, you would have to modify that function manually.
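To illustrate the effect, here is a minimal sketch in Python. It is not the package's R code: the function name candidate_ngrams and the min_chars parameter are hypothetical, and it assumes the filter is a plain minimum word length applied to every word in an n-gram. Under that assumption, "dark tourism" is dropped with the default and comes back once 4-letter words are allowed.

```python
import re


def candidate_ngrams(text, n=2, stop_words=frozenset(), min_chars=5):
    """Return n-grams whose words all have at least min_chars letters
    and are not stopwords.

    Hypothetical helper for illustration only; min_chars=5 mimics the
    described default of excluding words of four letters or fewer.
    """
    # Lowercase the text and keep alphabetic tokens only.
    words = re.findall(r"[a-z]+", text.lower())
    ngrams = []
    for i in range(len(words) - n + 1):
        gram = words[i:i + n]
        # Keep the n-gram only if every word passes both checks.
        if all(len(w) >= min_chars and w not in stop_words for w in gram):
            ngrams.append(" ".join(gram))
    return ngrams


title = "Motivational factors in dark tourism"
stop = {"in"}

# Default threshold: "dark" has four letters, so "dark tourism" is dropped.
print(candidate_ngrams(title, stop_words=stop))

# Lowered threshold: 4-letter words are allowed and "dark tourism" is kept.
print(candidate_ngrams(title, stop_words=stop, min_chars=4))
```

Exposing the threshold as an argument, as min_chars does here, lets a user opt in to 4-letter terms without editing the function body.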
Thanks for looking into it. :) Maybe you could consider making this constraint an option, with 4 being a strong recommendation.
It may scare people away if it is not transparent why four-letter terms are excluded.
Thanks for the great work :)
Hi
First of all, thank you for your tremendous work :)
I have been testing your workflow on a small test set with naive_search = "Motivational factors in dark tourism".
When I run the extract_terms function, it returns only a few potential keywords, and "dark tourism" is not one of them.
I have checked that "dark" is not a part of the stopwords.
I implemented a rake function with similar input to your function, and it returns many more potential keywords, including "dark tourism".
I have a hard time understanding why your function does not return more words, and specifically why it does not include "dark tourism".
I hope you can help clarify this concern.
I have attached a .zip with .Rmd and .RData files.
Kind regards
Kristoffer
bug.zip