heriistantoo closed this issue 2 years ago.
Thanks for reaching out. Below is my response.
1) The genetic algorithm tries to find the set of word tokens that gives the best performance. To assess model stability, it runs 5-fold cross-validation for every candidate set, and the core genetic-algorithm module tries hundreds of different combinations. The total run time is therefore roughly the time taken to train a single model, multiplied by 5, multiplied by a few hundred. In short, it is a time-consuming process.
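A back-of-the-envelope sketch of that estimate. The 2-second fit time and the 300 candidate subsets below are illustrative assumptions, not measurements from the module, and `estimated_ga_runtime` is a hypothetical helper.

```python
def estimated_ga_runtime(single_fit_seconds, n_folds=5, n_candidate_subsets=300):
    # Every candidate token subset is scored with k-fold cross-validation,
    # i.e. n_folds model fits per candidate (assumed values, for illustration only).
    total_fits = n_folds * n_candidate_subsets
    return total_fits, total_fits * single_fit_seconds

fits, seconds = estimated_ga_runtime(single_fit_seconds=2.0)
print(fits, seconds / 60)  # 1500 50.0
```

The cost scales linearly in each factor: halving the number of folds or the number of candidate subsets halves the run time. In a typical genetic algorithm the number of candidates evaluated grows with population size times the number of generations, so those are the natural knobs to shrink for a quick first run.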
2) The algorithm is not suitable for a tiny example with only a few records, such as the one below; it is meant for real-world datasets.
```python
doc_list = ['i had dinner','i am on vacation','I am happy','Wastage of time']
label_list = ['Neutral','Neutral','Positive','Negative']
```
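One way to see why this example is too small: with 5-fold cross-validation, four documents cannot even fill five folds, and two of the three classes have a single example each. The check below is a sketch assuming standard (for classifiers, usually stratified) k-fold splitting; `too_small_for_cv` is a hypothetical helper, not part of the module.

```python
from collections import Counter

# The toy dataset from the question.
doc_list = ['i had dinner', 'i am on vacation', 'I am happy', 'Wastage of time']
label_list = ['Neutral', 'Neutral', 'Positive', 'Negative']

n_folds = 5  # the cross-validation setting described above


def too_small_for_cv(labels, n_folds):
    """True if there are fewer documents than folds, or if some class has fewer
    examples than folds (so stratified folds cannot all contain that class)."""
    counts = Counter(labels)
    return len(labels) < n_folds or min(counts.values()) < n_folds


print(Counter(label_list))  # Counter({'Neutral': 2, 'Positive': 1, 'Negative': 1})
print(too_small_for_cv(label_list, n_folds))  # True
```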
If you can build your own logistic regression model on TF-IDF vectors, then consider feeding it to the module.
As a side remark, feature selection does not yield the desired results when done in isolation. To get the best possible results, try ensembling different models and feature sets, and perform feature selection while doing so. Please check the third module, TextFeatureSelectionEnsemble. It does exactly that.
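To illustrate the general idea only (this is not TextFeatureSelectionEnsemble's algorithm), here is a small sketch that combines base models by majority vote and selects which models to keep by scoring every subset on held-out labels. The model names, labels, and predictions are hypothetical; exhaustive search over k models costs 2**k - 1 evaluations, which is why larger setups fall back on heuristic searches such as a genetic algorithm.

```python
from collections import Counter
from itertools import combinations


def majority_vote(prediction_lists):
    # Per example, pick the most common label; ties go to the model listed first.
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*prediction_lists)]


def accuracy(predicted, truth):
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


def best_model_subset(base_predictions, truth):
    """Score the majority vote of every non-empty subset of base models on
    held-out labels and return the best subset (smallest subset wins ties)."""
    names = list(base_predictions)
    best_subset, best_score = None, -1.0
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            votes = majority_vote([base_predictions[n] for n in subset])
            score = accuracy(votes, truth)
            if score > best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score


# Hypothetical held-out labels and predictions from three hypothetical base models.
truth = ['pos', 'neg', 'pos', 'neg', 'pos', 'neg']
base_predictions = {
    'word_tfidf': ['neg', 'pos', 'pos', 'neg', 'pos', 'neg'],
    'char_tfidf': ['pos', 'neg', 'neg', 'pos', 'pos', 'neg'],
    'bag_of_words': ['pos', 'neg', 'pos', 'neg', 'neg', 'pos'],
}

print(best_model_subset(base_predictions, truth))
# (('word_tfidf', 'char_tfidf', 'bag_of_words'), 1.0)
```

Each base model here is right on only four of six examples, but their mistakes fall in different places, so the three-model vote is right on all six. That is the kind of gain ensembling can offer over tuning a single feature set in isolation.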
Thanks!
Hello, I just read your paper on feature selection with genetic algorithms and am interested in trying the code. However, when I run the code below:
I find that it never finishes; I have been waiting for hours. Can you please explain why this happens? Could you also help me choose parameters that produce output quickly? I'm curious what kind of output the algorithm gives.
Thank you.