Why does the FastText performance drop when we choose all categories in 20 newsgroups classification?

Hi, this performance problem seems related to the number of features used. In fact, if you use --all_categories together with --chi2_select 80, you get proper results. This sensitivity to the feature space is quite strange and should be investigated; moreover, the original fastText performs even worse!

Could you please try debugging with different values of chi2_select, to understand whether the problem is in ShallowLearn or (more likely) related to the fastText algorithm in general? It would be interesting to discover some peculiar behaviour of fastText.
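A minimal sketch of such a sweep, in case it helps: it only builds the command lines, one baseline run on the full feature space plus one run per chi2_select value. The flags are the ones mentioned in this thread; the script location and every value other than 80 are assumptions, so adjust them to your checkout.

```python
import shlex
import sys

# Assumed location of the example script; adjust to your checkout.
SCRIPT = "document_classification_20newsgroups.py"

# Hypothetical sweep values; only 80 is mentioned in this thread.
K_VALUES = (20, 80, 320, 1280, 5120)


def build_commands(k_values=K_VALUES, script=SCRIPT, python=sys.executable):
    """Return one argv list per run: a baseline first, then one per chi2_select value."""
    base = [python, script, "--report", "--all_categories"]
    commands = [list(base)]  # baseline: no feature selection
    for k in k_values:
        commands.append(base + ["--chi2_select", str(k)])
    return commands


if __name__ == "__main__":
    # Print the commands only; to execute them, pass each argv list
    # to subprocess.run(argv) and compare the printed reports.
    for argv in build_commands():
        print(shlex.join(argv))
```

Comparing the FastText and GensimFastText rows of each report against the other classifiers should show whether the drop grows with the number of selected features.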

2017-05-10 16:14 GMT+02:00 falakmasir notifications@github.com:

I was running document_classification_20newsgroups.py with the parameters --report --all_categories, and I experienced a huge performance drop in FastText and GensimFastText. Why is the performance of the NN models so shaky?

