giacbrd / ShallowLearn

An experiment in re-implementing supervised learning models based on shallow neural network approaches (e.g. fastText), with some additional exclusive features and a nice API. Written in Python and fully compatible with Scikit-learn.
GNU Lesser General Public License v3.0
198 stars, 30 forks

Why does FastText performance drop when we choose all categories in the 20 newsgroups classification? #21

Open falakmasir opened 7 years ago

falakmasir commented 7 years ago

I was running document_classification_20newsgroups.py with the parameters --report --all_categories and experienced a huge performance drop in FastText and GensimFastText. Why is the performance of the NN models so shaky?

giacbrd commented 7 years ago

Hi, this performance problem seems related to the number of features used. In fact, if you try --all_categories together with --chi2_select 80, you will get proper results. These differences across feature spaces are quite strange and should be investigated. Moreover, the original fastText performs even worse!
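
For readers unfamiliar with chi-squared feature selection, here is a minimal sketch of the general technique using scikit-learn's `SelectKBest` and `chi2`. It assumes that `--chi2_select` performs this kind of selection, which this thread does not confirm; the toy corpus, labels, and `k` value are hypothetical and chosen only for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2

# Hypothetical toy corpus: two "space" documents and two "hockey" documents.
docs = [
    "the rocket reached orbit",
    "nasa launched a rocket",
    "the goalie stopped the puck",
    "hockey players chase the puck",
]
labels = [0, 0, 1, 1]  # 0 = space, 1 = hockey

# Bag-of-words counts: one column per vocabulary term.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

# Keep only the k terms with the highest chi-squared score against the labels.
# k = 2 is an illustrative value, not a recommendation.
k = 2
selector = SelectKBest(chi2, k=k)
X_reduced = selector.fit_transform(X, labels)

# Recover the surviving terms from the vocabulary (term -> column index).
terms = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
kept = [term for term, keep in zip(terms, selector.get_support()) if keep]

print("features before:", X.shape[1])
print("features after: ", X_reduced.shape[1])
print("kept terms:     ", kept)
```

The classifier is then trained on `X_reduced` instead of `X`, so a small `k` shrinks the input vocabulary the model has to learn from.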

Could you please try different values of chi2_select to understand whether the problem is in ShallowLearn or (more likely) in the fastText algorithm in general? It would be interesting to discover some peculiar behaviour of fastText.
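
One way to run such a comparison is a small driver that calls the example script once per chi2_select value. This is only a sketch: the script name and the flags are the ones mentioned in this thread, while the helper names (`build_command`, `run_sweep`) and the list of values to try are hypothetical.

```python
import subprocess
import sys

# Script name as given in this thread; assumed to be in the current directory.
SCRIPT = "document_classification_20newsgroups.py"

# Hypothetical values to try; None means "do not pass --chi2_select at all".
SWEEP = [None, 80, 500, 5000, 50000]


def build_command(chi2_select=None):
    """Build the command line for one run of the example script."""
    cmd = [sys.executable, SCRIPT, "--report", "--all_categories"]
    if chi2_select is not None:
        cmd += ["--chi2_select", str(chi2_select)]
    return cmd


def run_sweep(values=SWEEP):
    """Run the script once per value and return a list of (value, exit code)."""
    results = []
    for value in values:
        cmd = build_command(value)
        print("running:", " ".join(cmd))
        completed = subprocess.run(cmd, check=False)
        results.append((value, completed.returncode))
    return results
```

Calling `run_sweep()` from the directory that contains the script prints each report in turn, so the scores for FastText and GensimFastText can be compared across feature counts.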

