Closed mattkallo closed 6 years ago
Debugging specific datasets is a bit outside of the scope of what we can help with here. I'd recommend comparing against a normal bag-of-words classifier, e.g. from scikit-learn, to check how spaCy's classifier is comparing to others. More training data might be helpful too.
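A minimal sketch of the bag-of-words comparison suggested above, assuming scikit-learn is installed. The helper name `baseline_accuracy` and the toy titles are hypothetical placeholders; TF-IDF features with logistic regression are just one reasonable choice of baseline, not a prescribed one.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline


def baseline_accuracy(texts, labels, folds=5):
    """Mean cross-validated accuracy of a TF-IDF + logistic regression baseline."""
    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)),  # unigram and bigram features
        LogisticRegression(max_iter=1000),
    )
    scores = cross_val_score(model, texts, labels, cv=folds, scoring="accuracy")
    return scores.mean()


# Hypothetical toy data; replace with the real titles and labels.
# 1 = stock market related, 0 = not related.
texts = [
    "Shares rally as markets close higher",
    "Stock index falls on weak earnings",
    "Investors sell shares after profit warning",
    "Markets rise as tech stocks gain",
    "Local team wins the championship game",
    "New recipe ideas for summer dinners",
    "Heavy rain expected over the weekend",
    "Museum opens new photography exhibit",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

baseline_score = baseline_accuracy(texts, labels, folds=2)
print(f"Baseline accuracy: {baseline_score:.2f}")
```

If this baseline scores about the same as the spaCy text categorizer on the same split, the limiting factor is more likely the data than the model.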
If you haven't tried Prodigy yet, it has a utility called textcat.train-curve which checks the accuracy on 80%, 50%, 25%, etc. of the training data. This helps you project how your accuracy might look at 120%, 150%, etc. of your current dataset, so you can guess how much data to collect.
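The same idea can be approximated without Prodigy. The sketch below is not Prodigy's implementation: the function `train_curve` is a hypothetical helper that trains a simple scikit-learn bag-of-words model on growing slices of the training data and scores each one on a fixed held-out set. It assumes the training data contains both labels.

```python
import math
import random
from collections import defaultdict

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def train_curve(train, dev, fractions=(0.25, 0.5, 0.75, 1.0), seed=0):
    """Return (fraction, dev accuracy) pairs for growing slices of `train`.

    `train` and `dev` are lists of (text, label) pairs.
    """
    # Group and shuffle per class so every slice contains both labels.
    by_label = defaultdict(list)
    for text, label in train:
        by_label[label].append(text)
    rng = random.Random(seed)
    for items in by_label.values():
        rng.shuffle(items)

    dev_texts = [text for text, _ in dev]
    dev_labels = [label for _, label in dev]

    curve = []
    for frac in fractions:
        texts, labels = [], []
        for label, items in by_label.items():
            keep = items[: math.ceil(frac * len(items))]
            texts.extend(keep)
            labels.extend([label] * len(keep))
        model = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
        model.fit(texts, labels)
        curve.append((frac, model.score(dev_texts, dev_labels)))
    return curve


# Hypothetical toy data; 1 = stock market related, 0 = not related.
train = [
    ("Shares rally as markets close higher", 1),
    ("Stock index falls on weak earnings", 1),
    ("Investors sell shares after profit warning", 1),
    ("Markets rise as tech stocks gain", 1),
    ("Local team wins the championship game", 0),
    ("New recipe ideas for summer dinners", 0),
    ("Heavy rain expected over the weekend", 0),
    ("Museum opens new photography exhibit", 0),
]
dev = [
    ("Stocks slide as investors worry", 1),
    ("Weekend weather brings more rain", 0),
]

curve = train_curve(train, dev)
for frac, acc in curve:
    print(f"{frac:.0%} of training data: accuracy {acc:.2f}")
```

If accuracy is still climbing between the last two fractions, collecting more data is likely to help; if it has flattened, the problem is more likely in the labels or the features.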
This is most probably an issue with my training process or data. However, I have not been able to sort it out after spending a few days on it, and I would appreciate your help and input.
I am trying to train a text categorizer to identify stock-market-related news titles, and I am running into problems when predicting on unseen data. It's a binary classifier (two classes: stock market related or not related). My training set is roughly 400 stock market news titles and 600+ non-stock-market titles.
Problems I have noticed -
Questions -
Thanks for any feedback/input on this.
Your Environment
Info about spaCy