Open dmarx opened 8 years ago
One possible approach: compare the most common words in news-corpus headlines against their expected frequency in a background corpus (perhaps Wikipedia). This would require tracking every unlemmatized/unstemmed variant of each word.
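A minimal sketch of the comparison, assuming a smoothed log frequency ratio as the measure of "more common than expected" (the issue does not name a scoring method). The function names and the crude `normalize` suffix-stripper are hypothetical stand-ins; a real lemmatizer or stemmer would replace `normalize`. The `variants` map records every surface form seen under each normalized key, so the original unlemmatized/unstemmed words can be recovered.

```python
from collections import Counter, defaultdict
import math
import re

TOKEN_RE = re.compile(r"[a-z']+")


def normalize(token):
    # Hypothetical stand-in for a lemmatizer/stemmer: strips a trailing "s"
    # so "stocks" and "stock" share one key. Replace with a real lemmatizer.
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def count_terms(texts):
    # Count normalized terms and remember every surface variant per term.
    counts = Counter()
    variants = defaultdict(set)
    for text in texts:
        for tok in TOKEN_RE.findall(text.lower()):
            key = normalize(tok)
            counts[key] += 1
            variants[key].add(tok)
    return counts, variants


def overrepresented_terms(headlines, background, min_count=2, alpha=1.0):
    # Rank headline terms by log(P_headline / P_background), using additive
    # (Laplace) smoothing so terms absent from the background still get a score.
    h_counts, variants = count_terms(headlines)
    b_counts, _ = count_terms(background)
    h_total = sum(h_counts.values())
    b_total = sum(b_counts.values())
    vocab_size = len(set(h_counts) | set(b_counts))
    scores = {}
    for term, count in h_counts.items():
        if count < min_count:
            continue
        p_h = (count + alpha) / (h_total + alpha * vocab_size)
        p_b = (b_counts[term] + alpha) / (b_total + alpha * vocab_size)
        scores[term] = math.log(p_h / p_b)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    # Each entry: (normalized term, score, sorted list of surface variants).
    return [(term, score, sorted(variants[term])) for term, score in ranked]


if __name__ == "__main__":
    headlines = [
        "Stocks rally as markets open",
        "Stock slump hits markets",
        "Markets brace for rate hike",
    ]
    background = [
        "The history of the region spans centuries",
        "A market is a place where goods are traded",
        "The rate of growth varied",
    ]
    for term, score, forms in overrepresented_terms(headlines, background):
        print(term, forms)
```

With the toy data above, "stock" (seen as both "stock" and "stocks") ranks above "market", since it never appears in the background text. `min_count` filters out one-off headline words whose ratios would otherwise be noisy.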
This should be learned from user feedback. See issues #22 and #27.