dginev / nnexus

Auto-linking for Mathematical Concepts for PlanetMath.org, Wikipedia, and beyond.
MIT License

Remove articles from Index #35

Closed: dginev closed this issue 11 years ago

dginev commented 11 years ago

Ray drew my attention to how we index concept phrases that start with an article ("a", "an", or "the"). Currently these phrases are added to the index as-is, but they are always skipped during linking, because articles are part of our stopword list.

The reasonable solution is to strip leading articles from concepts as part of the normalize routine in NNexus::Morphology, so that the indexed form matches what the linker actually looks up.
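As a rough illustration of the idea (not the actual NNexus::Morphology code, which is Perl), here is a minimal Python sketch. The function name `strip_leading_articles` and the decision to preserve the original casing are assumptions made for this example only.

```python
import re

# Hypothetical helper for illustration; NNexus itself implements
# normalization in Perl, and its real routine may differ.
# Matches one article at the start of the phrase, followed by whitespace.
_LEADING_ARTICLE = re.compile(r"^(?:a|an|the)\s+", re.IGNORECASE)


def strip_leading_articles(concept: str) -> str:
    """Return the concept phrase with any leading articles removed."""
    phrase = concept.strip()
    # Repeat until nothing changes, in case articles are stacked up front.
    while True:
        stripped = _LEADING_ARTICLE.sub("", phrase, count=1)
        if stripped == phrase:
            return phrase
        phrase = stripped
```

Because the pattern requires whitespace after the article, words that merely begin with those letters (such as "theorem" or "analytic") are left untouched, and a phrase consisting of a single article is returned unchanged rather than emptied.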

dginev commented 11 years ago

Done, committed and pushed.