Stefan4472 / simple-search-engine

GNU General Public License v3.0

Better handling of case-sensitivity #13

Closed: Stefan4472 closed this issue 2 years ago

Stefan4472 commented 2 years ago

Look into this. I think queries should be case-insensitive (i.e., store all tokens as lowercase!)

Stefan4472 commented 2 years ago

Added this as an option in the AlphanumericTokenizer. I'm not sure we'd want to keep it long term: from some quick research, lowercasing every token may hurt precision, since it conflates terms that differ only in capitalization (e.g., "the Fed" vs. "fed"). See https://nlp.stanford.edu/IR-book/html/htmledition/capitalizationcase-folding-1.html
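
For reference, a minimal sketch of what such an option could look like. This is not the actual AlphanumericTokenizer code from the repo: the `tokenize` function and its `lowercase` flag are hypothetical names used only for illustration, and the sketch assumes tokens are runs of ASCII letters and digits.

```python
import re
from typing import List

# Hypothetical stand-in for the tokenizer discussed above: a token is
# any run of ASCII letters and digits.
_TOKEN_RE = re.compile(r"[A-Za-z0-9]+")


def tokenize(text: str, lowercase: bool = True) -> List[str]:
    """Split text into alphanumeric tokens, optionally case-folding them."""
    tokens = _TOKEN_RE.findall(text)
    if lowercase:
        # Case-folding: "Fed" and "fed" become the same index term.
        return [token.lower() for token in tokens]
    return tokens


print(tokenize("The Fed raised rates"))
# ['the', 'fed', 'raised', 'rates']
print(tokenize("The Fed raised rates", lowercase=False))
# ['The', 'Fed', 'raised', 'rates']
```

For queries to be case-insensitive, the same function (with the same flag) would need to be applied both when indexing documents and when parsing queries. Otherwise a lowercased query term would never match a mixed-case index term.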