fmarten / JoSimText

A system for word sense induction and disambiguation based on the JoBimText approach

For trigrams, make it possible to turn off lowercasing #6

Open alexanderpanchenko opened 7 years ago


Motivation

Currently the text is always lowercased, which (i) does not correspond to the "original" JBT implementation and (ii) may confuse Chris, who often prefers to keep the original case:

https://github.com/uhh-lt/josimtext/blob/master/src/main/scala/de/uhh/lt/jst/dt/Text2TrigramTermContext.scala#L35

Implementation

Add a boolean command-line parameter that controls whether lowercasing is applied. Its default should be true, so that everything is lowercased as it is now.
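A minimal sketch of the proposed switch, written in Scala to match Text2TrigramTermContext.scala. Everything in it is hypothetical: the `--lowercase` flag name, the `TrigramSketch` object and its methods are not taken from the JoSimText code base, and the trigram format (middle word as the term, left and right neighbours joined around `@` as the context) is an assumption, not necessarily what Text2TrigramTermContext actually emits.

```scala
import java.util.Locale

// Hypothetical sketch: names and output format are illustrative only.
object TrigramSketch {

  // Reads "--lowercase true|false" from the arguments; defaults to true (current behaviour).
  def parseLowercase(args: Array[String]): Boolean = {
    val i = args.indexOf("--lowercase")
    if (i >= 0 && i + 1 < args.length) args(i + 1).toBoolean else true
  }

  // Emits (term, context) pairs for every trigram in the line,
  // lowercasing the text first only when `lowercase` is true.
  def termContexts(line: String, lowercase: Boolean): List[(String, String)] = {
    val text = if (lowercase) line.toLowerCase(Locale.ROOT) else line
    val tokens = text.split("\\s+").filter(_.nonEmpty).toList
    tokens.sliding(3).collect {
      case List(left, term, right) => (term, s"${left}_@_${right}")
    }.toList
  }

  // Reads lines from stdin and prints tab-separated term/context pairs.
  def main(args: Array[String]): Unit = {
    val lowercase = parseLowercase(args)
    scala.io.Source.stdin.getLines().foreach { line =>
      termContexts(line, lowercase).foreach { case (t, c) => println(s"$t\t$c") }
    }
  }
}
```

With `--lowercase false`, the line "The Cat sat" yields the term `Cat` with context `The_@_sat`; omitting the flag keeps today's behaviour and yields `cat` with `the_@_sat`. Defaulting to true means existing pipelines that never pass the flag are unaffected.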