Closed erikschlegel closed 6 years ago
@erikschlegel Just to clarify, you'd expect keywords for which we have a translation to be stored as the English keyword? Specifically, in the example above, you'd want all instances of `ataque` to be replaced with `attack`?
Resolving as we're now tracking this elsewhere.
It looks like we're writing a mix of both English and Spanish terms to Cassandra. For example, suppose `ataque` is a watchlist term for a Fortis site whose primary language is Spanish, with English translation support. If `ataque` is mentioned in a Spanish tweet, we archive that term in Cassandra. We do the same if `attack` is mentioned in an English tweet. A data sample is listed below. This presents a problem in the Fortis interface, as the services expect content to be aggregated based on terms in the base language. We need to enhance the keyword extraction analyzer to normalize this, so that `attack` is detected as `ataque`.
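To make the requested behavior concrete, here is a minimal sketch of the normalization step, assuming the analyzer has access to a lookup from translated terms to their base-language watchlist terms. The names `TRANSLATIONS_TO_BASE` and `normalize_keywords` are hypothetical and are not taken from the Fortis codebase; the real analyzer may structure this differently.

```python
from typing import Dict, Iterable, List

# Hypothetical lookup: translated term -> base-language watchlist term.
# For a site whose base language is Spanish, "attack" maps to "ataque".
TRANSLATIONS_TO_BASE: Dict[str, str] = {"attack": "ataque"}


def normalize_keywords(
    keywords: Iterable[str],
    to_base: Dict[str, str] = TRANSLATIONS_TO_BASE,
) -> List[str]:
    """Map each extracted keyword to its base-language term, dropping duplicates."""
    seen = set()
    normalized = []
    for keyword in keywords:
        key = keyword.lower()
        # Terms without a known translation pass through unchanged (lowercased).
        base = to_base.get(key, key)
        if base not in seen:
            seen.add(base)
            normalized.append(base)
    return normalized
```

With this in place, a keyword extracted from an English tweet and one extracted from a Spanish tweet would both be archived under the same base-language term, so downstream aggregation sees a single key.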