elastic / ecs

Elastic Common Schema
https://www.elastic.co/what-is/ecs
Apache License 2.0

Support multilingual (geo) name fields #1738

Open kaisecheng opened 2 years ago

kaisecheng commented 2 years ago

Summary

Add locale support for geo.*name fields so that applications can query them in different languages.

Motivation:

Elasticsearch and Logstash use the MaxMind database to enrich events with geodata. The database supports eight languages. It would be useful to let users choose which languages to use when processing the city, continent, country, and region names.

Having multilingual name fields in Elasticsearch makes later searches in Kibana or other applications more convenient.

Detailed Design:

For backward compatibility, geo.city_name, geo.continent_name, geo.country_name, geo.name and geo.region_name remain the same type and structure.

One suggestion is to add extra fields keyed by locale. Here is an example for geo.country_name:

{
  "geo": {
    "country_name": "United States",
    "country_name_with_locales": {
      "en_US": "United States",
      "de_DE": "USA",
      "ru_RU": "США",
      "pt_BR": "EUA",
      "fr_FR": "États Unis",
      "es_ES": "EE. UU.",
      "ja_JP": "アメリカ",
      "zh_CN": "美国"
    }
  }
}
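
To illustrate how an application might consume this structure, here is a minimal Python sketch. It assumes the proposed `country_name_with_locales` field from the example above; the function name `localized_country_name` and its `preferred` parameter are hypothetical and not part of ECS. It tries the caller's preferred locales in order and falls back to the existing `geo.country_name` field, which stays unchanged for backward compatibility.

```python
from typing import Any, Mapping, Optional, Sequence


def localized_country_name(
    geo: Mapping[str, Any], preferred: Sequence[str]
) -> Optional[str]:
    """Return the first available localized name, else the default name."""
    # "country_name_with_locales" is the field proposed in this issue.
    by_locale = geo.get("country_name_with_locales", {})
    for locale in preferred:
        if locale in by_locale:
            return by_locale[locale]
    # Fall back to the existing, unchanged ECS field.
    return geo.get("country_name")


event = {
    "geo": {
        "country_name": "United States",
        "country_name_with_locales": {
            "en_US": "United States",
            "de_DE": "USA",
            "ja_JP": "アメリカ",
        },
    }
}

print(localized_country_name(event["geo"], ["ja_JP", "de_DE"]))  # アメリカ
print(localized_country_name(event["geo"], ["it_IT"]))  # United States
```

The fallback keeps documents that lack the new field usable by the same lookup code.
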
ebeahan commented 2 years ago

Thanks for the detailed issue, @kaisecheng.

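Any insight into whether users typically want to:

- Have all available localized names for a location?
- Entirely replace the US English location data with a different location?
- Keep US English location and add one?

I understand the use case and how users could find ECS support helpful. Introducing extra fields for each locale would add up fast, but maybe we could consider using flattened instead of explicitly defining each field?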
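
As a rough sketch of what the flattened idea could look like, the Python snippet below builds an index mapping body in which the whole locale map is a single `flattened` field, so individual locale keys need no explicit definition. The field name `country_name_with_locales` is taken from the proposal in this issue and is an assumption, not an existing ECS field; `geo.country_name` is shown as `keyword`, its current ECS type.

```python
import json

# Hypothetical mapping fragment. "flattened" is the Elasticsearch field type
# that maps an entire JSON object as one field and indexes its leaf values
# as keywords, so new locale keys do not grow the mapping.
mapping = {
    "mappings": {
        "properties": {
            "geo": {
                "properties": {
                    "country_name": {"type": "keyword"},
                    "country_name_with_locales": {"type": "flattened"},
                }
            }
        }
    }
}

# Print the JSON body that would be sent when creating the index.
print(json.dumps(mapping, indent=2))
```

With a mapping along these lines, a single locale can still be addressed with dot notation, for example `geo.country_name_with_locales.ja_JP`, while the values are treated as keywords rather than analyzed text.
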

kaisecheng commented 2 years ago

> Have all available localized names for a location?

No. Most of the time, users pick the languages that they are interested in.

> Entirely replace the US English location data with a different location? Keep US English location and add one?

Good question. Users want to subscribe to the languages they are interested in, e.g. [ja_JP, zh_CN], so US English is not useful in their market. From the feedback so far, they prefer replacing English with a different language. From my perspective, ECS should keep a structure that benefits most users and let users decide how to use the data. I hesitate to say that users just want a single default language, as user requirements vary with their size. When they operate across multiple regions, they may find that a single default language is not good enough.
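
The subscription idea can be sketched in Python as follows. This is only an illustration under assumptions: the helper name `select_locales` is hypothetical, and the locale keys follow the format used in the example earlier in this issue. It keeps only the locales a user has subscribed to, in the order they were requested.

```python
from typing import Dict, Mapping, Sequence


def select_locales(
    names_by_locale: Mapping[str, str], subscribed: Sequence[str]
) -> Dict[str, str]:
    """Keep only the subscribed locales that are actually available."""
    return {
        locale: names_by_locale[locale]
        for locale in subscribed
        if locale in names_by_locale
    }


all_names = {
    "en_US": "United States",
    "de_DE": "USA",
    "ja_JP": "アメリカ",
    "zh_CN": "美国",
}

# A user focused on the Japanese and Chinese markets drops English entirely.
print(select_locales(all_names, ["ja_JP", "zh_CN"]))
```

Locales that the source data does not provide are skipped silently rather than raising an error, which is one possible design choice among several.
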

> consider using flattened instead of explicitly defining each field?

I can see the benefit of a flattened name list, but I believe a JSON map-like structure would be the users' preference for faster, easier lookup.