elastic / ems-file-service

Data sources for Elastic Map Service

Remove region identifiers with values that are too common #242

Closed: jsanz closed this issue 2 years ago

jsanz commented 2 years ago

Related to elastic/kibana#125985

Our vector manifests ship with region identifiers to help consumers match datasets that could potentially be joined with our boundaries.

We are seeing a source of false positives that could be fixed on the EMS File Service side: identifier values that are too common.

For example, the France departments layer has an ISO field whose values are very distinctive, but its INSEE field holds a plain number for almost all values. As a result, any consumer testing a numeric field whose values fall in that range will easily match it against this field.
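To make the false positive concrete, here is a minimal Python sketch of a naive consumer-side heuristic. The matcher, its 0.8 threshold, and the code lists are hypothetical assumptions for illustration, not Kibana's actual logic or the real EMS data: INSEE-style codes are modelled as two-digit strings for metropolitan departments (01 to 95 without 20, plus Corsica's 2A and 2B), and the ISO-like values simply prefix them with FR-.

```python
# Hypothetical consumer-side heuristic: suggest a join when most values of a
# user column also appear among a manifest field's identifier values.


def insee_department_codes() -> set[str]:
    """Illustrative metropolitan codes: '01'..'95' without '20', plus '2A' and '2B'."""
    codes = {f"{n:02d}" for n in range(1, 96) if n != 20}
    return codes | {"2A", "2B"}


def match_ratio(column: list[str], field_values: set[str]) -> float:
    """Share of non-empty column values found in the field's value set."""
    values = [v for v in column if v]
    if not values:
        return 0.0
    return sum(v in field_values for v in values) / len(values)


def suggests_join(column: list[str], field_values: set[str], threshold: float = 0.8) -> bool:
    """Flag a column as joinable when enough of its values match the field."""
    return match_ratio(column, field_values) >= threshold


insee = insee_department_codes()
iso_like = {f"FR-{code}" for code in insee}  # prefixed, hence distinctive

# An unrelated numeric column (say, order quantities) that happens to fall in 1..95.
quantities = ["12", "45", "67", "23", "08", "91"]

print(suggests_join(quantities, insee))     # True: a false positive
print(suggests_join(quantities, iso_like))  # False
```

Any column of small integers will pass a value-overlap test like this against the INSEE field, while the prefixed ISO values only match data that really uses those codes.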


To remove this source of false positives, one solution would be to stop writing those values to our manifest altogether, by adding an optional property to the field mappings that lets a field opt out.

fieldMapping: [
  ...
  {
    type: id
    name: insee
    desc: INSEE department identifier
    # new optional property: keep this field's values out of the manifest
    skipValues: true
    alias: [
      insee
    ]
  }
  ...
]

The build process would then skip the values of any field marked this way, so they would not be written to the corresponding manifest.
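A minimal sketch of how the build could honour the opt-out, assuming features arrive as plain property dictionaries and that manifest field entries look roughly like the dictionaries below. The function name, the output keys (`id`, `desc`, `values`), the sample `iso` field, and the rule that only `id` fields collect values are hypothetical; only `type`, `name`, `desc`, and `skipValues` come from the proposed mapping.

```python
from typing import Any


def build_manifest_fields(field_mapping: list[dict[str, Any]],
                          features: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical build step: one manifest entry per mapped field."""
    fields = []
    for mapping in field_mapping:
        name = mapping["name"]
        entry: dict[str, Any] = {"type": mapping["type"], "id": name, "desc": mapping.get("desc", "")}
        # skipValues is optional and defaults to False, so unmarked fields behave as before.
        if mapping["type"] == "id" and not mapping.get("skipValues", False):
            distinct = {str(f[name]) for f in features if f.get(name) is not None}
            entry["values"] = sorted(distinct)
        fields.append(entry)
    return fields


field_mapping = [
    {"type": "id", "name": "iso", "desc": "ISO 3166-2 code"},
    {"type": "id", "name": "insee", "desc": "INSEE department identifier", "skipValues": True},
]
features = [{"iso": "FR-01", "insee": "01"}, {"iso": "FR-2A", "insee": "2A"}]

fields = build_manifest_fields(field_mapping, features)
print(fields[0]["values"])    # ['FR-01', 'FR-2A']
print("values" in fields[1])  # False: INSEE values stay out of the manifest
```

Because `skipValues` defaults to false, sources that never mention it keep shipping their values unchanged.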

I'm not sure whether we should create new versioned source files with versions: >= 8.2, or whether we can introduce this change in the existing manifests.
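If the versioned-source route were taken, the existing file would presumably be capped below 8.2 while a new copy that sets `skipValues` applies from 8.2 onwards. The sketch below assumes a semver-like `versions` string and supports only simple comparators (with or without a space after the operator); the file names and helpers are hypothetical, and a real build would rely on a full semver implementation rather than this parser.

```python
import operator
import re

_OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt, "<": operator.lt, "=": operator.eq}
# An operator (two-character ones tried first), optional whitespace, then a dotted version.
_COMPARATOR = re.compile(r"(>=|<=|>|<|=)\s*(\d+(?:\.\d+)*)")


def parse_version(text: str) -> tuple[int, ...]:
    """Parse '8.2' or '8.2.1' into a tuple padded to three parts, e.g. (8, 2, 0)."""
    parts = [int(p) for p in text.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def in_range(version: str, version_range: str) -> bool:
    """True when version satisfies every comparator, e.g. '>=7.6 <8.2'."""
    comparators = _COMPARATOR.findall(version_range)
    if not comparators:
        raise ValueError(f"unsupported range: {version_range!r}")
    v = parse_version(version)
    return all(_OPS[op](v, parse_version(bound)) for op, bound in comparators)


# Hypothetical pair of source files for the same layer.
sources = [
    {"file": "france_departments_v1", "versions": "<8.2", "skipValues": False},
    {"file": "france_departments_v2", "versions": ">= 8.2", "skipValues": True},
]


def sources_for(manifest_version: str) -> list[dict]:
    """Source files whose version range covers the given manifest version."""
    return [s for s in sources if in_range(manifest_version, s["versions"])]


print([s["file"] for s in sources_for("8.1")])  # ['france_departments_v1']
print([s["file"] for s in sources_for("8.2")])  # ['france_departments_v2']
```

The trade-off is that older manifests keep their INSEE values untouched, at the cost of maintaining two copies of the source; changing the existing file instead would alter manifests already served to earlier versions.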

thomasneirynck commented 2 years ago

Closed with https://github.com/elastic/ems-file-service/pull/243