Closed: alvarezmelissa87 closed this issue 2 years ago
Pinging @elastic/ml-ui (:ml)
Example of ML job that is using categorization:
No reference to locations, but results in Anomaly Explorer show a map of France:
Same thing for Cisco & Juniper routers with a high count of records. There are no geolocation fields in my records, and yet I get locations on a map of France. I have attached a set of documents for testing.
Anomaly detection job
{
  "job_id": "cj_high_low_log_count_job",
  "job_type": "anomaly_detector",
  "job_version": "7.17.0",
  "create_time": 1644422414565,
  "model_snapshot_id": "1645128082",
  "groups": [ "cisco_juniper" ],
  "description": "",
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      {
        "detector_description": "count partitionfield=\"device.brand\"",
        "function": "count",
        "partition_field_name": "device.brand",
        "detector_index": 0
      }
    ],
    "influencers": [ "device.brand", "device.name" ]
  },
  "analysis_limits": { "model_memory_limit": "20mb", "categorization_examples_limit": 4 },
  "data_description": { "time_field": "@timestamp", "time_format": "epoch_ms" },
  ....
  "datafeed_config": {
    "datafeed_id": "datafeed-cj_high_low_log_count_job",
    "job_id": "cj_high_low_log_count_job",
    "query_delay": "82024ms",
    "chunking_config": { "mode": "auto" },
    "indices_options": {
      "expand_wildcards": [ "open" ],
      "ignore_unavailable": false,
      "allow_no_indices": true,
      "ignore_throttled": true
    },
    "query": {
      "bool": {
        "must": [ { "match_all": {} } ],
        "filter": [ { "match_phrase": { "event.dataset": "cisco_juniper_logs" } } ],
        "must_not": []
      }
    },
    "indices": [ "filebeat-*" ],....
Anomaly Explorer Maps
This is because of a false-positive match in the MapsPlugin#suggestEMSTermJoin function.
The function receives the sampleValues of mlcategory (which are numbers within the 0-50 range) and matches them to the INSEE codes of the France departments. These are also numbers within the 0-50 range. See https://maps.elastic.co/#file/france_departments
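To make the failure mode concrete, here is a minimal sketch of value-only matching. It is not the actual MapsPlugin#suggestEMSTermJoin implementation: the `EmsFieldSketch` type, the `suggestJoinByValuesOnly` helper, and the short list of codes are all hypothetical and only illustrate how small integers can look like department codes.

```typescript
// Hypothetical, simplified model of a joinable EMS field; not the Kibana API.
interface EmsFieldSketch {
  layerId: string;
  fieldId: string;
  knownValues: Set<string>;
}

// Illustrative subset of numeric department codes; the real list is longer.
const franceDepartmentsSketch: EmsFieldSketch = {
  layerId: 'france_departments',
  fieldId: 'insee',
  knownValues: new Set(['13', '30', '33', '44']),
};

// Suggest the first field whose known values contain every sample value.
// Only the values are inspected, never the name or meaning of the source field.
function suggestJoinByValuesOnly(
  sampleValues: string[],
  candidates: EmsFieldSketch[]
): EmsFieldSketch | null {
  if (sampleValues.length === 0) {
    return null;
  }
  for (const candidate of candidates) {
    if (sampleValues.every((value) => candidate.knownValues.has(value))) {
      return candidate;
    }
  }
  return null;
}

// Assumed sample: category IDs from a categorization job are small integers.
const mlCategorySamples = ['13', '30', '44'];
const falsePositive = suggestJoinByValuesOnly(mlCategorySamples, [franceDepartmentsSketch]);
```

In this sketch `falsePositive` is the France departments field, even though the samples are category IDs and have nothing to do with geography.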
There may be complementary ways to address this:
(a) MapsPlugin#suggestEMSTermJoin to be more strict. For example, also require the field name to match the alias regex.
(b) Require an explicit confirmation from the user first, e.g. "Show results as France Departments on map", after which the map would expand.
(c) MapsPlugin#suggestEMSTermJoin should omit auto-matching on fields when we know the values are too unspecific (e.g. FR-30 has little chance of confusion, while the simple INSEE number 30 does).
Thoughts, @nickpeihl @jsanz @nreese (?)
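As a rough illustration of how (a) and (c) could combine, here is a sketch under stated assumptions. The `EmsFieldCandidate` type, the `fieldNameAlias` and `tooGeneric` properties, the regexes, and the sample codes are hypothetical; this is not the real EMS metadata format or the Kibana API.

```typescript
// Hypothetical candidate description; field names here are assumptions.
interface EmsFieldCandidate {
  layerId: string;
  fieldId: string;
  knownValues: Set<string>;
  fieldNameAlias: RegExp; // assumed: names that plausibly hold this code
  tooGeneric: boolean; // assumed: values alone are too ambiguous to trust
}

const candidates: EmsFieldCandidate[] = [
  {
    layerId: 'france_departments',
    fieldId: 'insee',
    knownValues: new Set(['13', '30', '33', '44']),
    fieldNameAlias: /insee|department/i,
    tooGeneric: true, // plain two-digit numbers collide with many other fields
  },
  {
    layerId: 'france_departments',
    fieldId: 'iso_3166_2',
    knownValues: new Set(['FR-13', 'FR-30', 'FR-33', 'FR-44']),
    fieldNameAlias: /iso|region|department/i,
    tooGeneric: false, // prefixed codes are unlikely to match by accident
  },
];

// Values must always match. Generic fields additionally need a matching
// source field name before they are suggested.
function suggestJoinStrict(
  fieldName: string,
  sampleValues: string[],
  fields: EmsFieldCandidate[]
): EmsFieldCandidate | null {
  if (sampleValues.length === 0) {
    return null;
  }
  for (const field of fields) {
    const valuesMatch = sampleValues.every((value) => field.knownValues.has(value));
    const nameMatches = field.fieldNameAlias.test(fieldName);
    if (valuesMatch && (!field.tooGeneric || nameMatches)) {
      return field;
    }
  }
  return null;
}

const forMlCategory = suggestJoinStrict('mlcategory', ['13', '30'], candidates);
const forInseeField = suggestJoinStrict('code_insee', ['13', '30'], candidates);
const forIsoValues = suggestJoinStrict('mlcategory', ['FR-30'], candidates);
```

With these assumptions, `mlcategory` holding plain numbers gets no suggestion, a field named `code_insee` still matches, and prefixed values such as FR-30 match regardless of field name. Whether the name check should apply to every field, or only to generic ones as here, is a design choice left open.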
It'd be nice to strike a better balance: get Maps to display ASAP, but also avoid false positives. Not just to fix this issue, but also since MapsPlugin#suggestEMSTermJoin is expected to support choropleth mapping in Lens as well. (FWIW, since the map will be a "suggested chart" there, there will already be an explicit "confirm" step from the user, similar to (b).)
In my opinion, this suggestion feature should be switched off, because errors will always be possible. In another dataset, I had a French "code postal" field, but the maps pointed to locations in Austin, Dallas
Pinging @elastic/kibana-gis (Team:Geo)
Hiya @thomasneirynck, @nreese - I think option (c) might be a good solution going forward. Is this something that can be addressed for 8.3? For now, I can prevent this from showing up in the ML plugin by just not showing the map when it's a categorization job.
@alvarezmelissa87 I believe option (c) is already addressed here: EMS will now omit metadata for fields that have too generic values: https://github.com/elastic/ems-file-service/pull/243
This change should "automatically" show up in 8.2, without any required changes on your end.
Ah! Thank you! Is this something I can test locally to confirm we don't see the issue anymore? Then I'll be able to close this issue off.
You can run Kibana on the 8.2, 8.x, or master branch, and they should have the fix.
With the maps work in, it looks like this is no longer reproducible 🎉
Describe the bug: The Anomaly Explorer shows a choropleth map of regions when the ML job has no reference to location information: no geographical influencers or anything.
Expected behavior: Map should be shown only when job config contains geo info as partition field or influencer.