Graylog2 / graylog2-server

Free and open log management
https://www.graylog.org
Other

New Alerting lets user select fields with fielddata disabled #7510

Open kmerz opened 4 years ago

kmerz commented 4 years ago

Expected Behavior

The user should not be able to select the `message` field, or any other field of type `text` with fielddata disabled. Selecting such a field only raises a query exception and prevents the event definition from working.
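A minimal sketch of the check the UI could apply before offering a field for group-by or metrics, in Python for illustration only. `FieldInfo`, `is_aggregatable` and `aggregatable_fields` are hypothetical names, not Graylog's API. The sketch assumes the field's mapping type and fielddata flag are already known from the index mapping, and it treats every non-`text` type as aggregatable (those types rely on doc_values, which Elasticsearch enables by default, although a mapping can disable them).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldInfo:
    """Hypothetical description of an indexed field, as the UI might receive it."""
    name: str
    es_type: str              # Elasticsearch mapping type, e.g. "text", "keyword", "long"
    fielddata: bool = False   # only relevant for "text" fields


def is_aggregatable(field: FieldInfo) -> bool:
    """A text field can only be aggregated on if fielddata is enabled for it."""
    if field.es_type == "text":
        return field.fielddata
    # Simplification: assume doc_values are enabled (the Elasticsearch default).
    return True


def aggregatable_fields(fields: list[FieldInfo]) -> list[str]:
    """Names of the fields that should be offered for group-by or metrics."""
    return [f.name for f in fields if is_aggregatable(f)]


fields = [
    FieldInfo("message", "text"),
    FieldInfo("source", "keyword"),
    FieldInfo("took_ms", "long"),
]
print(aggregatable_fields(fields))  # ['source', 'took_ms']
```

With a filter like this, `message` would simply not appear in the field selector, so the broken event definition could not be saved in the first place.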

Current Behavior

A user can select `message` as a field for an aggregation (group by or metric), and the event definition is then doomed to fail, because every execution throws an exception.

This can lead to thousands of error messages in the Elasticsearch and Graylog logs.

Steps to Reproduce (for bugs)

  1. Create an event definition.
  2. Select the `message` field in an aggregation (e.g. the metric `card(message)`).
  3. Check your server.log for errors like:

    2020-02-20 14:08:16,278 ERROR: org.graylog.events.processor.aggregation.PivotAggregationSearch - Aggregation search query <query-1> returned an error: Unable to perform search query:

    Fielddata is disabled on text fields by default. Set fielddata=true on [message] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead. ElasticsearchException{message=Unable to perform search query:

    Fielddata is disabled on text fields by default. Set fielddata=true on [message] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead., errorDetails=[Fielddata is disabled on text fields by default. Set fielddata=true on [message] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.]}
        at org.graylog.plugins.views.search.elasticsearch.ElasticsearchBackend.checkForFailedShards(ElasticsearchBackend.java:326)
        at org.graylog.plugins.views.search.elasticsearch.ElasticsearchBackend.doRun(ElasticsearchBackend.java:285)
        at org.graylog.plugins.views.search.elasticsearch.ElasticsearchBackend.doRun(ElasticsearchBackend.java:82)
        at org.graylog.plugins.views.search.engine.QueryBackend.run(QueryBackend.java:86)
        at org.graylog.plugins.views.search.engine.QueryEngine.prepareAndRun(QueryEngine.java:155)
        at org.graylog.plugins.views.search.engine.QueryEngine.lambda$execute$6(QueryEngine.java:95)
        at java.util.concurrent.CompletableFuture$AsyncSupply.run$$$capture(CompletableFuture.java:1604)
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)



## Context
[z#843142]

## Your Environment
* Graylog Version: 3.2.0

tecbird commented 4 years ago

+1

barzog commented 2 years ago

I would argue against this. The Elasticsearch documentation says exactly the following:

Use the text field type if:

* The content is human-readable, such as an email body or product description.
* You plan to search the field for individual words or phrases, such as `the brown fox jumped`, using [full text queries](https://www.elastic.co/guide/en/elasticsearch/reference/current/full-text-queries.html). Elasticsearch [analyzes](https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis.html) text fields to return the most relevant results for these queries.

Use a keyword family field type if:

* The content is machine-generated, such as a log message or HTTP request information.
* You plan to search the field for exact full values, such as `org.foo.bar`, or partial character sequences, such as `org.foo.*`, using [term-level queries](https://www.elastic.co/guide/en/elasticsearch/reference/current/term-level-queries.html).
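To illustrate why this distinction matters for aggregations: a terms aggregation groups by indexed terms, and for a `text` field those are the tokens produced by the analyzer, not the whole value. So even with fielddata enabled, grouping by `message` would group by individual words. The Python sketch approximates an analyzer with a lowercase-and-split regex (an assumption; Elasticsearch's standard analyzer is more involved) and compares the resulting term counts with keyword-style indexing, where the whole value is a single term. The function names are hypothetical.

```python
import re
from collections import Counter


def simple_text_analyzer(value: str) -> list[str]:
    """Rough stand-in for a text analyzer: lowercase, then split into word tokens."""
    return re.findall(r"\w+", value.lower())


def keyword_terms(value: str) -> list[str]:
    """A keyword field indexes the whole value as one unmodified term."""
    return [value]


messages = [
    "User alice logged in",
    "User bob logged in",
    "User alice logged in",
]

# Terms an aggregation would see for a text field vs. a keyword field.
text_counts = Counter(t for m in messages for t in simple_text_analyzer(m))
keyword_counts = Counter(t for m in messages for t in keyword_terms(m))

print(text_counts["logged"])                     # 3
print(keyword_counts["User alice logged in"])   # 2
```

Only the keyword-style counts answer the question "how often did this exact message occur?".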

As its name suggests, Graylog is intended for machine-generated log messages. Why are `message` and `full_message` of type `text`, then? Could this at least be made configurable?

By the way: to count similar messages, there currently seem to be two options (I have not tested either yet):

* In a pipeline, add a new field containing a hash (CRC or Murmur) of the `message` field, and aggregate on that new field.
* Use a custom index mapping with the Elasticsearch multi-field feature.
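A minimal Python sketch of the first option, outside of Graylog: it computes a CRC32 of each message and counts messages per hash. In Graylog this step would live in a pipeline rule; the field name `message_hash` and the helper functions are hypothetical. CRC32 is used only because Python ships it in `zlib`; it can collide, so two different messages may occasionally share a hash, and a wider hash such as 128-bit Murmur3 makes that less likely.

```python
import zlib
from collections import Counter


def message_hash(message: str) -> str:
    """CRC32 of the UTF-8 encoded message as 8 hex characters (a keyword-like value)."""
    return format(zlib.crc32(message.encode("utf-8")), "08x")


def enrich(event: dict) -> dict:
    """Return a copy of the event with a hypothetical 'message_hash' field added."""
    return {**event, "message_hash": message_hash(event["message"])}


events = [enrich({"message": m}) for m in [
    "Disk /dev/sda1 is 91% full",
    "Disk /dev/sda1 is 91% full",
    "Connection refused",
]]

# Aggregating on the short hash field instead of the analyzed text field.
counts = Counter(e["message_hash"] for e in events)
print(len(counts))             # 2
print(sorted(counts.values())) # [1, 2]
```

Note that messages differing only in a timestamp or ID will get different hashes, so this counts exact duplicates, not merely similar messages.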