Re-classifying messages

henrikjohansen commented 10 years ago

In an environment where stuff changes a lot it could be very beneficial to have the ability to re-classify messages.

One very real example is adding devices that already have existing messages in ES to a stream. Another could be the merger of two separate streams. A third possibility is the ability to re-run messages through a recently added extractor.

I feel that this is a necessity for providing credibility and robustness to stream based analyses, reporting and alerting (since streams are such a vital anchor point for the way Graylog2 works).

The notion of long running system jobs already exists in Graylog2 - this would make a nice addon.

I am aware of complications, limitations, impacts, bla bla, because of the way ES works but this is rather essential because coherency and correctness are an absolute must for critical environments.

A fair compromise could be to expose this as an opt-in feature that requires clicking through 50 "I know I am stupid but I really want to do this anyways and nobody else but me is to blame" buttons (and not doing this automagically upon events such as adding or modifying extractors).

henrikjohansen commented 10 years ago

Forgot to say - this could very well be search based (i.e. do operation xyz upon messages that match this query) because different requirements such as time frame, etc. need to be taken into consideration.

This feature should make use of ES versioning support in order to protect the original messages ...
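To illustrate what "search based" could mean here, below is a rough sketch of a selection built from a query string plus a time frame. This is not an existing Graylog2 feature. The function name `build_selection` is made up, the `timestamp` field name and its `yyyy-MM-dd HH:mm:ss.SSS` format are assumptions about how the messages are stored, and the `bool`/`filter` form assumes a reasonably recent Elasticsearch. The versioning part is not shown.

```python
def build_selection(query, time_from, time_to):
    """Build an Elasticsearch query body selecting the messages to re-process.

    Hypothetical helper: `query` is a Lucene-style query string, and the
    time bounds are assumed to be in the stored timestamp format.
    """
    return {
        "query": {
            "bool": {
                # Messages must match the user's search query ...
                "must": [{"query_string": {"query": query}}],
                # ... and fall inside the requested time frame.
                "filter": [
                    {"range": {"timestamp": {"gte": time_from, "lte": time_to}}}
                ],
            }
        }
    }


body = build_selection(
    "source:web01", "2015-01-01 00:00:00.000", "2015-01-02 00:00:00.000"
)
```

The resulting body could then be handed to a scroll-style search so that a long running job works through the matches in batches.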

LordMike commented 10 years ago

This is very relevant in new installations too, where a single person is attempting to classify/parse all messages that come in. He/she will never keep up, so re-running the dataset to correctly process everything is sometimes a must :).

:+1:

pierluc-codes commented 8 years ago

Just want to point out that this is still relevant today.

mryanb commented 8 years ago

Agreed with @plcstpierre. This is an advantage of Splunk: we're able to send data there and reclassify/index it down the road if we need to pull out some more useful info. It also allows us to ship the data there and worry about the reporting when it's necessary.

mriedmann commented 8 years ago

Totally in need of this feature! In the meantime: Is there some alternative way to "rerun" old messages through new extractors?

joschi commented 8 years ago

@mriedmann You could read the messages from Elasticsearch with Logstash and its "elasticsearch" input and send them again to Graylog with the Logstash "gelf" output.

But be aware that this will essentially duplicate messages unless you remove the old ones from Elasticsearch.
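
If a script is preferred over Logstash, the same round trip can be sketched in Python. This is only a minimal sketch under several assumptions: the stored documents carry `message`, `source` and a UTC `timestamp` in `yyyy-MM-dd HH:mm:ss.SSS` form, fields starting with `gl2_` and the `streams` field are internal and can be dropped, and a GELF UDP input is listening on port 12201. The names `to_gelf` and `send_udp` are made up for this example, and reading the documents out of Elasticsearch is left out.

```python
import json
import socket
from datetime import datetime, timezone

# Fields handled explicitly or assumed to be internal, so not copied as extras.
SKIPPED = {"message", "source", "timestamp", "full_message", "streams", "_id", "id"}


def to_gelf(doc):
    """Convert one stored message (the `_source` of an ES hit) to a GELF 1.1 dict."""
    # Assumption: timestamps are stored as UTC strings with millisecond precision.
    ts = datetime.strptime(doc["timestamp"], "%Y-%m-%d %H:%M:%S.%f")
    gelf = {
        "version": "1.1",
        "host": doc["source"],
        "short_message": doc["message"],
        "timestamp": ts.replace(tzinfo=timezone.utc).timestamp(),
    }
    if "full_message" in doc:
        gelf["full_message"] = doc["full_message"]
    for key, value in doc.items():
        if key in SKIPPED or key.startswith("gl2_"):
            continue
        # GELF additional fields must be prefixed with an underscore.
        gelf["_" + key.lstrip("_")] = value
    return gelf


def send_udp(gelf, host="127.0.0.1", port=12201):
    """Send one uncompressed GELF message as a single UDP datagram (no chunking)."""
    payload = json.dumps(gelf).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


example = to_gelf({
    "message": "hi",
    "source": "web01",
    "timestamp": "1970-01-01 00:00:01.500",
    "facility": "auth",
    "streams": ["abc"],
    "gl2_source_input": "x",
})
```

The same caveat applies: every message sent this way is indexed again, so the old copies have to be removed from Elasticsearch to avoid duplicates. Large messages would also need GELF chunking or a TCP input, which this sketch does not handle.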