Graylog2 / graylog2-server

Free and open log management
https://www.graylog.org

Calculate and add field `gl2_ingested_message_size` to display number of bytes in `message` field #14307

Open: dmuensterer opened this issue 1 year ago

dmuensterer commented 1 year ago

In contrast to `gl2_accounted_message_size`, which stores the number of bytes of all fields stored in Elasticsearch, there should be a field that stores the number of bytes of the `message` field alone.

As a security services provider, we charge our customers by the size of the logs they ingest, and we currently have major problems using Graylog because no metrics show how much data a customer has ingested into Graylog.

Current behaviour

No metrics to see the raw data size of a log

Expected behaviour

For each log, calculate a field `gl2_ingested_message_size` that contains the size of the `message` field in bytes.
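A minimal sketch of the requested calculation, assuming a message is represented as a Python dict and "size in bytes" means the length of the `message` field encoded as UTF-8. `add_ingested_message_size` is a hypothetical helper for illustration, not a Graylog API.

```python
def add_ingested_message_size(msg: dict) -> dict:
    """Set gl2_ingested_message_size to the UTF-8 byte length of msg["message"].

    Hypothetical helper for illustration; not part of Graylog.
    A missing message field is treated as an empty string (size 0).
    """
    text = msg.get("message", "")
    # Count encoded bytes, not characters: non-ASCII characters take 2-4 bytes in UTF-8.
    msg["gl2_ingested_message_size"] = len(text.encode("utf-8"))
    return msg


event = {"message": "user=müller action=login"}
add_ingested_message_size(event)
print(len(event["message"]), event["gl2_ingested_message_size"])  # prints "24 25"
```

Counting characters instead of bytes would undercount non-ASCII text (here, "ü" is one character but two bytes), which matters when the figure is used for billing.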

bernd commented 1 year ago

@dmuensterer We discussed this and decided not to add a feature for this specific request.

The `message` field does not always contain the full message. For example, when using the Beats collectors, the `message` field holds only a small part of the complete message.

If you need the information, you can use pipeline rules to compute the size of `message` fields using the `length()` pipeline function. I hope that helps!

dmuensterer commented 1 year ago

Thank you for considering this and for the explanation. Would you consider always calculating the byte size of the ingested message, regardless of any fields?

bernd commented 1 year ago

@dmuensterer I will bring it up for discussion again.

For some supported inputs, it can be problematic to get the byte size of an individual message. NetFlow, for example, might send multiple messages in one packet. The same applies to pull-based inputs that fetch messages from cloud APIs.

dmuensterer commented 1 year ago

Thanks. Two suggestions from my side that might help with the issue:

Calculate the size of the message just before Graylog runs extractors/pipelines, to ensure that what is being measured belongs to a single message.

or (maybe simpler)

Provide more advanced metrics that associate inputs with data size, e.g. show the ingested data size for Input A from January 1st 00:00 to January 3rd 00:00.
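The per-input metric in the second suggestion amounts to summing message sizes by input over a time window. A minimal Python sketch, assuming each message's byte size is already known (for example from a field like the proposed `gl2_ingested_message_size`); `IngestRecord`, `bytes_per_input`, the input IDs, and the sample data are all hypothetical, not Graylog APIs or real measurements.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class IngestRecord:
    """One ingested message: which input received it, when, and its size in bytes."""
    input_id: str
    timestamp: datetime
    size_bytes: int


def bytes_per_input(records, start: datetime, end: datetime) -> dict:
    """Sum size_bytes per input for records with start <= timestamp < end."""
    totals = defaultdict(int)
    for r in records:
        if start <= r.timestamp < end:
            totals[r.input_id] += r.size_bytes
    return dict(totals)


utc = timezone.utc
records = [
    IngestRecord("input-a", datetime(2024, 1, 1, 0, 0, tzinfo=utc), 120),
    IngestRecord("input-a", datetime(2024, 1, 2, 12, 0, tzinfo=utc), 80),
    IngestRecord("input-b", datetime(2024, 1, 2, 13, 0, tzinfo=utc), 50),
    IngestRecord("input-a", datetime(2024, 1, 3, 0, 0, tzinfo=utc), 999),  # excluded: end is exclusive
]
window = bytes_per_input(records,
                         datetime(2024, 1, 1, tzinfo=utc),
                         datetime(2024, 1, 3, tzinfo=utc))
print(window)  # {'input-a': 200, 'input-b': 50}
```

The half-open interval (start inclusive, end exclusive) means consecutive windows, such as January 1st to 3rd followed by January 3rd to 5th, never count a message stamped exactly at the boundary twice.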