Open dmuensterer opened 1 year ago
@dmuensterer We discussed the request and decided not to add a feature for this specific case.
The `message` field does not always contain the full message. For example, when using the Beats collectors, the `message` field is only a small part of the complete message.
If you need the information, you can use pipeline rules to compute the size of message fields with the `length()` pipeline function. I hope that helps!
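For illustration, a minimal pipeline rule along those lines might look like the sketch below. The rule title and the target field name `message_length` are made up for this example:

```
rule "store length of message field"
when
  has_field("message")
then
  // Compute the length of the message field and store it in a
  // custom field so it can be searched and aggregated later.
  set_field("message_length", length(to_string($message.message)));
end
```

The resulting field can then be used in searches, dashboards, and aggregations like any other numeric field.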
Thank you for considering it, and for the explanation. Would you be able to take into consideration always calculating the byte size of the ingested message, regardless of any individual fields?
@dmuensterer I will bring it up for discussion again.
For some supported inputs, it can be problematic to get the byte size of an individual message. NetFlow, for example, might send multiple messages in one packet. The same goes for pull-based inputs that fetch messages from cloud APIs.
Thanks. Two considerations from my side that might help with the issue:
Calculate the size of the message just before Graylog starts running extractors/pipelines, to ensure that what is being calculated belongs to a single message.
or (maybe simpler)
Provide more advanced metrics that make it possible to associate inputs with data size, e.g. show the ingested data size for Input A from January 1st 00:00 to January 3rd 00:00.
In contrast to `gl2_accounted_message_size`, which stores the number of bytes of all fields stored in Elasticsearch, there should be a field that stores the number of bytes of solely the `message` field.
We, as a security services provider, charge our customers for the size of the logs they ingest, and we currently have huge problems using Graylog because no metrics are available to see how much data a customer has ingested into Graylog.
Current behaviour
No metrics to see the raw data size of a log
Expected behaviour
For each log, calculate a field `gl2_ingested_message_size` that contains the size of the `message` field in bytes.
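Until such a field exists, the requested value can only be approximated, for example with a pipeline rule like the sketch below. The rule title and the field name `ingested_message_size` are made up (the `gl2_` prefix is left to Graylog-internal fields), and the sketch assumes `length()` counts characters, which matches the byte size only for single-byte content such as plain ASCII:

```
rule "approximate ingested message size"
when
  has_field("message")
then
  // Assumption: length() returns the character count, which equals
  // the byte size only for single-byte encodings (e.g. ASCII).
  set_field("ingested_message_size", length(to_string($message.message)));
end
```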