eht16 / python-logstash-async

Python logging handler for sending log events asynchronously to Logstash.
MIT License
186 stars 51 forks source link

LogstashFormatter: Move field_skip_set creation to __init__ #97

Closed feliixx closed 7 months ago

feliixx commented 7 months ago

Commit baf2118 introduced a new set field_skip_set to store the fields to
skip, unfortunately this field is created too early, causing modification
of FORMATTER_RECORD_FIELD_SKIP_LIST by library users to be ignored.
Let's create the field at the end of init instead to fix the regression.

Fixes #96

eht16 commented 7 months ago

Thanks for raising the issue and fixing it.

Just for the records: it was possible if the "constants" module have been imported before the "formatter" module, e.g.:

from logstash_async.constants import constants
constants.FORMATTER_RECORD_FIELD_SKIP_LIST = \
    constants.FORMATTER_RECORD_FIELD_SKIP_LIST + [
        'custom_data', 'custom_message', 'func_name', 'interpreter',
        'interpreter_version', 'line', 'logsource',
        'logstash_async_version', 'pid', 'process_name', 'program',
        'thread_name']

from logstash_async.handler import AsynchronousLogstashHandler
from logstash_async.formatter import LogstashFormatter

But this might be a bit cumbersome and most probably most linters won't like this.

eht16 commented 7 months ago

Since the FORMATTER_LOGSTASH_ECS_MESSAGE_FIELD_LIST setting is affected in the same way, we should move it too.

Should be like:

diff --git a/logstash_async/formatter.py b/logstash_async/formatter.py
index cc9cf09..e57c84a 100644
--- a/logstash_async/formatter.py
+++ b/logstash_async/formatter.py
@@ -27,9 +27,6 @@ class LogstashFormatter(logging.Formatter):

     _basic_data_types = (type(None), bool, str, int, float)

-    field_skip_set = set(constants.FORMATTER_RECORD_FIELD_SKIP_LIST)
-    top_level_field_set = set(constants.FORMATTER_LOGSTASH_MESSAGE_FIELD_LIST)
-
     class MessageSchema:
         TIMESTAMP = '@timestamp'
         VERSION = '@version'
@@ -89,6 +86,9 @@ class LogstashFormatter(logging.Formatter):
         self._prefetch_logsource()
         self._prefetch_program_name()

+        self.field_skip_set = set(constants.FORMATTER_RECORD_FIELD_SKIP_LIST)
+        self.top_level_field_set = set(constants.FORMATTER_LOGSTASH_MESSAGE_FIELD_LIST)
+
     # ----------------------------------------------------------------------
     def _prefetch_interpreter(self):
         """Override when needed"""
@@ -270,10 +270,13 @@ class LogstashEcsFormatter(LogstashFormatter):
     }

     normalize_ecs_message = constants.FORMATTER_LOGSTASH_ECS_NORMALIZE_MESSAGE
-    top_level_field_set = {*constants.FORMATTER_LOGSTASH_ECS_MESSAGE_FIELD_LIST,
-                           *__schema_dict.values()}
     MessageSchema = type('MessageSchema', (LogstashFormatter.MessageSchema,), __schema_dict)

+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+        self.top_level_field_set = {*constants.FORMATTER_LOGSTASH_ECS_MESSAGE_FIELD_LIST,
+                                    *self.__schema_dict.values()}
+
     def _get_primary_fields(self, record):
         message = super()._get_primary_fields(record)
         Schema = self.MessageSchema
@@ -407,13 +410,16 @@ class DjangoLogstashEcsFormatter(DjangoLogstashFormatter, LogstashEcsFormatter):
         'REQ_REFERER': 'http.request.referrer',
     }

-    top_level_field_set = LogstashEcsFormatter.top_level_field_set | set(__schema_dict.values())
     MessageSchema = type(
         'MessageSchema',
         (DjangoLogstashFormatter.MessageSchema, LogstashEcsFormatter.MessageSchema),
         __schema_dict,
     )

+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+        self.top_level_field_set = self.top_level_field_set | set(self.__schema_dict.values())
+
     def _remove_excluded_fields(self, message):
         message.pop('status_code', None)
         super()._remove_excluded_fields(message)
@@ -498,13 +504,16 @@ class FlaskLogstashEcsFormatter(FlaskLogstashFormatter, LogstashEcsFormatter):
         'REQ_ID': 'http.request.id',
     }

-    top_level_field_set = LogstashEcsFormatter.top_level_field_set | set(__schema_dict.values())
     MessageSchema = type(
         'MessageSchema',
         (FlaskLogstashFormatter.MessageSchema, LogstashEcsFormatter.MessageSchema),
         __schema_dict,
     )

+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+        self.top_level_field_set = self.top_level_field_set | set(self.__schema_dict.values())
+
     def _remove_excluded_fields(self, message):
         message.pop('status_code', None)
         super()._remove_excluded_fields(message)

@feliixx do you want to add this to this PR? If not, I'll add it later.

@andriilahuta since you are using the ECS formatters, would you mind giving it a try?

feliixx commented 7 months ago

Hi @eht16 thanks for your quick feedback !

Smart idea to fix the issue by changing the import order, but it does feel a bit odd.

Since the FORMATTER_LOGSTASH_ECS_MESSAGE_FIELD_LIST setting is affected in the same way, we should move it too.

Agreed, I've thought of doing it in https://github.com/eht16/python-logstash-async/pull/97 but as you noticed it's not as trivial, and I'm not familiar with the ECS formatter so I choose not to do it.

@feliixx do you want to add this to this PR?

Thanks, but I'd rather not to as I'm not familiar with this part, so I leave it to you :)

eht16 commented 7 months ago

Smart idea to fix the issue by changing the import order, but it does feel a bit odd.

Yes, even though technically correct, it's not so nice.

@feliixx do you want to add this to this PR?

Thanks, but I'd rather not to as I'm not familiar with this part, so I leave it to you :)

Alright, perfectly fine.

andriilahuta commented 7 months ago

My original intention for this was simpler subclassing, something like this:

class Fmt(LogstashFormatter):
    field_skip_set = {*LogstashFormatter.field_skip_set, ...}

But it shouldn't be too difficult to do this in constructor as well.

While I don't currently have time to properly test this, I don't expect any issues with this PR's approach.