apache / pulsar-client-python

Apache Pulsar Python client library
https://pulsar.apache.org/
Apache License 2.0
49 stars 38 forks source link

Custom logger doesn't work with created resources (like producers) #194

Closed Samreay closed 6 months ago

Samreay commented 6 months ago

There is an option to pass in a Logger when creating the pulsar client to log to. It seems this does not work when creating resources.

Reproduction:

First, run a pulsar standalone instance:

docker run -it -p 6650:6650 -p 8080:8080 --tmpfs /pulsar/data apachepulsar/pulsar:3.1.0 bin/pulsar standalone

And then run this code:

import logging

import pulsar

pulsar_logger = logging.getLogger("pulsarDummyLogger")
pulsar_logger.setLevel(logging.ERROR)
logging.basicConfig(level=logging.INFO, format="{message}", style="{")

logging.info("This should be logged")
pulsar_logger.info("This isn't logged (correctly)")

client = pulsar.Client("pulsar://localhost:6650", logger=pulsar_logger)
producer = client.create_producer("A")
producer.close()
client.close()

It will write this to your terminal:

This should be logged
2024-01-15 14:12:08.947 DEBUG [140254733069184] ProducerImpl:128 | [persistent://public/default/A, standalone-0-4] ~ProducerImpl
2024-01-15 14:12:08.948 INFO  [140254733069184] ProducerImpl:722 | Producer - [persistent://public/default/A, standalone-0-4] , [batching  = off]

Potentially related to https://github.com/apache/pulsar/pull/7713

I note that, if you set the log level to INFO for the pulsar logger, you do get more output:

This should be logged
This isn't logged (correctly)
[<none> -> pulsar://localhost:6650] Create ClientConnection, timeout=10000
Created connection for pulsar://localhost:6650
[127.0.0.1:53354 -> 127.0.0.1:6650] Connected to broker
[persistent://public/default/A, ] Getting connection from pool
[persistent://public/default/A, ] Created producer on broker [127.0.0.1:53354 -> 127.0.0.1:6650] 
[persistent://public/default/A, standalone-0-6] Closing producer for topic 0x7f4478002510
[persistent://public/default/A, standalone-0-6] Closed producer 0
Closing Pulsar client with 0 producers and 0 consumers
[127.0.0.1:53354 -> 127.0.0.1:6650] Connection disconnected
[127.0.0.1:53354 -> 127.0.0.1:6650] Destroyed connection
2024-01-15 14:13:51.273 DEBUG [139932214320000] ProducerImpl:128 | [persistent://public/default/A, standalone-0-6] ~ProducerImpl
2024-01-15 14:13:51.273 INFO  [139932214320000] ProducerImpl:722 | Producer - [persistent://public/default/A, standalone-0-6] , [batching  = off]

But I would expect that the logger is still used when creating resources like producers, readers, and consumers.

Expected behaviour

We can control the formatting and level of the logs produced by the pulsar client. Hopefully this is an easy fix, and the custom logger in the parent client can simply be passed on, if set, to the producer/consumer/readers when they are created

merlimat commented 6 months ago

@Samreay If you just want to set the log level, a possible workaround is to pass the FileLogger: https://pulsar.apache.org/api/python/3.4.x/pulsar.FileLogger.html or the ConsoleLogger https://pulsar.apache.org/api/python/3.4.x/pulsar.ConsoleLogger.html

Samreay commented 6 months ago

Thanks @merlimat, can confirm that

import logging

import pulsar

pulsar_logger = pulsar.ConsoleLogger(log_level=pulsar.LoggerLevel.Error)
logging.basicConfig(level=logging.INFO, format="{message}", style="{")

client = pulsar.Client("pulsar://localhost:6650", logger=pulsar_logger)
producer = client.create_producer("A")
producer.close()
client.close()

does seem to silence the loggers better than using the standard library logging.

I'll leave the issue open if possible though, because the inconsistency between the behaviour is something which should probably be looked into at some point.

BewareMyPower commented 6 months ago

But I would expect that the logger is still used when creating resources like producers, readers, and consumers.

Did you mean the last two lines?

2024-01-15 14:12:08.947 DEBUG [140254733069184] ProducerImpl:128 | [persistent://public/default/A, standalone-0-4] ~ProducerImpl
2024-01-15 14:12:08.948 INFO  [140254733069184] ProducerImpl:722 | Producer - [persistent://public/default/A, standalone-0-4] , [batching  = off]

The logs above are printed by the default logger because they are logged during the exit process of the Python interpreter. At this moment, the Python logger cannot be used. So we fall back to the default logger from the C extension instead.

See https://github.com/apache/pulsar-client-python/blob/4b61547e0f79011b1d76e24a7a9745c1d3d6e36f/src/config.cc#L57-L59

Reference: https://docs.python.org/3/c-api/init.html#c.Py_IsInitialized

Samreay commented 6 months ago

Hmm, that does make it tricky. Effectively my concern here is just that we have our logs from all services routed into a common place, and they all undergo a json transformation. So I'm trying to ensure everything which is collected looks correct. When using python logging, this is easy, because I can attach a json handler or intercept third party library logging, just not when its a direct stdout call.

I noticed in another thread that the C++ client now has an ability to silence all logging, maybe exposing that to the python client would be the easiest way to ensure that developers can have a clean stdout under their control? If not, perhaps just documenting the difference between passing in a python logger and a C++ logger has might alleviate any confusion for other developers trying to integrate the pulsar client

BewareMyPower commented 6 months ago

Here is a workaround to trigger the cleanup (destruction of the objects from the C++ library) explicitly.

import logging

import pulsar

pulsar_logger = logging.getLogger("pulsarDummyLogger")
pulsar_logger.setLevel(logging.INFO)
logging.basicConfig(level=logging.INFO, format="{message}", style="{")

logging.info("This should be logged")
pulsar_logger.info("This isn't logged (correctly)")

client = pulsar.Client("pulsar://localhost:6650", logger=pulsar_logger)
producer = client.create_producer("A")
producer.close()
client.close()
del producer
del client

Then producer and client will be destroyed before the Python interceptor exits.

This should be logged
...
[127.0.0.1:52053 -> 127.0.0.1:6650] Destroyed connection to pulsar://localhost:6650-0
Producer - [persistent://public/default/A, standalone-0-7] , [batching  = off]