confluentinc / confluent-kafka-python

Confluent's Kafka Python Client
http://docs.confluent.io/current/clients/confluent-kafka-python

OverflowError: Python int too large to convert to C int when using confluent_kafka Avro deserializer #1716

Open AndreaBencini90 opened 4 months ago

AndreaBencini90 commented 4 months ago

Description

I'm encountering an OverflowError when attempting to deserialize messages using the confluent_kafka Avro deserializer in Python. Here's a simplified version of my code:

from confluent_kafka import Consumer
from confluent_kafka.schema_registry.avro import AvroDeserializer

class ConsumerKafka:
    def __init__(self, deserializer: AvroDeserializer):
        self.deserializer = deserializer
        # placeholder config; the real settings are elided here
        self.consumer = Consumer({"bootstrap.servers": "localhost:9092",
                                  "group.id": "read_from_topic"})

    def decode(self, msg_value):
        deserialized_data = self.deserializer(msg_value, None)
        return deserialized_data

self.deserializer is an AvroDeserializer object from confluent_kafka.schema_registry.avro.

When I call self.deserializer(msg_value, None) I get:

Traceback (most recent call last):
  File "c:\Users\048115571\Documents\python\TTA\read_from_topic\modules\consumer.py", line 30, in decode
    deserialized_data = self.deserializer(msg_value, None)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\048115571\AppData\Local\Programs\Python\Python311\Lib\site-packages\confluent_kafka\schema_registry\avro.py", line 429, in __call__
    obj_dict = schemaless_reader(payload,
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "fastavro\\_read.pyx", line 1142, in fastavro._read.schemaless_reader
  File "fastavro\\_read.pyx", line 1169, in fastavro._read.schemaless_reader
  File "fastavro\\_read.pyx", line 748, in fastavro._read._read_data
  File "fastavro\\_read.pyx", line 621, in fastavro._read.read_record
  File "fastavro\\_read.pyx", line 740, in fastavro._read._read_data
  File "fastavro\\_read.pyx", line 558, in fastavro._read.read_union
  File "fastavro\\_read.pyx", line 724, in fastavro._read._read_data
  File "fastavro\\_read.pyx", line 393, in fastavro._read.read_array
  File "fastavro\\_read.pyx", line 748, in fastavro._read._read_data
  File "fastavro\\_read.pyx", line 621, in fastavro._read.read_record
  File "fastavro\\_read.pyx", line 740, in fastavro._read._read_data
  File "fastavro\\_read.pyx", line 558, in fastavro._read.read_union
  File "fastavro\\_read.pyx", line 770, in fastavro._read._read_data
  File "fastavro\\_logical_readers.pyx", line 22, in fastavro._logical_readers.read_timestamp_millis
  File "fastavro\\_logical_readers.pyx", line 24, in fastavro._logical_readers.read_timestamp_millis
OverflowError: Python int too large to convert to C int

How to reproduce

confluent_avro 1.8.0
confluent-kafka 2.3.0
fastavro 1.9.2
kafka-python 2.0.2

pranavrth commented 3 months ago

Can you confirm that you are using long and not int for the field type? I can see it is trying to deserialize a timestamp in milliseconds. An example of what that looks like in a schema is sketched below.
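
For context, the timestamp-millis logical type is only valid on a long. Here is a minimal sketch of such a schema, with an illustrative record and field name (not taken from this issue):

# Hypothetical schema: timestamp-millis must annotate "long", not
# "int", since millisecond epoch values overflow a 32-bit int.
example_schema = {
    "type": "record",
    "name": "Event",
    "fields": [
        {
            "name": "created_at",
            "type": {"type": "long", "logicalType": "timestamp-millis"},
        }
    ],
}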

Can you please provide the schema information and the message?

AndreaBencini90 commented 3 months ago

The problem happens when the consumer tries to deserialize the value -9223370327508000000 from the topic. Obviously this is an error in the data and not in the library, but since the deserializer raises, it prevents consuming the rest of the message. In my opinion, it would be nice to have an option to handle these situations: the user should be able to choose whether to let it break and throw an exception, or to leave unconverted any field where a similar issue is observed, as sketched below.
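
A minimal sketch of the requested behaviour, written as a caller-side wrapper since no such option exists in the library today (safe_decode is a hypothetical helper):

def safe_decode(deserializer, msg_value):
    # Wrap the AvroDeserializer call so one corrupt record does not
    # halt consumption of the rest of the stream.
    try:
        return deserializer(msg_value, None)
    except OverflowError:
        # Out-of-range logical value (e.g. a bad timestamp-millis):
        # return None, or log the raw bytes, instead of propagating.
        return None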

mlcivilengineer commented 3 weeks ago

I seem to be getting this error as well when we define the date in milliseconds to be 865716973869987, which corresponds to Sat Jun 25 29403 09:11:09. fastavro uses Python's datetime library to parse these timestamps, and datetime only supports dates up to datetime.date(9999, 12, 31) (datetime.date.max), so any timestamp past the year 9999 overflows.
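
A short demonstration of why both values in this thread fail, assuming fastavro maps timestamp-millis onto datetime arithmetic relative to the Unix epoch (which is what the traceback suggests):

import datetime

# The Unix epoch; Avro timestamp-millis counts milliseconds from here.
epoch = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)

# The two out-of-range values reported in this thread.
for millis in (865716973869987, -9223370327508000000):
    try:
        print(epoch + datetime.timedelta(milliseconds=millis))
    except OverflowError as exc:
        # Either the timedelta itself or the resulting date exceeds
        # what datetime can represent (years 1..9999).
        print(f"{millis}: OverflowError: {exc}")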