apache / pulsar-client-python

Apache Pulsar Python client library
https://pulsar.apache.org/
Apache License 2.0

[Bug] Memory issue using Record with Array #173

Closed: Vincouux closed this issue 7 months ago

Vincouux commented 7 months ago

Version

OS: Ubuntu 22.04
Python: 3.10.2
Pulsar: pulsar-client==3.2.0

Minimal reproduce step

from pulsar.schema import Record, Array, Integer

class A(Record):
    a = Array(Integer(), required=False, default=[])

# Create a first instance of A
a = A()
print(a.a)     # [] : the empty default list, as expected
a.a.append(1)  # append an integer to this instance's list
print(a.a)     # [1], as expected

# Instantiate a second A
b = A()
print(b.a)     # [1] : unexpected, the list from the first instance is reused

What did you expect to see?

I expected each new instance of A to get its own empty list instead of reusing the list referenced by the previous instance.

What did you see instead?

The second instance b already contained 1, the element appended to the first instance's list. As a developer, I expect that instantiating a fresh object in Python builds it from scratch, so it should not hold references to objects created for previous instances.

gromsterus commented 7 months ago
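  1. A list is a mutable data structure in Python. In your example, the default=[] list is created once, when the class body is evaluated, so every new instance of class A that falls back to the default references and modifies that same list object in memory. You can read more about this here.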

  2. The Record class handles default values quite simply: it gives each instance the stored default object itself rather than a copy. As a solution, you could add a default_factory implementation, like here. Alternatively, do not use default at all, and instantiate Record subclasses by passing the arguments explicitly.
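The two points above can be illustrated with a small pure-Python sketch. Field, Model, Shared and Fresh are hypothetical stand-ins written for this illustration, not pulsar.schema's actual classes; the sketch only assumes, as the report shows, that a field with a plain default hands the same stored object to every instance. The default_factory parameter on Field is likewise hypothetical, modeled on the real dataclasses.field(default_factory=...) option shown at the end.

```python
from dataclasses import dataclass, field


class Field:
    """Hypothetical schema field (not pulsar.schema's implementation)."""

    def __init__(self, default=None, default_factory=None):
        if default is not None and default_factory is not None:
            raise ValueError("pass either default or default_factory, not both")
        # A literal default such as [] is evaluated once, when the class body runs.
        self._default = default
        self._default_factory = default_factory

    def default(self):
        if self._default_factory is not None:
            # Call the factory on every use, so each instance gets a fresh object.
            return self._default_factory()
        # Otherwise hand out the single stored object unchanged.
        return self._default


class Model:
    """Hypothetical Record-like base class that fills fields with their defaults."""

    def __init__(self):
        for name, f in type(self)._fields().items():
            setattr(self, name, f.default())

    @classmethod
    def _fields(cls):
        # Field objects declared directly in the subclass body.
        return {k: v for k, v in vars(cls).items() if isinstance(v, Field)}


class Shared(Model):
    a = Field(default=[])  # one list object, shared by every instance


class Fresh(Model):
    a = Field(default_factory=list)  # list() is called once per instance


s1 = Shared()
s1.a.append(1)
s2 = Shared()
print(s2.a, s1.a is s2.a)  # [1] True

f1 = Fresh()
f1.a.append(1)
f2 = Fresh()
print(f2.a, f1.a is f2.a)  # [] False


# The standard library's dataclasses module uses the same pattern.
@dataclass
class B:
    a: list = field(default_factory=list)


print(B().a is B().a)  # False
```

The factory is a callable rather than a value precisely so that a new object can be produced per instance; storing a value can only ever hand out that one object.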

Vincouux commented 7 months ago

I decided to remove default and make every mutable data structure required. Thanks for the answer!
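A minimal sketch of the workaround chosen above: the mutable field has no default, it is marked required, and every caller passes its own list. Field, Model and the TypeError check are hypothetical stand-ins for illustration; this sketch does not assume that pulsar.schema itself rejects a missing required field at construction time. With the real library, the equivalent would presumably be declaring a = Array(Integer(), required=True) and constructing A(a=[]).

```python
class Field:
    """Hypothetical field with no default; it may be marked required."""

    def __init__(self, required=False):
        self.required = required


class Model:
    """Hypothetical Record-like base class: field values come from the caller."""

    def __init__(self, **kwargs):
        for name, f in type(self)._fields().items():
            if name in kwargs:
                setattr(self, name, kwargs[name])
            elif f.required:
                # Refuse to fall back to any shared default object.
                raise TypeError(f"missing required field {name!r}")
            else:
                setattr(self, name, None)

    @classmethod
    def _fields(cls):
        # Field objects declared directly in the subclass body.
        return {k: v for k, v in vars(cls).items() if isinstance(v, Field)}


class A(Model):
    a = Field(required=True)


first = A(a=[])   # each call site builds its own list
first.a.append(1)
second = A(a=[])
print(second.a)   # []

try:
    A()
except TypeError as exc:
    print(exc)    # missing required field 'a'
```

Because each `[]` literal at the call site is evaluated anew, no two instances can end up sharing a list, at the cost of slightly more verbose construction.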