PaulGilmartin / django-pgpubsub

A distributed task processing framework for Django built on top of the Postgres NOTIFY/LISTEN protocol.

Reduce memory usage during recovery #50

Closed romank0 closed 1 year ago

romank0 commented 1 year ago

If the listener has not been running for some time and many notifications have accumulated, recovery runs into a problem: the Django queryset loads all of them into memory at once.

This MR changes recovery so that server-side cursors are used.
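As a rough illustration of the idea (not the code from this MR), Django's `QuerySet.iterator(chunk_size=...)` streams rows instead of caching the whole result set, and on PostgreSQL it is backed by a server-side cursor unless that is disabled in settings. The function name `process_stored_notifications`, the `handle` callback, and the `FakeQuerySet` stand-in below are hypothetical and exist only so the sketch runs without a database.

```python
def process_stored_notifications(queryset, handle, chunk_size=2000):
    """Stream notifications through `handle` without caching the queryset.

    `queryset` is expected to be a Django QuerySet (or anything exposing a
    compatible `iterator(chunk_size=...)` method).
    """
    processed = 0
    # iterator() bypasses the queryset result cache, so only about one
    # chunk of rows is held in memory at a time.
    for notification in queryset.iterator(chunk_size=chunk_size):
        handle(notification)
        processed += 1
    return processed


class FakeQuerySet:
    """Hypothetical stand-in for a Django QuerySet, for demonstration only."""

    def __init__(self, rows):
        self.rows = rows
        self.requested_chunk_size = None

    def iterator(self, chunk_size=None):
        self.requested_chunk_size = chunk_size
        return iter(self.rows)


handled = []
fake = FakeQuerySet(["n1", "n2", "n3"])
count = process_stored_notifications(fake, handled.append, chunk_size=2)
```

With a real model this would be called with something like `Notification.objects.all()` as the queryset; the exact query used in the library may differ.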

Testing

Four scenarios are compared here:

  1. Original - the implementation from the released version.
  2. Server side cursors - disabled in Django settings.
  3. Server side cursors - enabled in Django settings.
  4. Manual batching.
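The text does not name the setting that separates scenarios 2 and 3. The most likely candidate is Django's per-database `DISABLE_SERVER_SIDE_CURSORS` option, so the fragment below is an assumption about the test setup rather than a copy of it. The database name is a placeholder.

```python
# settings.py fragment (sketch). Scenario 3 would correspond to leaving
# DISABLE_SERVER_SIDE_CURSORS at False; scenario 2 to setting it to True.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "example_db",  # placeholder name, not from the MR
        # True is commonly used behind transaction-pooling proxies, but it
        # makes QuerySet.iterator() fall back to a client-side cursor.
        "DISABLE_SERVER_SIDE_CURSORS": False,
    }
}
```

With a client-side cursor the database driver typically buffers the full result on the client, which would be consistent with scenario 2 still holding tens of thousands of raw values even though no model instances appear in the heap.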

Original

This was measured after 1000 records had been processed. The numbers do not change as more records are processed:

Partition of a set of 400518 objects. Total size = 49051007 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0 100306  25 28659558  58  28659558  58 str
     1  50000  12  6000000  12  34659558  71 dict of pgpubsub.models.Notification
     2  50000  12  4400000   9  39059558  80 dict of django.db.models.base.ModelState
     3  50000  12  2800000   6  41859558  85 django.db.models.base.ModelState
     4  50000  12  2800000   6  44659558  91 pgpubsub.models.Notification
     5  50000  12  2400000   5  47059558  96 datetime.datetime
     6  49027  12  1372828   3  48432386  99 int
     7     64   0   449664   1  48882050 100 list
     8    101   0    37320   0  48919370 100 types.CodeType
     9    402   0    28968   0  48948338 100 tuple
<38 more rows. Type e.g. '_.more' to view.>

Server side cursors - disabled in django settings

Numbers do not change during the processing.

Partition of a set of 259131 objects. Total size = 40751628 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0 102716  40 31543457  77  31543457  77 str
     1  52687  20  4195192  10  35738649  88 tuple
     2  50000  19  2400000   6  38138649  94 datetime.datetime
     3  49818  19  1395148   3  39533797  97 int
     4    467   0   452632   1  39986429  98 list
     5    500   0   210360   1  40196789  99 types.CodeType
     6   1078   0   107391   0  40304180  99 bytes
     7    545   0    82840   0  40387020  99 function
     8     87   0    75032   0  40462052  99 re.Pattern
     9     46   0    71504   0  40533556  99 type
<104 more rows. Type e.g. '_.more' to view.>

Server side cursors - enabled in django settings

Numbers stay the same during the processing:

Partition of a set of 19124 objects. Total size = 2898545 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0   6716  35  1562078  54   1562078  54 str
     1   4690  25   355408  12   1917486  66 tuple
     2    500   3   210360   7   2127846  73 types.CodeType
     3   1079   6   107791   4   2235637  77 bytes
     4   2000  10    96000   3   2331637  80 datetime.datetime
     5    546   3    82992   3   2414629  83 function
     6     87   0    75032   3   2489661  86 re.Pattern
     7     46   0    71504   2   2561165  88 type
     8    442   2    66976   2   2628141  91 list
     9   1818  10    51148   2   2679289  92 int
<109 more rows. Type e.g. '_.more' to view.>

Manual Batching

Manual batching has somewhat lower memory consumption and does not require any changes to Django settings, but the implementation is more complicated. It also requires adding an ordering to the query that fetches notifications, plus an index so that the ordered fetch stays efficient.
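One common way to batch manually is keyset pagination: order by an indexed column and fetch each batch starting after the last key seen. The sketch below shows that technique using the standard-library `sqlite3` module as a stand-in for Postgres. The table name, column names, and batch size of 500 are assumptions for illustration, and this is not the implementation that was measured.

```python
import sqlite3


def iter_in_batches(conn, batch_size=500):
    """Yield rows in id order while holding at most one batch in memory."""
    last_id = 0
    while True:
        # ORDER BY on the indexed primary key keeps each fetch cheap and
        # makes "rows after last_id" well defined.
        rows = conn.execute(
            "SELECT id, payload FROM notification "
            "WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return
        yield from rows
        last_id = rows[-1][0]


# Demo data: 1200 rows in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notification (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO notification (payload) VALUES (?)",
    [(f"event-{i}",) for i in range(1200)],
)
seen = [row_id for row_id, _ in iter_in_batches(conn)]
```

The ordering and the index are what make the `WHERE id > ?` filter both correct and fast, which matches the extra requirements described above.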

Note: Numbers stay the same during the processing.

Partition of a set of 13046 objects. Total size = 1803612 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0   3703  28   599557  33    599557  33 str
     1    500   4   210360  12    809917  45 types.CodeType
     2   2674  20   194432  11   1004349  56 tuple
     3   1078   8   107391   6   1111740  62 bytes
     4    545   4    82840   5   1194580  66 function
     5     87   1    75032   4   1269612  70 re.Pattern
     6     46   0    71504   4   1341116  74 type
     7    500   4    60000   3   1401116  78 dict of pgpubsub.models.Notification
     8    429   3    54160   3   1455276  81 list
     9    500   4    44000   2   1499276  83 dict of django.db.models.base.ModelState
<96 more rows. Type e.g. '_.more' to view.>
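To put the four heap totals side by side, the short calculation below takes the "Total size" figures reported above and expresses each as a reduction factor relative to the original implementation. Only the totals come from the measurements; the variable names are illustrative.

```python
# "Total size" values in bytes, copied from the four heap dumps above.
totals = {
    "original": 49_051_007,
    "server side cursors, disabled in settings": 40_751_628,
    "server side cursors, enabled in settings": 2_898_545,
    "manual batching": 1_803_612,
}

baseline = totals["original"]
# How many times smaller each heap is compared with the original.
reduction = {name: baseline / size for name, size in totals.items()}

for name, factor in reduction.items():
    megabytes = totals[name] / 1e6
    print(f"{name}: {megabytes:.1f} MB, {factor:.1f}x smaller than original")
```

On these numbers, enabling server-side cursors shrinks the heap by roughly 17x and manual batching by roughly 27x, while server-side cursors that are disabled in settings save only about 17% of the original total.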