If the listener has not been running for some time and many notifications have accumulated, the Django queryset loads all of them into memory at once.
This MR changes the listener to use server-side cursors, so that notifications are fetched incrementally instead.
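As a rough illustration of the mechanism (not the code from this MR): on PostgreSQL, Django's `QuerySet.iterator()` reads through a server-side cursor unless `DISABLE_SERVER_SIDE_CURSORS` is set to `True` for the database. The sketch below shows the settings fragment and a consuming loop. The names `process_pending` and `handle`, the database name, and the chunk size of 500 are assumptions made for the example.

```python
# settings.py fragment (sketch). The connection values are placeholders.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "example_db",  # hypothetical database name
        # False is Django's default. When set to True (often done behind a
        # transaction-pooling proxy), QuerySet.iterator() no longer uses a
        # server-side cursor.
        "DISABLE_SERVER_SIDE_CURSORS": False,
    }
}


def process_pending(queryset, handle, chunk_size=500):
    """Stream rows from `queryset` and pass each one to `handle`.

    `queryset` is expected to be a Django QuerySet (for example a
    Notification queryset); `handle` is a hypothetical per-row callback.
    Returns the number of rows processed.
    """
    processed = 0
    # iterator() skips the queryset result cache, so model instances are
    # created chunk by chunk rather than all at once.
    for notification in queryset.iterator(chunk_size=chunk_size):
        handle(notification)
        processed += 1
    return processed
```

Any object with a compatible `iterator(chunk_size=...)` method works here, which makes the loop easy to exercise without a database.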
Testing
Four scenarios are compared here:
Original: the implementation from the released version.
Server-side cursors, but disabled in Django settings.
Server-side cursors, enabled in Django settings.
Manual batching.
Original
This was measured after 1000 records had been processed. The numbers do not change as more records are processed:
Partition of a set of 400518 objects. Total size = 49051007 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 100306 25 28659558 58 28659558 58 str
1 50000 12 6000000 12 34659558 71 dict of pgpubsub.models.Notification
2 50000 12 4400000 9 39059558 80 dict of django.db.models.base.ModelState
3 50000 12 2800000 6 41859558 85 django.db.models.base.ModelState
4 50000 12 2800000 6 44659558 91 pgpubsub.models.Notification
5 50000 12 2400000 5 47059558 96 datetime.datetime
6 49027 12 1372828 3 48432386 99 int
7 64 0 449664 1 48882050 100 list
8 101 0 37320 0 48919370 100 types.CodeType
9 402 0 28968 0 48948338 100 tuple
<38 more rows. Type e.g. '_.more' to view.>
Server-side cursors - disabled in Django settings
The numbers do not change during processing:
Partition of a set of 259131 objects. Total size = 40751628 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 102716 40 31543457 77 31543457 77 str
1 52687 20 4195192 10 35738649 88 tuple
2 50000 19 2400000 6 38138649 94 datetime.datetime
3 49818 19 1395148 3 39533797 97 int
4 467 0 452632 1 39986429 98 list
5 500 0 210360 1 40196789 99 types.CodeType
6 1078 0 107391 0 40304180 99 bytes
7 545 0 82840 0 40387020 99 function
8 87 0 75032 0 40462052 99 re.Pattern
9 46 0 71504 0 40533556 99 type
<104 more rows. Type e.g. '_.more' to view.>
Server-side cursors - enabled in Django settings
The numbers stay the same during processing:
Partition of a set of 19124 objects. Total size = 2898545 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 6716 35 1562078 54 1562078 54 str
1 4690 25 355408 12 1917486 66 tuple
2 500 3 210360 7 2127846 73 types.CodeType
3 1079 6 107791 4 2235637 77 bytes
4 2000 10 96000 3 2331637 80 datetime.datetime
5 546 3 82992 3 2414629 83 function
6 87 0 75032 3 2489661 86 re.Pattern
7 46 0 71504 2 2561165 88 type
8 442 2 66976 2 2628141 91 list
9 1818 10 51148 2 2679289 92 int
<109 more rows. Type e.g. '_.more' to view.>
Manual Batching
This approach has somewhat lower memory consumption and does not require any changes to Django settings, but the implementation is more complicated. It also requires adding an ordering to the query that fetches notifications, plus an index to keep that query efficient.
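One common way to batch manually is keyset pagination: order by an indexed column and fetch the next batch starting after the last value seen. The sketch below illustrates the idea with the standard-library `sqlite3` module so that it runs anywhere. It is not the implementation measured here, and the table name, columns, and `iter_in_batches` helper are assumptions made for the example.

```python
import sqlite3


def iter_in_batches(conn, batch_size=500):
    """Yield (id, payload) rows in id order, one batch in memory at a time."""
    last_id = 0  # assumes ids are positive integers
    while True:
        # The ORDER BY makes "rows after last_id" well defined; the primary
        # key index lets the database seek straight to the next batch.
        rows = conn.execute(
            "SELECT id, payload FROM notification "
            "WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return
        yield from rows
        last_id = rows[-1][0]


# Build a small in-memory table to exercise the helper.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notification (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO notification (payload) VALUES (?)",
    [(f"payload-{i}",) for i in range(1234)],
)

ids = [row[0] for row in iter_in_batches(conn, batch_size=500)]
print(len(ids), ids[0], ids[-1])
```

Each query holds at most `batch_size` rows, which is why this style keeps memory flat regardless of how large the backlog is.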
Note: the numbers stay the same during processing.
Partition of a set of 13046 objects. Total size = 1803612 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 3703 28 599557 33 599557 33 str
1 500 4 210360 12 809917 45 types.CodeType
2 2674 20 194432 11 1004349 56 tuple
3 1078 8 107391 6 1111740 62 bytes
4 545 4 82840 5 1194580 66 function
5 87 1 75032 4 1269612 70 re.Pattern
6 46 0 71504 4 1341116 74 type
7 500 4 60000 3 1401116 78 dict of pgpubsub.models.Notification
8 429 3 54160 3 1455276 81 list
9 500 4 44000 2 1499276 83 dict of django.db.models.base.ModelState
<96 more rows. Type e.g. '_.more' to view.>
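For a quick side-by-side view, the total heap sizes reported in the four snapshots can be put into one comparison. The short script below only restates the totals from the output above and divides them; the scenario labels are shortened for the example. Note that the original snapshot holds 50000 loaded notifications, so its ratio to the other scenarios depends on the size of the backlog.

```python
# Total heap sizes in bytes, copied from the "Total size" lines above.
totals = {
    "original": 49_051_007,
    "cursors, disabled in settings": 40_751_628,
    "cursors, enabled in settings": 2_898_545,
    "manual batching": 1_803_612,
}

baseline = totals["original"]

# How many times smaller each heap is compared with the original.
reduction = {name: baseline / size for name, size in totals.items()}

for name, size in totals.items():
    mib = size / 2**20
    print(f"{name:32s} {mib:6.1f} MiB  {reduction[name]:5.1f}x smaller")
```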