Closed vegabook closed 7 years ago
I tested the above code up to 25 million inserts. I did not see the 10x increase in total memory. The memory consumed by the application code stays stable, while memory consumed by the server grows with the data, almost one to one.
Summary of test on Ubuntu 16.04 VM, running CE 3.11.0.1 1 - Boot up VM from shutdown, open terminal, start htop: Tot mem: 663M 2 - Start second terminal, launch aerospike single node server: 811M 3 - Start AMC: 838M 4 - Start firefox, launch AMC dashboard: 1.08G total, 0 Mb server 5 - Start above code (memleaktest.py) : 1.09G / 0Mb 6 - All 720K keys inserted - first code loop: 1.16G/62.9M 7 - 5 mill insert updates to maps: 1.36G/142M 8 - 10 mill insert updates to maps: 1.45G/224M (Steps 1 -8 were repeated twice - shutdown, reboot VM, similar data) 9 - 20 mill insert updates to maps: 1.63G/385M memleaktest.py mem usage remains at 1.2% to total memory throughout.
10 - 25 million insert updates to maps: 1.71G/466M
I have screen shots for all steps 1-9.
Step#5:
Step#6:
Step#7: (5 mill)
Step#8: (10 mill)
Step #9: (20 mill)
Step#10 (25 million)
@pygupta : Were you running with python 2.x? This looks to be a problem with python 3.x due to the unicode conversion of the string in the key. I have located the leak and will have a patch to the client tomorrow.
Duh... I missed the note in bold.. only affects Python 3. I ran with 2.7.1.2 $ python Python 2.7.12 (default, Nov 19 2016, 06:48:10) [GCC 5.4.0 20160609] on linux2
Yup, with python3, it just takes off!
@vegabook: Can you try your program with the 2.0.7 client release?
@jboone100 - does not look like it fixed. 1 - updated to 2.0.7
$ sudo python3 -m pip install aerospike -U
...
Collecting aerospike
Downloading aerospike-2.0.7.tar.gz (102kB)
100% |████████████████████████████████| 102kB 347kB/s
Installing collected packages: aerospike
Found existing installation: aerospike 2.0.6
Uninstalling aerospike-2.0.6:
Successfully uninstalled aerospike-2.0.6
Running setup.py install for aerospike ... |
done
Successfully installed aerospike-2.0.7
(No AMC, only terminals, fresh reboot.) Check:
piyush@ubuntu:~/discussRepo/pyTestMemLeak$ python3
Python 3.5.2 (default, Nov 17 2016, 17:05:23)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import aerospike
>>> print(aerospike.__version__)
2.0.7
>>>
Then, started aerospike server and ran python3 memleaktest.py. 1 - After initiating all keys, total mem = 0.864 Gb, server = 62 Mb 2 - After 5 million update inserts: total mem = 1.14Gb, server = 143 Mb
% mem used by python3 memleaktest.py increases steadily, upto 12% by 5 million inserts. Step 1 - screenshot: Step 2 - screenshot:
Can you try running leaks <pid>
on your process while it is running. It was pretty clear from the output of that tool that the 2.0.6 client was leaking and 2.0.7 isn't.
@jboone100 I don't know "leaks" - on Ubuntu?.. but here is RSS growing:
piyush@ubuntu:~$ leaks 5161
leaks: command not found
piyush@ubuntu:~$ ps -p 5161 -o pid -o rss -o %mem -o cmd
PID RSS %MEM CMD
5161 233564 11.5 python3 memleaktest.py
piyush@ubuntu:~$ ps -p 5161 -o pid -o rss -o %mem -o cmd
PID RSS %MEM CMD
5161 237916 11.7 python3 memleaktest.py
piyush@ubuntu:~$ ps -p 5161 -o pid -o rss -o %mem -o cmd
PID RSS %MEM CMD
5161 239708 11.8 python3 memleaktest.py
piyush@ubuntu:~$ ps -p 5161 -o pid -o rss -o %mem -o cmd
PID RSS %MEM CMD
5161 242780 11.9 python3 memleaktest.py
piyush@ubuntu:~$ ps -p 5161 -o pid -o rss -o %mem -o cmd
PID RSS %MEM CMD
5161 245852 12.1 python3 memleaktest.py
RSS tracking is what has been also coded in the application using psutil - and that keeps increasing. With python 2.7 it was stable, same value. See earlier screeshots from my first post. (The 0.023... number.) RSS : (real memory size or resident set size in 1024 byte units)
Double check with python 2.7, pretty rock solid.
piyush@ubuntu:~$ ps -p 5599 -o pid -o rss -o %mem -o cmd
PID RSS %MEM CMD
5599 28204 1.3 python memleaktest.py
piyush@ubuntu:~$ ps -p 5599 -o pid -o rss -o %mem -o cmd
PID RSS %MEM CMD
5599 28204 1.3 python memleaktest.py
piyush@ubuntu:~$ ps -p 5599 -o pid -o rss -o %mem -o cmd
PID RSS %MEM CMD
5599 28204 1.3 python memleaktest.py
piyush@ubuntu:~$ ps -p 5599 -o pid -o rss -o %mem -o cmd
PID RSS %MEM CMD
5599 28204 1.3 python memleaktest.py
piyush@ubuntu:~$ ps -p 5599 -o pid -o rss -o %mem -o cmd
PID RSS %MEM CMD
5599 28204 1.3 python memleaktest.py
piyush@ubuntu:~$ ps -p 5599 -o pid -o rss -o %mem -o cmd
PID RSS %MEM CMD
5599 28204 1.3 python memleaktest.py
I can confirm that my 2.0.7 driver is also still leaking memory on Python 3.
After 10 million inserts:
Using 2.0.7:
Fixed in the 2.0.8 release.
Fix validated. No more memory leaks with python3.
13602 per second, total insertions: 3515973 GB used: 0.028585984
13558 per second, total insertions: 3529531 GB used: 0.028585984
13899 per second, total insertions: 3543430 GB used: 0.028585984
14112 per second, total insertions: 3557542 GB used: 0.028585984
12829 per second, total insertions: 3570371 GB used: 0.028585984
13347 per second, total insertions: 3583718 GB used: 0.028585984
13139 per second, total insertions: 3596857 GB used: 0.028585984
yep. Thanks for your efforts on this.
Refers to this Community board entry: https://discuss.aerospike.com/t/large-memory-usage-by-python-driver-on-big-batch-insert/3819
There appears to be growing memory usage, an order of magnitude greater than the space occupied by the data itself, by the Python driver when inserting into an ordered map. Here is a simplified version of the code from the Community forum link above. This will insert 100 million floats, batched by a ticker code and an integer, into ordered maps. The memory used by this piece of code grows enormously as the inserts progress. After 10 million inserts, AMC shows Aerospike test db using only 208MB, whereas this piece of code's memory footprint has grown to 1.5GB. And it continues upwards from there.
UPDATE: This appears only to affect the python 3 driver. Tested with version 2.0.6.
This code will work with both python 2.7 and 3.x. Just need to install numpy and psutil.