geonlp-platform / pygeonlp

pygeonlp: a Python module for geotagging Japanese texts.
https://geonlp.ex.nii.ac.jp/
BSD 2-Clause "Simplified" License

Decrement reference count #21

Closed. KatHaruto closed this 7 months ago.

KatHaruto commented 8 months ago

@geonlp-platform @t-sagara Fixes an issue where the memory held by some PyObjects was never released. Resolves https://github.com/geonlp-platform/pygeonlp/issues/19

Environment: Python 3.10, pygeonlp v1.2.2
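Background for the title. This is not the diff of this PR, only the general CPython rule behind this class of leak. In the C API, `PyDict_SetItem` (and `PyList_Append`) do not steal references: the container takes its own reference to the key and value. A converter that creates an object with a "new reference" API such as `PyUnicode_FromString` and then only inserts it therefore still owns one reference and must `Py_DECREF` it, or one object leaks per conversion. `PyList_SetItem` and `PyTuple_SetItem` are the exceptions that do steal. The Python sketch below makes the container's extra reference visible with `sys.getrefcount`. The helper name `refs_added_by_insert` is hypothetical, and which exact calls in `py2pico.cpp` were changed is not shown here.

```python
import sys

def refs_added_by_insert():
    """Return how many references a dict insert adds to a fresh key and value."""
    # Build objects at runtime so they are not interned or immortal constants.
    key = "".join(["na", "me"])
    value = [1, 2, 3]
    key_before = sys.getrefcount(key)
    value_before = sys.getrefcount(value)

    d = {}
    d[key] = value  # the dict stores its own reference to both key and value

    # While d is alive, each object carries exactly one extra reference.
    return sys.getrefcount(key) - key_before, sys.getrefcount(value) - value_before

print(refs_added_by_insert())  # (1, 1)
```

In C, the equivalent of the caller's own reference is the one returned by the constructor, which is why the usual fix is a `Py_DECREF` on the temporary right after a successful `PyDict_SetItem`.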

Verification script (main.py):

import os
import psutil
import pygeonlp.api as api

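process = psutil.Process(os.getpid())  # handle to this Python process, used to read its RSS
api.init()  # initialize pygeonlp once, before the measurement loop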
def main():
    text = "私は昨日飯田橋にいました。"

    for i in range(10001):
        if i % 1000 == 0:
            print(f'loop: {i} memory usage: {process.memory_info().rss / 1024 / 1024} MB')

        api.geoparse(text)

if __name__ == "__main__":
    main()
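If installing psutil is inconvenient, a similar check can be sketched with the standard library's `tracemalloc`, which traces allocations made through Python's allocators, including PyObjects created by C extensions. This is an alternative, not the script used in this PR. `net_growth`, `leaky`, and `clean` are hypothetical names. Note that `tracemalloc` cannot see memory a C++ extension allocates with plain `malloc`/`new` (such as a `std::string`), so an RSS-based check like the one above is still useful.

```python
import tracemalloc

def net_growth(fn, iterations=10_000, warmup=100):
    """Return bytes still held (per tracemalloc) after calling fn repeatedly."""
    # Warm up first so caches and lazily created objects are not counted as leaks.
    for _ in range(warmup):
        fn()
    tracemalloc.start()
    try:
        before, _ = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            fn()
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before

_leaked = []

def leaky():
    _leaked.append(bytearray(1024))  # simulates an object that is never released

def clean():
    bytearray(1024)  # allocated and freed immediately

print(net_growth(leaky, iterations=1000) > 1000 * 1024)  # True
print(net_growth(clean, iterations=1000) < 10 * 1024)    # True
```

With pygeonlp installed, the same helper could be pointed at the call under test, e.g. `net_growth(lambda: api.geoparse("私は昨日飯田橋にいました。"))` after `api.init()`.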

Before the fix

Result:

$ python main.py
loop: 0 memory usage: 99.90625 MB
loop: 1000 memory usage: 124.65625 MB
loop: 2000 memory usage: 145.78125 MB
loop: 3000 memory usage: 166.90625 MB
loop: 4000 memory usage: 187.90625 MB
loop: 5000 memory usage: 209.03125 MB
loop: 6000 memory usage: 230.03125 MB
loop: 7000 memory usage: 251.15625 MB
loop: 8000 memory usage: 272.15625 MB
loop: 9000 memory usage: 293.28125 MB
loop: 10000 memory usage: 314.28125 MB
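The RSS above grows almost linearly. A quick calculation over the logged samples (copied from the output above; RSS is a coarse, process-wide measure) gives the growth per `geoparse` call:

```python
# RSS samples (MB) printed by main.py before the fix, one every 1000 calls.
before_fix = [99.90625, 124.65625, 145.78125, 166.90625, 187.90625, 209.03125,
              230.03125, 251.15625, 272.15625, 293.28125, 314.28125]
CALLS_PER_SAMPLE = 1000

# Growth of each 1000-call interval, in MB.
deltas = [b - a for a, b in zip(before_fix, before_fix[1:])]

# Average growth per call in KB: over the whole run, and excluding the first interval.
overall_kb = (before_fix[-1] - before_fix[0]) * 1024 / (CALLS_PER_SAMPLE * (len(before_fix) - 1))
steady_kb = (before_fix[-1] - before_fix[1]) * 1024 / (CALLS_PER_SAMPLE * (len(before_fix) - 2))

print(deltas)  # [24.75, 21.125, 21.125, 21.0, 21.125, 21.0, 21.125, 21.0, 21.125, 21.0]
print(round(overall_kb, 2), round(steady_kb, 2))  # 21.95 21.58
```

So after the first interval the process grew by about 21 MB per 1000 calls, roughly 21.6 KB per `geoparse` call.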

Excerpt from profiling with valgrind:

PYTHONMALLOC=malloc valgrind --tool=memcheck --leak-check=full  --suppressions=valgrind-python.supp  --log-file=output.txt --track-origins=yes python main.py 
...
==67182== 6,471,946 bytes in 110,068 blocks are definitely lost in loss record 12,254 of 12,254
==67182==    at 0x4865058: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==67182==    by 0x1CCD8B: ??? (in /usr/bin/python3.10)
==67182==    by 0x8E9B2BB: picojson_to_pyobject(picojson::value const&) (py2pico.cpp:170)
==67182==    by 0x8E9E853: geonlp_ma_parse_node(GeonlpMA*, _object*) (pygeonlp.cpp:121)
==67182==    by 0x1F0D5F: ??? (in /usr/bin/python3.10)
==67182==    by 0x1F53F7: _PyEval_EvalFrameDefault (in /usr/bin/python3.10)
==67182==    by 0x20D347: _PyFunction_Vectorcall (in /usr/bin/python3.10)
==67182==    by 0x1F53F7: _PyEval_EvalFrameDefault (in /usr/bin/python3.10)
==67182==    by 0x20D347: _PyFunction_Vectorcall (in /usr/bin/python3.10)
==67182==    by 0x1F53F7: _PyEval_EvalFrameDefault (in /usr/bin/python3.10)
==67182==    by 0x20D347: _PyFunction_Vectorcall (in /usr/bin/python3.10)
==67182==    by 0x1F53F7: _PyEval_EvalFrameDefault (in /usr/bin/python3.10)

==67182== 3,207,058 bytes in 55,035 blocks are definitely lost in loss record 12,252 of 12,254
==67182==    at 0x4865058: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==67182==    by 0x1CCD8B: ??? (in /usr/bin/python3.10)
==67182==    by 0x8E9B2BB: picojson_to_pyobject(picojson::value const&) (py2pico.cpp:170)
==67182==    by 0x8EA0323: geonlp_ma_get_word_info(GeonlpMA*, _object*) (pygeonlp.cpp:142)
==67182==    by 0x1F0D5F: ??? (in /usr/bin/python3.10)
==67182==    by 0x1F53F7: _PyEval_EvalFrameDefault (in /usr/bin/python3.10)
==67182==    by 0x20D347: _PyFunction_Vectorcall (in /usr/bin/python3.10)
==67182==    by 0x1F53F7: _PyEval_EvalFrameDefault (in /usr/bin/python3.10)
==67182==    by 0x20D347: _PyFunction_Vectorcall (in /usr/bin/python3.10)
==67182==    by 0x1F53F7: _PyEval_EvalFrameDefault (in /usr/bin/python3.10)
==67182==    by 0x20D347: _PyFunction_Vectorcall (in /usr/bin/python3.10)
==67182==    by 0x1F53F7: _PyEval_EvalFrameDefault (in /usr/bin/python3.10)
...
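With `--log-file=output.txt` the full report is long. valgrind's own LEAK SUMMARY at the end of the log gives the overall totals; for a per-record breakdown, a small script can list the "definitely lost" records. This sketch assumes the record header format shown in the excerpt above (optionally with a `(N direct, M indirect)` part), and `definitely_lost` is a hypothetical helper name.

```python
import re

# Matches record headers such as:
# ==67182== 6,471,946 bytes in 110,068 blocks are definitely lost in loss record ...
RECORD = re.compile(
    r"==\d+==\s+([\d,]+)(?: \([^)]*\))? bytes in ([\d,]+) blocks are definitely lost"
)

def definitely_lost(lines):
    """Return (bytes, blocks) for each 'definitely lost' record in a valgrind log."""
    records = []
    for line in lines:
        m = RECORD.search(line)
        if m:
            records.append((int(m.group(1).replace(",", "")),
                            int(m.group(2).replace(",", ""))))
    return records

sample = """\
==67182== 6,471,946 bytes in 110,068 blocks are definitely lost in loss record 12,254 of 12,254
==67182==    at 0x4865058: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==67182== 3,207,058 bytes in 55,035 blocks are definitely lost in loss record 12,252 of 12,254
""".splitlines()

records = definitely_lost(sample)
print(records)                       # [(6471946, 110068), (3207058, 55035)]
print(sum(b for b, _ in records))    # 9679004
```

On the real log, `with open("output.txt") as f: records = definitely_lost(f)` works the same way, since a file object iterates line by line.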

After the fix

$ python main.py
loop: 0 memory usage: 101.6484375 MB
loop: 1000 memory usage: 103.0234375 MB
loop: 2000 memory usage: 103.0234375 MB
loop: 3000 memory usage: 103.0234375 MB
loop: 4000 memory usage: 103.0234375 MB
loop: 5000 memory usage: 103.0234375 MB
loop: 6000 memory usage: 103.0234375 MB
loop: 7000 memory usage: 103.0234375 MB
loop: 8000 memory usage: 103.0234375 MB
loop: 9000 memory usage: 103.0234375 MB
loop: 10000 memory usage: 103.0234375 MB
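The before/after series can also be turned into a pass/fail check, for example in a CI job. The sketch below treats the first interval as warm-up (the after-fix run still grows by 1.375 MB there) and flags a leak if RSS keeps rising afterwards. The helper name `looks_leaky` and the 1 MB tolerance are assumptions, not part of pygeonlp; the sample values are copied from the two outputs above.

```python
def looks_leaky(samples_mb, warmup_samples=1, tolerance_mb=1.0):
    """Return True if RSS keeps growing after the warm-up samples."""
    # Compare the first post-warm-up sample with the last one.
    steady = samples_mb[warmup_samples:]
    return steady[-1] - steady[0] > tolerance_mb

before_fix = [99.90625, 124.65625, 145.78125, 166.90625, 187.90625, 209.03125,
              230.03125, 251.15625, 272.15625, 293.28125, 314.28125]
after_fix = [101.6484375] + [103.0234375] * 10

print(looks_leaky(before_fix))  # True
print(looks_leaky(after_fix))   # False
```

Skipping the warm-up sample matters: one-time initialization can raise RSS early in a run without indicating a leak.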

t-sagara commented 7 months ago

Thank you. As you pointed out, the std::string does need to be freed.