bootphon / phonemizer

Simple text to phones converter for multiple languages
https://bootphon.github.io/phonemizer/
GNU General Public License v3.0

Memory leak that consumes 5 MB per usage of phonemize function #153

Open ollayf opened 1 year ago

ollayf commented 1 year ago

Describe the bug: There is a memory leak: each call to the phonemize function takes up at least 5 MB for me. I have tried many things, but to no avail. This is how I use it in my Python code: [Screenshot from 2023-07-03 00-28-53]

Phonemizer version: [Screenshot from 2023-07-03 00-29-38]

System: Ubuntu 20.04 LTS, Python 3.8

To reproduce: [Screenshot from 2023-07-03 00-29-38]

Expected behavior: Every time the function returns, its memory should be garbage-collected and released back to the OS. Instead, every call permanently takes up another 5 MB. I see this 5 MB both in htop and with psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2 in the Python code.
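A minimal way to quantify the per-call growth described above is to sample the resident set size (RSS) before and after many calls. This sketch is Linux-only and, as an assumption, reads /proc/self/statm instead of using psutil so that it needs only the standard library; rss_mib and rss_growth_per_call are hypothetical helper names.

```python
import os


def rss_mib() -> float:
    """Current resident set size of this process in MiB (Linux only: reads /proc)."""
    with open("/proc/self/statm") as f:
        # Second field of statm is the number of resident pages.
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE") / 1024 ** 2


def rss_growth_per_call(func, n_calls: int = 20, warmup: int = 2) -> float:
    """Call func() repeatedly and return the average RSS growth per call in MiB."""
    # Warm-up calls keep one-time costs (lazy imports, caches) out of the average.
    for _ in range(warmup):
        func()
    start = rss_mib()
    for _ in range(n_calls):
        func()
    return (rss_mib() - start) / n_calls
```

For example, rss_growth_per_call(lambda: phonemizer.phonemize("hello", language="en-us")) would report the average growth per phonemize() call; a value that stays near 5 MiB as n_calls grows points to a per-call leak rather than one-time setup.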

Additional context

mmmaat commented 1 year ago

Hi,

See here: you may try passing a backend to your phonemize_word function. When a new espeak phonemization instance is created, the code copies the espeak shared library into a temporary directory (that's the 5 MB). Normally the directory is deleted at exit or when the instance is garbage collected (see here; this is a bit complex to handle across Linux, Mac and Windows).
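The lifecycle described above (a private copy of the library in a temporary directory, removed on garbage collection or at exit) can be illustrated with weakref.finalize, which runs its callback when the object is collected or, if it is still alive, at interpreter exit. This is a simplified, hypothetical sketch (TempLibraryCopy is an invented name), not phonemizer's actual code, and it deliberately leaves out loading the copy with ctypes. It does show why creating one instance per call is costly: every instance makes its own copy.

```python
import os
import shutil
import tempfile
import weakref


class TempLibraryCopy:
    """Hypothetical sketch: keep a private copy of a shared library in a temp dir.

    The copy is removed when the object is garbage collected or, at the
    latest, when the interpreter exits (weakref.finalize handles both).
    """

    def __init__(self, library_path: str):
        self._tempdir = tempfile.mkdtemp()
        self.path = os.path.join(self._tempdir, os.path.basename(library_path))
        shutil.copy(library_path, self.path)  # every instance gets its own copy
        # The finalizer holds only a weak reference to self, so no cycle is created.
        self._finalizer = weakref.finalize(
            self, shutil.rmtree, self._tempdir, ignore_errors=True)

    def close(self) -> None:
        """Remove the temporary copy now instead of waiting for GC or exit."""
        self._finalizer()
```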

magicse commented 7 months ago

Same problem: memory leaks on every phonemizer call:

import phonemizer
phonemes = phonemizer.phonemize(orig_text_wo_stress, language="en")  # builds a new backend on every call

lokmantsui commented 3 months ago

Don't instantiate one phonemizer backend per call; see where phonemize() instantiates the backend: https://github.com/bootphon/phonemizer/blob/d9f9ed266aa5cc2dd9e5eaea2c9571ab5024893c/phonemizer/phonemize.py#L206
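Following that advice, a minimal sketch builds the espeak backend once and reuses it for every word. It assumes phonemizer 3.x, where phonemizer.backend.EspeakBackend takes the language as its first argument and its phonemize() method expects a list of strings and returns a list; get_backend and this phonemize_word signature are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def get_backend(language: str = "en-us"):
    """Create the espeak backend once per language and reuse it afterwards."""
    # Imported lazily so the module can be loaded without phonemizer installed.
    from phonemizer.backend import EspeakBackend
    return EspeakBackend(language)


def phonemize_word(word: str, language: str = "en-us") -> str:
    """Phonemize a single word with the shared, cached backend."""
    backend = get_backend(language)
    # EspeakBackend.phonemize expects a list of strings and returns a list.
    return backend.phonemize([word], strip=True)[0]
```

functools.lru_cache keeps one backend per language for the life of the process, so the library copy is made once rather than on every call.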