giellatekno / neahttadigisanit

Saami dictionary webapp

High memory usage #32

Open Phaqui opened 3 months ago

Phaqui commented 3 months ago

NDS uses a lot of memory on the server, so much that the server crashed once. Memory usage will have to be reduced to prevent the server from crashing again.

Phaqui commented 3 months ago

Commit 948b36c7b30edd71b25618c1d2c9ad1fb8a5364b introduces a new trie implementation. It should behave exactly the same as the old one, with the same interface and all. It seems to have reduced the total memory usage on the server by 1-3 GB, which is significant.
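
For context, memory savings in a CPython trie usually come from cutting per-node overhead. Here is a minimal sketch of that idea, assuming an insert/lookup interface; the class names are illustrative, not the actual code from the commit:

```python
# Minimal sketch of a memory-leaner trie (illustrative, not NDS code).
class TrieNode:
    # __slots__ avoids a per-instance __dict__, one of the cheapest
    # ways to cut per-node overhead in CPython.
    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.is_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def lookup(self, word):
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_word
```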

Phaqui commented 3 months ago

I found a profiler called Scalene, and with it, it seems that one of the main offenders behind the high memory usage is the lxml trees, along with the XPath objects used for searching those trees. For sanit, whose dictionaries add up to roughly 30 MB of XML data, the in-memory representation of the trees immediately comes to around 500 MB. On top of that, the XPath objects allocate additional memory when doing searches, but I believe that memory is freed quite quickly afterwards.

```
298 │ │ 6% │ 1% │ 4% │ 490M │▁▁▃▃▃▄▄▅▅ 75% │ 109 │ self.tree = etree.parse(filename)
```
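
To illustrate the pattern Scalene is flagging, here is a hypothetical reconstruction of it; the element names in the XPath expression are assumptions about the dictionary XML, not the actual NDS code:

```python
# Hypothetical sketch: the parsed tree is held for the lifetime of the
# process, and the XPath machinery allocates more on each search.
from lxml import etree

class DictionaryFile:
    def __init__(self, filename):
        # ~30 MB of XML on disk becomes hundreds of MB as an lxml tree
        self.tree = etree.parse(filename)
        # precompiled XPath with a $word variable; element names
        # (e, lg, l) are assumptions about the dictionary format
        self.find_entries = etree.XPath('.//e[lg/l/text() = $word]')

    def search(self, word):
        # allocates per-search working memory, freed after the call
        return self.find_entries(self.tree, word=word)
```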

The short story is therefore: if we want to do anything about the memory issues, we have to take a completely different approach to the in-memory representation of the dictionaries and to searching through them.
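
As one example of what a completely different approach could look like, here is a sketch that stream-parses the XML once and keeps only a plain dict keyed on lemma, instead of holding the full lxml tree. The element names (e, lg/l) are assumptions about the dictionary format:

```python
# Sketch: stream-parse with iterparse, keep only the fields needed for
# lookup in plain Python dicts, and free the lxml elements as we go.
from lxml import etree

def load_lexicon(filename):
    lexicon = {}
    for _event, elem in etree.iterparse(filename, tag='e'):
        lemma_el = elem.find('lg/l')
        if lemma_el is not None and lemma_el.text:
            # store the serialized entry (or a small tuple of fields)
            # instead of the live lxml subtree
            lexicon.setdefault(lemma_el.text, []).append(
                etree.tostring(elem, encoding='unicode'))
        # release the element so the parsed tree doesn't accumulate
        elem.clear()
        while elem.getprevious() is not None:
            del elem.getparent()[0]
    return lexicon
```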

The other alternative is to ask for more RAM on the server.

Phaqui commented 1 month ago

More RAM was requested, and the server now has 32 GB. We're still hitting the limit, though. Could there be leaks somewhere? Or has usage simply grown that much?

albbas commented 1 month ago

Are you using the library version of hfst? I have experienced that if the FSTs are loaded more than once, the memory is tied up, multiplying the memory usage each time the FSTs are loaded.
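
If the library route were ever taken, the usual guard against this is a module-level cache so each FST file is read exactly once per process. A minimal sketch, assuming the hfst Python bindings; the cache itself is illustrative:

```python
# Load each FST file only once per process, whatever the call pattern.
import hfst

_FST_CACHE = {}

def get_transducer(path):
    if path not in _FST_CACHE:
        stream = hfst.HfstInputStream(path)
        _FST_CACHE[path] = stream.read()
        stream.close()
    return _FST_CACHE[path]
```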

Phaqui commented 1 month ago

No, NDS shells out to the hfst-* binaries with subprocess instead of using the library directly. We were worried about incompatibilities and other bugs when using the library directly (though I don't think there should be any). Using the library directly also uses more memory, because it loads the .hfst* (and other binary) files into memory once for each file, so it is currently definitely not an option.
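
For reference, the subprocess approach looks roughly like this; the analyser filename and the output handling are illustrative assumptions, not actual NDS code:

```python
# Sketch of shelling out to the hfst command-line tools.
import subprocess

def analyse(word, analyser='analyser.hfstol'):
    result = subprocess.run(
        ['hfst-lookup', analyser],
        input=word + '\n',
        capture_output=True,
        text=True,
    )
    # hfst-lookup prints tab-separated "input\tanalysis\tweight" lines
    return [line.split('\t')
            for line in result.stdout.splitlines() if line.strip()]
```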

Phaqui commented 1 month ago

Using the old Trie implementation (specifically neahtta/nds_lexicon/trie.py instead of neahtta/nds_lexicon/new_trie.py, as referenced in neahtta/nds_lexicon/lexicon.py roughly on line 460) did not help memory usage.

Memory usage still starts at roughly 60% after restarting NDS, and fills up to around 29.9 GB of the ~31 GB available after a while.

The next attempt may be to reduce the number of workers.

Phaqui commented 1 month ago

I have reduced the number of workers for all instances from 6 to 4 to try to reduce memory usage. Will have to keep an eye on it to see if it makes any difference. Also keep an eye out for whether anyone notices any slowness or instability.
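
For reference, assuming the instances are served with something like gunicorn (an assumption about the deployment, not confirmed here), the change amounts to one setting. Each worker is a separate process with its own copy of the loaded dictionaries, so the worker count multiplies the per-instance memory footprint:

```python
# Hypothetical gunicorn config sketch; gunicorn itself is an
# assumption about how NDS instances are served.
workers = 4  # reduced from 6 to cut memory usage
```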

Phaqui commented 1 month ago

Memory usage is now down from ~30 GB to ~24 GB. Of course, traffic may also be lower right now. Will have to keep it at 4 workers for a while, and then switch back to 6 for some time to verify.