divvun / libdivvun

lib for running gramcheck and other pipelines + cli; modules for CG→spelling, CG→feedback, tagging blanks
https://giellalt.github.io/proof/gramcheck/GrammarCheckerDocumentation.html
GNU General Public License v3.0

Memory map file loading #50

Open bbqsrc opened 2 years ago

bbqsrc commented 2 years ago

Grammar checking is using quite a lot of RAM on our Divvun API server:

[image: screenshot of memory usage on the server]

We've mitigated this for the spellchecking in DivvunSpell by using mmap instead of loading data into RAM, with minimal performance penalty in our use cases. Is this something that can be implemented for these grammar checking pipelines?
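For readers unfamiliar with the difference, here is a minimal Python sketch of eager loading versus memory mapping. It is not how DivvunSpell or libdivvun load their data; the function names and the temporary file are hypothetical and only illustrate the general technique.

```python
import mmap
import os
import tempfile


def read_slice_eager(path, offset, length):
    """Load the whole file into process memory, then slice it."""
    with open(path, "rb") as f:
        data = f.read()  # the entire file is copied into RAM
    return data[offset:offset + length]


def read_slice_mmap(path, offset, length):
    """Map the file read-only and copy out only the requested bytes."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file. The OS pages data in on demand
        # and can evict clean pages again under memory pressure.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset:offset + length]


if __name__ == "__main__":
    # Hypothetical stand-in for a large data file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"header" + b"\x00" * 1_000_000 + b"payload")
        path = tmp.name
    try:
        # Both approaches return the same bytes; only the memory
        # behaviour differs.
        assert read_slice_mmap(path, 0, 6) == read_slice_eager(path, 0, 6)
    finally:
        os.remove(path)
```

With a read-only mapping, the pages are backed by the file itself, so several processes mapping the same file can share them instead of each holding a private copy.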

unhammer commented 2 years ago

Is that 1313M RES on startup, or could there be a leak? (I'm seeing about half when I test with se.zcheck -n smegramrelease)

TinoDidriksen commented 2 years ago

Isn't that a persistent pipe using CG-3's libcg3 API as part of the process? 'cause if so then https://github.com/GrammarSoft/cg3/issues/74

unhammer commented 2 years ago

Hm, could perhaps reload the data every so often as a workaround, though it might be easier to just restart the divvun-checker process in that case ;-)

flammie commented 2 years ago

I was profiling a bit for fun, and at least my version that uses hfst-ospell didn't really have memory leaks, but it used an increasing amount of memory on some cache. I disabled that cache in the latest version; could you test that again? I guess we are planning to replace the hfst-ospell stuff with divvunspell, especially if it continues to be the bottleneck?

bbqsrc commented 2 years ago

ah, is it using hfst-ospell? hehe, well we need to fix that then.

bbqsrc commented 2 years ago

If you could give me a list of the functionality that is used by libdivvun from hfst-ospell, I can inventory anything missing for it to be ported across.

If there's not much, I can publish a stable C API header somewhere (basically leeched straight from divvunspell-sdk-swift without the Swift ;) )

flammie commented 2 years ago

Mmh, I can't remember anymore whether I wrote this, but the main use of hfst_ospell seems to be in speller::Spell here: https://github.com/divvun/libdivvun/blob/master/src/cgspell.cpp#L136. Maybe @unhammer remembers?

snomos commented 2 years ago

The main additions to standard hfst-ospell are:

@unhammer would know more 😊

bbqsrc commented 2 years ago

oh god analysis, nooooo. Someone else can port that across, hahaha

unhammer commented 2 years ago

Yeah, as the code shows, we just use ZHfstOspeller::suggest and ZHfstOspeller::analyseSymbols from hfst-ospell.

You can easily make a pipeline without the speller step and check if that takes the pain away (just edit the pipespec.xml in your zcheck zip and remove that one element).
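A rough Python sketch of that edit, for anyone who would rather script it than edit by hand. The element name `cgspell` and the sample XML are assumptions for illustration; check your own pipespec.xml for the actual name of the speller step. Note that `xml.etree.ElementTree` drops comments and any DOCTYPE when it re-serialises, so compare the result against the original before repacking the zip.

```python
import xml.etree.ElementTree as ET


def drop_elements(xml_text, tag):
    """Return xml_text with every element named `tag` removed."""
    root = ET.fromstring(xml_text)
    # Snapshot parents and children first, since we mutate while walking.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == tag:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical pipespec fragment; real files have more steps and attributes.
    sample = (
        '<pipespec><pipeline name="example">'
        "<tokenise/><cgspell/><suggest/>"
        "</pipeline></pipespec>"
    )
    # "cgspell" is an assumed name for the speller step.
    print(drop_elements(sample, "cgspell"))
```

Running the checker with and without the speller step on the same input should show whether that step accounts for the memory use.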