ahmetaa / zemberek-nlp

NLP tools for Turkish.

98% CPU usage due to recursion #263

Open hippalus opened 3 years ago

hippalus commented 3 years ago

The recursive dfs method in the WordSegmenter class causes high CPU usage in some scenarios. See the screenshots below from the JVM profiler and the CPU profiler.

[Screenshots: JVM profiler and CPU profiler output, 2021-03-16]

ahmetaa commented 3 years ago

Yes, WordSegmenter uses a bad algorithm that works fine in most cases but fails miserably in others. It should have used a dynamic programming approach, but sadly it does not. Unfortunately, I do not have much spare time to fix this, so I would advise using a different algorithm for word segmentation.
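
For readers wondering what a dynamic programming approach could look like, here is a minimal sketch. It is not zemberek's code: the class `DpSegmenterSketch`, the method `segment`, and the plain `Set<String>` vocabulary are all hypothetical, and it does no language-model scoring. It only shows that recording, for each prefix of the input, whether that prefix can be segmented keeps the work polynomial instead of re-exploring the same suffixes recursively.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not part of zemberek-nlp.
class DpSegmenterSketch {

  /**
   * Returns one segmentation of input into vocabulary words,
   * or an empty list if no segmentation exists.
   */
  static List<String> segment(String input, Set<String> vocabulary) {
    int n = input.length();
    int maxLen = 0;
    for (String w : vocabulary) {
      maxLen = Math.max(maxLen, w.length());
    }

    // prev[end] = start index of the last word in some segmentation of
    // input[0, end), or -1 if that prefix cannot be segmented.
    int[] prev = new int[n + 1];
    Arrays.fill(prev, -1);
    prev[0] = 0; // the empty prefix is trivially segmentable

    for (int end = 1; end <= n; end++) {
      int lowest = Math.max(0, end - maxLen);
      for (int start = end - 1; start >= lowest; start--) {
        // Each (start, end) pair is examined at most once, so there is
        // no repeated exploration of the same suffix.
        if (prev[start] >= 0 && vocabulary.contains(input.substring(start, end))) {
          prev[end] = start;
          break;
        }
      }
    }

    if (n == 0 || prev[n] < 0) {
      return Collections.emptyList();
    }

    // Walk the back-pointers from the end of the input to recover the words.
    List<String> words = new ArrayList<>();
    for (int end = n; end > 0; end = prev[end]) {
      words.add(input.substring(prev[end], end));
    }
    Collections.reverse(words);
    return words;
  }
}
```

With a vocabulary such as {"a", "aa", "aaa"} and an input made of many "a" characters followed by a "b", a naive recursive search can try exponentially many splits before giving up, while this table-based version makes at most n × maxLen lookups. A real fix would also need to rank competing segmentations, which this sketch leaves out.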

ahmetaa commented 3 years ago

One idea that may alleviate this issue is to split the input on spaces before passing it to WordSegmenter, so that each call only sees a short chunk. Also, please provide an example input that triggers this bad recursion.
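
A rough sketch of the split-first workaround is below. The class `SplitBeforeSegmentSketch` and the method `segmentPerChunk` are hypothetical names, and the segmenter is passed in as a plain `Function<String, List<String>>` rather than the real WordSegmenter API, which this sketch does not assume. It also assumes that an empty result means "no segmentation found", in which case the chunk is kept unchanged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical wrapper for illustration only; not part of zemberek-nlp.
class SplitBeforeSegmentSketch {

  /**
   * Splits input on whitespace and runs the given segmenter on each chunk,
   * so the segmenter never receives one long string.
   */
  static List<String> segmentPerChunk(
      String input, Function<String, List<String>> segmenter) {
    List<String> result = new ArrayList<>();
    for (String chunk : input.trim().split("\\s+")) {
      if (chunk.isEmpty()) {
        continue; // happens when the input is empty or only whitespace
      }
      List<String> words = segmenter.apply(chunk);
      if (words.isEmpty()) {
        // Assumption: an empty result means the chunk could not be
        // segmented, so it is passed through as-is.
        result.add(chunk);
      } else {
        result.addAll(words);
      }
    }
    return result;
  }
}
```

This only bounds the input length per call. A single long chunk without spaces can still hit the slow path in the underlying segmenter.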