ambuda-org / vidyut

Infrastructure for Sanskrit software. For Python bindings, see `vidyut-py`.

[prakriya] Optimize the `tripadi` module #12

Closed: akprasad closed this issue 8 months ago

akprasad commented 1 year ago

Profiling indicates that the `tripadi` module is slow.

Many of the rules in the tripadi need to iterate over every character in the string so that they can apply various sandhi changes. Currently, we create a new `CompactString` for each of these rules. My rough guess is that we create a dozen such strings for each word we derive, even if none of the rules can apply. `CompactString` shouldn't heap-allocate in most cases, since short strings are stored inline, but the copying required here is still slow.
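To make the cost concrete, here is a minimal sketch of the pattern described above, assuming a simplified hypothetical `Term` that holds only a `String` (plain `String` stands in for `CompactString`, and the "rules" are toy two-character contexts, not real tripadi rules). Every rule check joins all terms into a fresh string before scanning it, whether or not the rule ends up applying.

```rust
/// Hypothetical stand-in for a `Term`; not vidyut's real type.
struct Term {
    text: String,
}

/// Runs one check per toy "rule" context `(x, y)`.
/// Each check builds its own joined copy of the text, mirroring the
/// one-new-string-per-rule pattern. Returns (copies made, contexts found).
fn run_rules_naively(terms: &[Term], contexts: &[(char, char)]) -> (usize, usize) {
    let mut copies = 0;
    let mut matches = 0;
    for &(x, y) in contexts {
        // A fresh copy of the whole text, even if this rule has no scope to apply.
        let text: String = terms.iter().map(|t| t.text.as_str()).collect();
        copies += 1;
        // Scan adjacent character pairs for the rule's context.
        let found = text
            .chars()
            .zip(text.chars().skip(1))
            .any(|(a, b)| a == x && b == y);
        if found {
            matches += 1;
        }
    }
    (copies, matches)
}
```

With terms `"tat"` and `"ca"` and three toy contexts, only one context matches, yet three copies of the text are built.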

Once profiling confirms that this is a problem, we should avoid the extra copies here. Two approaches come to mind:

  1. Instead of creating a new string, iterate over the `Term` strings directly and manage indices carefully.

  2. Store one copy of the string and rebuild it only if a rule applies. The code would follow the basic pattern of `ItPrakriya`, e.g., by extending the `Prakriya` struct with new data and helper methods.
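Approach (1) could look roughly like the following sketch. It assumes a simplified hypothetical `Term` holding a `String` and introduces a hypothetical `CharIndex` (term index plus byte offset) as the "carefully managed index". Characters are walked across term boundaries without joining anything, and a change edits only the affected term. The toy context, `t` immediately before `c`, is loosely modeled on *tat* + *ca* → *tacca*.

```rust
/// Hypothetical stand-in for a `Term`; not vidyut's real type.
struct Term {
    text: String,
}

/// Hypothetical index: which term a character is in, and its byte offset there.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct CharIndex {
    term: usize,
    byte: usize,
}

/// Walks every character of every term in order, without building a joined string.
fn chars_across(terms: &[Term]) -> impl Iterator<Item = (CharIndex, char)> + '_ {
    terms.iter().enumerate().flat_map(|(term, t)| {
        t.text
            .char_indices()
            .map(move |(byte, c)| (CharIndex { term, byte }, c))
    })
}

/// Finds the first adjacent pair (x, y) with `pred(x, y)`, even across term boundaries.
fn find_pair(
    terms: &[Term],
    pred: impl Fn(char, char) -> bool,
) -> Option<(CharIndex, CharIndex)> {
    let mut prev: Option<(CharIndex, char)> = None;
    for (idx, c) in chars_across(terms) {
        if let Some((prev_idx, prev_c)) = prev {
            if pred(prev_c, c) {
                return Some((prev_idx, idx));
            }
        }
        prev = Some((idx, c));
    }
    None
}

/// Replaces the character at `at` with `new`, touching only that one term.
fn set_char(terms: &mut [Term], at: CharIndex, new: char) {
    let text = &mut terms[at.term].text;
    // Byte length of the character being replaced (0 if `at` is past the end).
    let old_len = text[at.byte..].chars().next().map_or(0, char::len_utf8);
    let mut buf = [0u8; 4];
    text.replace_range(at.byte..at.byte + old_len, new.encode_utf8(&mut buf));
}
```

The trade-off is visible even in this small sketch: no copies are made, but every rule has to reason about positions that span two terms.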

I think (2) is generally cleaner, and it has the side effect of improving our APIs.
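A minimal sketch of approach (2), assuming a simplified hypothetical `Prakriya` that owns its terms plus one cached copy of the joined text and a "dirty" flag. The method names `text` and `edit_term` and the `rebuilds` counter are illustrative only, not vidyut's actual API. Rules that merely inspect the text share the cached copy; it is rebuilt only after some edit reports a change.

```rust
/// Hypothetical stand-in for a `Term`; not vidyut's real type.
struct Term {
    text: String,
}

/// Hypothetical prakriya with a cached, lazily rebuilt copy of the full text.
struct Prakriya {
    terms: Vec<Term>,
    text: String,
    dirty: bool,
    rebuilds: usize, // counts rebuilds, for illustration only
}

impl Prakriya {
    fn new(terms: Vec<Term>) -> Self {
        // Start dirty so the first call to `text()` builds the cache.
        Prakriya { terms, text: String::new(), dirty: true, rebuilds: 0 }
    }

    /// Returns the joined text, rebuilding the cache only if a term changed.
    fn text(&mut self) -> &str {
        if self.dirty {
            self.text.clear(); // reuse the existing buffer
            for t in &self.terms {
                self.text.push_str(&t.text);
            }
            self.dirty = false;
            self.rebuilds += 1;
        }
        &self.text
    }

    /// Lets a rule edit term `i`; `f` returns true only if it changed something.
    fn edit_term(&mut self, i: usize, f: impl FnOnce(&mut String) -> bool) {
        if f(&mut self.terms[i].text) {
            self.dirty = true;
        }
    }
}
```

For example, with terms `"tat"` and `"ca"`, repeated calls to `text()` and an edit that reports no change all reuse one build; after an edit turns `"tat"` into `"tac"`, the next `text()` rebuilds once and yields `"tacca"`. Keeping the cache logic behind helper methods is what makes this shape cleaner for callers than raw index management.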

akprasad commented 8 months ago

This has been fixed locally. I don't see a performance improvement, sadly, but the resulting API is cleaner.