django-ftl / fluent-compiler

High performance Python implementation of Fluent, Mozilla's l10n language

Optimisations for compiling large fluent files #32

Open leamingrad opened 1 month ago

leamingrad commented 1 month ago

Hi 👋. Firstly, thanks for this library - it is really useful!

I'm currently working with fluent-compiler (via django-ftl) as part of a large Django project. It's generally been great, but unfortunately compiling our fluent files contributes quite a lot (~10s) to our app's startup time.

With that in mind, I was hoping to contribute a couple of optimisations to speed up the compilation of large fluent files, and have raised this issue to track everything.

Current PRs

I've raised the following PRs, and am happy to make fixups as needed:

Potential next steps

After removing the set copies, the biggest contributor to compile times is the span_to_position function.

The issue is that the function requires us to scan through the text of the fluent file from the start in order to work out the row number of the element.

Ideally we would have these positions when the fluent file is parsed, but python-fluent does not do this for us. I think there are two ways to optimise here:

  1. Update python-fluent to include position information with span information
  2. Tag the FluentParser output with position information inside fluent-compiler

I've got a proof-of-concept for option 2 which shaves ~2.5s from the 10K benchmark, but this isn't ideal as it imposes overhead on fluent-compiler.
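To illustrate why rescanning is costly and what precomputed positions buy, here is a minimal sketch. It is not the proof-of-concept code and not fluent-compiler's actual `span_to_position`: the names `naive_position` and `LineIndex` are hypothetical, and it assumes a span start is a plain character offset into the file text, with 1-based rows and columns.

```python
from bisect import bisect_right
from typing import List, Tuple


def naive_position(text: str, offset: int) -> Tuple[int, int]:
    """Rescan the text from the start on every call: O(offset) per lookup."""
    row = text.count("\n", 0, offset) + 1
    last_newline = text.rfind("\n", 0, offset)  # -1 when on the first line
    return row, offset - last_newline


class LineIndex:
    """Hypothetical helper: record line start offsets once, then binary-search."""

    def __init__(self, text: str) -> None:
        # One pass over the text, done a single time per file.
        self.starts: List[int] = [0]
        for i, ch in enumerate(text):
            if ch == "\n":
                self.starts.append(i + 1)

    def position(self, offset: int) -> Tuple[int, int]:
        # Find the last line start that is <= offset: O(log lines) per lookup.
        line = bisect_right(self.starts, offset) - 1
        return line + 1, offset - self.starts[line] + 1
```

With many messages in one file, the naive version repeats an ever longer scan for each element, whereas the index pays for a single pass up front. Having the parser attach positions directly (option 1) would avoid even that pass.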

What do you think the best way forward is? I'd be happy to put a PR up for python-fluent, but I'm not sure if it is worth also putting up a PR for option 2 here in the meantime (since I'm not sure how quickly python-fluent does releases).

spookylukey commented 1 month ago

Thanks so much for opening this ticket and your work on it, it looks great! I'm not sure when I'll have a chance to look at it in detail, so bear with me. In the meantime, I think the python-fluent maintainers would be very open to including position info as you suggested. Even if they are slow to do releases, we may as well start the process sooner rather than later since that seems like the best way forward in the long run.

leamingrad commented 1 month ago

That sounds sensible to me - I'll dig into python-fluent and put up a PR for the position information.

leamingrad commented 1 month ago

I have raised https://github.com/projectfluent/python-fluent/pull/202 to add the position information to python-fluent, so I will wait to see if that is accepted. If I don't hear back in a week or so, I'll put up a PR for fluent-compiler instead.