flashlight / text

Text utilities, including beam search decoding, tokenizing, and more, built for use in Flashlight.
MIT License

Swap to CMake FetchContent for KenLM, remove kenlm_utils dependency #51

Closed jacobkahn closed 1 year ago

jacobkahn commented 1 year ago

See title. Now that we no longer use ExternalProject, KenLM installs properly with Flashlight Text, even when Flashlight Text is itself pulled in via CMake FetchContent (e.g. in Flashlight). This also removes the dependence on the kenlm_util library. Since no Boost symbols are required here, KenLM can be built with BUILD_TOOLS=OFF, which removes the Boost requirement for Flashlight Text + KenLM setups.

One annoying consequence: when KenLM is pulled in with FetchContent, its include path is hard to remap to a custom apex directory, so lm/... is now the entry point for KenLM headers and their internal includes. This should still work in fbcode given existing setups using langtech. Build systems using extremely old versions of KenLM will no longer work, but those environments should update.
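
For readers who want to reproduce the setup described above in their own project, here is a minimal sketch. It is not the code from this PR. It assumes that KenLM's upstream repository exposes a `kenlm` library target and honors the BUILD_TOOLS option mentioned above; the project name, the `my_decoder` target, `main.cpp`, and the unpinned `master` tag are placeholders.

```cmake
cmake_minimum_required(VERSION 3.14)  # FetchContent_MakeAvailable needs 3.14+
project(kenlm_fetch_sketch LANGUAGES CXX)  # hypothetical project name

include(FetchContent)

# Per the PR description, building KenLM with BUILD_TOOLS=OFF drops the Boost
# dependency. Setting the cache entry before fetching means KenLM's own
# option() call will not override it.
set(BUILD_TOOLS OFF CACHE BOOL "Build KenLM command-line tools" FORCE)

FetchContent_Declare(
  kenlm
  GIT_REPOSITORY https://github.com/kpu/kenlm.git
  GIT_TAG        master  # assumption: pin a specific commit in real builds
)
FetchContent_MakeAvailable(kenlm)

# Hypothetical consumer target; main.cpp is a placeholder source file.
add_executable(my_decoder main.cpp)

# Assumption: the library target is named `kenlm`. Note that kenlm_util is
# not linked, matching the dependency removal described above.
target_link_libraries(my_decoder PRIVATE kenlm)

# Expose the fetched source root so headers resolve under the lm/ prefix,
# e.g. #include "lm/model.hh". This may be redundant if the kenlm target
# already exports its include directories.
target_include_directories(my_decoder PRIVATE ${kenlm_SOURCE_DIR})
```

With FetchContent, `kenlm_SOURCE_DIR` points at the checked-out source tree, which is why lm/... becomes the natural include prefix rather than a custom apex directory.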

Test plan: CI + cibw + numerous local tests.

facebook-github-bot commented 1 year ago

@jacobkahn has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot commented 1 year ago

@jacobkahn merged this pull request in flashlight/text@ebab3336632ff10d0639690b4f4b5dabaad443d6.