flashlight / text

Text utilities, including beam search decoding, tokenizing, and more, built for use in Flashlight.
MIT License

Separate and make optional KenLM components #43

Closed jacobkahn closed 1 year ago

jacobkahn commented 1 year ago

Moves the KenLM-related components of Flashlight Text into their own library, libflashlight-text-kenlm, which is built only when FL_TEXT_USE_KENLM is enabled. In particular, with a shared lib configuration, no KenLM symbols are bundled into libflashlight-text-kenlm.

When building Python bindings, Flashlight Text tries to find a kenlm installed via pip (e.g. pip install git+https://github.com/kpu/kenlm). This PR adds machinery to download the KenLM headers when building wheels from source, while still linking against the kenlm installed via pip. This enables a consuming C++ library from FL Text to dynamically load a KenLM built from pip.
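Before building, it can help to check whether a pip-installed kenlm is visible to the current interpreter. The following is a minimal sketch using only the Python standard library. The helper name find_pip_kenlm is hypothetical, and this is not necessarily how the Flashlight Text build locates kenlm.

```python
import importlib.util


def find_pip_kenlm():
    """Return the file path of the installed `kenlm` module, or None.

    Hypothetical helper for illustration; the actual build may locate
    kenlm differently.
    """
    try:
        spec = importlib.util.find_spec("kenlm")
    except (ImportError, ValueError):
        # Raised in unusual states, e.g. a broken entry in sys.modules.
        return None
    if spec is None:
        return None
    # For a compiled extension this is the path to the shared object.
    return spec.origin


if __name__ == "__main__":
    path = find_pip_kenlm()
    print(path if path else "kenlm not found; install it via pip first")
```

If the function returns None, the pip-installed kenlm is not importable from this environment, so there would be nothing to link against.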

Some other details/changes:

NB: as above, the import path for KenLM support has changed. It's now:

from flashlight.lib.text.decoder.kenlm import KenLM
# was previously:
from flashlight.lib.text.decoder import KenLM
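Downstream code that must run against both older and newer Flashlight Text builds could try the new path first and fall back to the old one. This is a sketch under that assumption. The helper name load_kenlm_class is hypothetical, and only the two module paths come from this PR.

```python
import importlib


def load_kenlm_class():
    """Return the KenLM class from whichever import path exists, else None."""
    candidates = (
        "flashlight.lib.text.decoder.kenlm",  # new path introduced by this PR
        "flashlight.lib.text.decoder",        # path used before this PR
    )
    for module_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # Module missing (or built without KenLM support); try the next one.
            continue
        kenlm_cls = getattr(module, "KenLM", None)
        if kenlm_cls is not None:
            return kenlm_cls
    return None


if __name__ == "__main__":
    KenLM = load_kenlm_class()
    print("KenLM available" if KenLM is not None else "KenLM support not found")
```

Returning None instead of raising lets the caller decide whether KenLM support is optional, which matches the PR's goal of making the KenLM components optional.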

For this to work fully on macOS, https://github.com/kpu/kenlm/pull/413 needs to be merged, because FL Text needs to link against a KenLM dylib at build time and load the same dylib at runtime via a portable rpath. Linking with a macOS .so bundle at runtime sadly won't work because of how flashlight-text-kenlm was linked.

Test plan: CI, cibuildwheel and local wheel tests across macOS and Linux.

mthrok commented 1 year ago

no KenLM symbols are bundled into libflashlight-text-kenlm.

In KenLM.cpp, I see that lm::ngram::State is used, so doesn't it show up as an undefined symbol?

facebook-github-bot commented 1 year ago

@jacobkahn has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot commented 1 year ago

This pull request was exported from Phabricator. Differential Revision: D43294535

facebook-github-bot commented 1 year ago

@jacobkahn merged this pull request in flashlight/text@dcfc637d03ab30b053fcf99fc664f43eff8cb3f4.