Closed — jacobkahn closed this pull request 1 year ago
@jacobkahn has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
This pull request was exported from Phabricator. Differential Revision: D42038797
@jacobkahn merged this pull request in flashlight/text@f7b6e13cfde21a45ff30336103356f512b3f15e3.
Summary

Bind Seq2Seq/autoregressive beam search decoders from Flashlight Text to Python.

Two notable subtleties in how the bindings are structured to avoid overhead:

- `EmittingModelStatePtr` (a typedef'ed `std::shared_ptr<void>`) is exposed to Python interop via `std::shared_ptr<py::object>` (which is itself a reference-counted wrapper around `PyObject*`). The `shared_ptr` properly adjusts refcounts of the `py::object`, so there are no round-trip lifetime issues -- if Python tries to garbage collect the autoregressive model state, its refcount remains > 0 while the decoder is still using it. `get_obj_from_emitting_model_state` and `create_emitting_model_state` can create this type from arbitrary Python objects with ~no overhead; the `py::object` refers to the same underlying memory/handle and is copy-on-write.
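The lifetime guarantee above can be illustrated in plain Python, no Flashlight build required: holding an extra strong reference, as the C++ `std::shared_ptr<py::object>` does, keeps the state object alive even after the Python-side name is deleted. The names `ModelState` and `decoder_held_ref` are illustrative stand-ins, not part of the actual bindings:

```python
import sys

class ModelState:
    """Stand-in for arbitrary autoregressive model state (e.g. hidden activations)."""
    def __init__(self, step):
        self.step = step

state = ModelState(step=0)

# The C++ decoder holds a std::shared_ptr<py::object>, which owns a strong
# reference to the underlying PyObject* -- modeled here as a plain Python name.
decoder_held_ref = state

# Refcount is at least 3: the local name, the "decoder's" reference, and the
# temporary reference getrefcount itself takes on its argument.
assert sys.getrefcount(state) >= 3

del state  # the Python-side name goes away...
assert decoder_held_ref.step == 0  # ...but the state survives in the decoder
```

The same mechanism works in reverse: because the refcount is managed by `shared_ptr`, dropping the last C++ reference lets Python reclaim the object normally.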
- `EmittingModelUpdateFunc` is the autoregressive callback defined in Python but called from C++ to get incremental model token scores and model state. This closure is passed from Python to C++ once at decoder construction; a function pointer to the Python callable is stored in C++. Opaque types preclude copies of scores from arguments or return values -- this will be more carefully investigated/improved over time.

Tests are self-documenting for now for the `LexiconFreeSeq2Seq` variant.

Checklist