Closed percevalw closed 1 year ago
No coverage uploaded for pull request base (main@9ca8fd0). Patch has no changes to coverable lines.
Description
This PR introduces a new HuggingfaceEmbedding component, which wraps Hugging Face models (such as LayoutLM or LILT). Compared with using the raw Hugging Face model, this wrapper offers a simple mechanism for splitting long documents into sliding windows before sending them to the model. This is needed because the number of tokens sent to the transformer is capped at 512, and because sending entire sequences at once can be memory-intensive.
Example
Here is an example of how to define a pipeline with the HuggingfaceEmbedding component:
This model can then be trained following the training recipe.
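To illustrate the sliding-window splitting mentioned in the description, here is a minimal sketch in plain Python. It is not the component's actual implementation: the function name sliding_windows and its window and stride parameters are hypothetical, chosen only to show how a long token sequence can be cut into overlapping chunks that each fit under a 512-token cap.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sliding_windows(
    tokens: Sequence[T], window: int = 512, stride: int = 256
) -> List[List[T]]:
    """Split `tokens` into overlapping windows of at most `window` items.

    Hypothetical helper for illustration only; the names and defaults
    are assumptions, not the HuggingfaceEmbedding API.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if stride > window:
        # A stride larger than the window would skip tokens entirely.
        raise ValueError("stride must not exceed window")
    if len(tokens) == 0:
        return []

    windows = []
    start = 0
    while True:
        windows.append(list(tokens[start:start + window]))
        # Stop once the current window reaches the end of the sequence.
        if start + window >= len(tokens):
            break
        start += stride
    return windows


# Usage: a 1200-token document split into windows of 512 with a stride of 256.
doc = list(range(1200))
chunks = sliding_windows(doc, window=512, stride=256)
```

With a stride smaller than the window, consecutive windows overlap, so each token is seen with some surrounding context in at least one window. How the per-window outputs are merged back into one embedding per token is a separate design choice that this sketch does not cover.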