Add custom text for embedding

Hi everyone, thank you very much for your work!

For documents that use a lot of acronyms or specific terminology, it would be useful if we could add custom text to the image during the embedding process. This custom text could serve multiple purposes, such as a summary of the entire document or definitions of acronyms or other key terms present on the page.

The goal is to leverage the attention mechanism between this added text and the page. Specifically, the model should be able to focus on this extra text (e.g., acronym definitions) while processing the embedded page, improving retrieval performance.
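To make the idea more concrete, here is a rough sketch of the kind of custom text I have in mind. The helper `build_page_context` and its parameters are hypothetical and not part of byaldi. How such a string would actually be passed to the model alongside the page image is the open question of this issue.

```python
def build_page_context(definitions, summary=None):
    """Assemble the extra text to attach to one page (hypothetical helper).

    definitions: dict mapping an acronym or key term to its definition.
    summary: optional short summary of the whole document.
    """
    lines = []
    if summary:
        lines.append(f"Summary: {summary}")
    # One "TERM: definition" line per entry, in insertion order.
    for term, definition in definitions.items():
        lines.append(f"{term}: {definition}")
    return "\n".join(lines)


# Example: context text for a page that uses the acronym "RAG".
context = build_page_context(
    {"RAG": "retrieval-augmented generation"},
    summary="Quarterly report",
)
print(context)
```

Ideally this string could be supplied per page at indexing time, so that the page embedding is computed with the definitions in context.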

I'm not sure whether the model has been trained to handle this, or whether it's already implemented (in which case I'm sorry for the noise, but I couldn't find this functionality).

Thank you!

AnswerDotAI / byaldi

Add custom text for embedding #44