m0nsky opened 3 months ago
Since #761 the `BatchedExecutor` will automatically split work up into multiple batches (so any size prompt can be handled, you just need to call `Infer()` enough times to process the entire queue of work), and since #770 the `BatchedExecutor` has had LLava support.
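The split-into-batches behaviour described above can be sketched in a language-agnostic way. This is a Python sketch of the general chunking technique only; `decode_batch`, `process_prompt`, and the token list are illustrative assumptions, not the LLamaSharp API:

```python
# Sketch: process a prompt larger than the batch size by repeatedly
# submitting chunks of at most n_batch tokens. All names here are
# illustrative, not LLamaSharp API.

def process_prompt(tokens, n_batch, decode_batch):
    """Feed `tokens` to `decode_batch` in chunks of at most n_batch tokens."""
    calls = 0
    for start in range(0, len(tokens), n_batch):
        chunk = tokens[start:start + n_batch]
        decode_batch(chunk, position=start)  # positions stay absolute
        calls += 1
    return calls  # number of Infer()-style calls needed

# A 1067-token prompt with n_batch = 512 spans three chunks:
# positions [0, 512), [512, 1024), [1024, 1067).
```

This is why a single call is not enough for a long prompt: each call only consumes up to `n_batch` tokens, and the caller has to keep calling until the queue is drained.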
Description
I'm building a LLava application. When the number of tokens in my initial prompt is bigger than the batch size, the `InteractiveExecutor` will throw an exception.

When adding a breakpoint at `LLamaInteractExecutor` line 257, we can observe the following: my initial prompt is 1067 tokens (I have tokenized it and counted it) and the image embed is at position 1055 (near the end of my prompt), but `_embeds` only goes up to 512 (the batch size).

Reproduction Steps
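Not a full repro, but the arithmetic from the description above can be checked directly: with a batch size of 512, an image embed at position 1055 falls in the third chunk of the prompt, well outside the first 512 positions that `_embeds` covers. A Python sketch (variable names are illustrative):

```python
n_batch = 512      # batch size from the description
prompt_len = 1067  # tokenized prompt length
image_pos = 1055   # position of the image embed

# Which chunk of the prompt the image embed falls in (0-based):
batch_index = image_pos // n_batch

print(batch_index)          # 2, i.e. the third batch
print(image_pos < n_batch)  # False: outside the first batch entirely
```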
Environment & Configuration
Known Workarounds