Closed enochlev closed 6 months ago
Hi @enochlev,
It looks like you are trying to generate the remaining tokens in a sequence given a common prefix. If that is your goal, you need to ensure that the suffix you are attempting to encode begins with the correct indices into the KV cache. This should be the last token index of the common prefix. Note that it may still be more performant to generate a larger context encoding, depending on your use case.
Based on your comment, are you telling me that the cache_ids have to be correctly, incrementally aligned? I thought I did so in my example:
First forward call (prefix):
input_ids: [[0, 0, 0, 0, 0, 0, 0, ..., 13, 4013, 5828, 338, 1048, 29871]]
cache_ids: [0, 1, 2, 3, 4, 5, 6, ..., 794, 795, 796, 797, 798, 799]

Standard forward call with live cache:
input_ids: [[1058]]
cache_ids: [800]
The probabilities seem to be fine with this.

Next forward call, meant to skip 4 forward calls (this is what I am trying to implement):
input_ids: [[29871, 263, 5381, 767, 1058]]
cache_ids: [801, 802, 803, 804, 805]
The probabilities seemed to get messed up with this, even though the cache_ids are aligned.
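For anyone reproducing this, the position bookkeeping can be checked separately from the model. The snippet below is only a sketch: `contiguous_cache_ids` and `is_aligned` are hypothetical helper names, not part of any library, and it assumes each call should continue at the position right after the last cached token. It confirms that the cache_ids in the three calls above are contiguous, and says nothing about what the model does with them.

```python
def contiguous_cache_ids(cached_len, num_new_tokens):
    """Positions the next `num_new_tokens` tokens would occupy in the cache."""
    return list(range(cached_len, cached_len + num_new_tokens))


def is_aligned(cached_len, cache_ids):
    """True if cache_ids continue the cache with no gap and no overlap."""
    return list(cache_ids) == contiguous_cache_ids(cached_len, len(cache_ids))


# Numbers taken from the three calls in this comment.
prefix_ids = contiguous_cache_ids(0, 800)   # 0 .. 799 (prefix call)
single_ids = contiguous_cache_ids(800, 1)   # [800] (single-token call)
multi_ids = contiguous_cache_ids(801, 5)    # 801 .. 805 (five-token call)

print(is_aligned(0, prefix_ids))    # prefix starts at position 0
print(is_aligned(800, single_ids))  # continues right after the prefix
print(is_aligned(801, multi_ids))   # continues right after the single token
```

A check like this only rules out gaps or overlaps in the ids themselves.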
When you mention the prefix, are you suggesting the set_prefixed function call?
I am assuming it's not solvable at the moment, or I misunderstood your reply.
I will be closing the issue in a few days.
I am trying to skip generating tokens that could instead be filled in via copy-paste, to hopefully get a 70% speedup for my use case. The problem I am running into is that when I reset the cache, the overhead takes too much time. When I maintain the cache, the probabilities seem to not be totally wrong.
The main goal in the end is to have constrained generation that is supposed to save time when there is only one possible next token to generate.
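To make that goal concrete, here is a rough sketch of how the "only one possible next token" run could be found before calling the model. Everything here is an assumption for illustration: `forced_tokens` is a hypothetical helper, the constraint is modelled as a plain list of allowed token sequences, and token 9999 is made up (the other ids are borrowed from the example calls in this thread).

```python
def forced_tokens(allowed_sequences, generated):
    """Return the tokens that must come next because every allowed
    sequence consistent with `generated` agrees on them."""
    # Keep only the candidates that still match what has been generated.
    candidates = [s for s in allowed_sequences
                  if list(s[:len(generated)]) == list(generated)]
    forced = []
    pos = len(generated)
    while candidates:
        next_tokens = {s[pos] for s in candidates if len(s) > pos}
        # Stop if any candidate ends here or the candidates disagree.
        if len(next_tokens) != 1 or any(len(s) <= pos for s in candidates):
            break
        forced.append(next_tokens.pop())
        pos += 1
    return forced


# Hypothetical constraint set; 9999 is an invented token id.
allowed = [
    [29871, 263, 5381, 767, 1058],
    [29871, 263, 5381, 9999],
]

print(forced_tokens(allowed, []))                        # shared opening run
print(forced_tokens(allowed, [29871, 263, 5381, 767]))   # only one branch left
```

The forced run is what would be fed to the model in a single multi-token call instead of one call per token.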
Here is my current code to reproduce this
The key logs are as follows. I thought I was doing everything right, but it still seems to produce wrong results:
At the far end of town where the Gricklegrass grows and the wind smells slowandsour when it blows and no birds ever sing excepting old crows is the Street of the Lifted Lorax And deep in the Gricklegrass some people say if you look deep enough you can still see today where the Lorax once stood just as long as it could before somebody lifted the Lorax away What was the Lorax Any why was it there And why was it lifted and taken somewhere from the far end of town where the Gricklegrass grows The old Onceler still lives here Ask him he knows This story is about a business man whoO _th- 7 in 1.5860817432403564 seconds

The key detail in the logs is that I add 5 extra tokens on the next model logits call, along with 5 extra cache_ids, correctly ordered. I thought I did it correctly, but after normal generation... it produced garbage.