questions about ICL code for variable tracking

NVIDIA / RULER

This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?

Apache License 2.0

743 stars 48 forks source link

print(f'internal {is_icl}') cutoff = template.index(TASKS['variable_tracking']['template'][:20]) cutoff_ans = template.index(TASKS['variable_tracking']['answer_prefix'][:10]) template = ' '.join(template[cutoff:cutoff_ans].split()[:-1]) + template[cutoff_ans:]

Hi there,

In the generated icl example, there can be special tokens that we would like to discard (e.g., [/INST]) before appending the actual example. Therefore we match the answer_prefix by indexing the first 10 characters, and throw away the unwanted special tokens. It's a good catch that when there is no inserted model template, we won't need this part. You are welcome to patch our code by making a pull request!

Indeed we use variable with length 3 instead of 5 in the ICL example as we found some models tend to copy a lot from the ICL example and thus wanted to differentiate the ICL example from the actual input example as much as possible. However, some models still exhibit the copying behavior as we described in the paper.

vars[0] is a list in which we store all the target answer variable names. e.g. for chains [['VAR QAH = 25795', 'VAR FTR = VAR QAH ', 'VAR XCK = VAR FTR ', 'VAR AFN = VAR XCK ', 'VAR AFQ = VAR AFN '], ['VAR OFP = 10860', 'VAR VAU = VAR OFP ', 'VAR SIE = VAR VAU ', 'VAR YIC = VAR SIE ', 'VAR CWP = VAR YIC ']], we have corresponding vars [['QAH', 'FTR', 'XCK', 'AFN', 'AFQ'], ['OFP', 'VAU', 'SIE', 'YIC', 'CWP']] and the first chain is what the model should trace, the rest should be ignored.

NVIDIA / RULER

questions about ICL code for variable tracking #27