NVIDIA / RULER

This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?

questions about ICL code for variable tracking #27

Closed · vkaul11 closed 4 months ago

vkaul11 commented 5 months ago

Thanks for your work. I am a bit confused about the code here:
https://github.com/hsiehjackson/RULER/blob/main/scripts/data/synthetic/variable_tracking.py#L116

        print(f'internal {is_icl}')
        cutoff = template.index(TASKS['variable_tracking']['template'][:20])
        cutoff_ans = template.index(TASKS['variable_tracking']['answer_prefix'][:10])
        template = ' '.join(template[cutoff:cutoff_ans].split()[:-1]) + template[cutoff_ans:]

I had a few questions about the code:

  1. Why do you need to use cutoff and cutoff_ans? Is this to remove [INST] or some other part of the model template? Won't the model template change with every model? Secondly, why do you use answer_prefix[:10]? I don't understand the reason for this.
  2. From what I understand, in the first pass you generate an ICL example, and in doing so you use variable names of length 3 (otherwise it is 5); in the second pass you generate the actual text in which to find the variables, right?
  3. In https://github.com/hsiehjackson/RULER/blob/main/scripts/data/synthetic/variable_tracking.py#L128, why do we return vars[0] instead of vars from generate_input_output?
  4. Why do you remove the last word after splitting here: ' '.join(template[cutoff:cutoff_ans].split()[:-1])?
  5. If there is no inserted model template and we only use the task template, do we not need this code?
SimengSun commented 5 months ago

Hi there,

In the generated ICL example, there can be special tokens that we would like to discard (e.g., [/INST]) before appending the actual example. Therefore we locate the answer_prefix by matching its first 10 characters and throw away the unwanted special tokens. It's a good catch that when there is no inserted model template, we won't need this part. You are welcome to patch our code by making a pull request!
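
To make that concrete, here is a minimal, self-contained sketch of the slicing step quoted above, with made-up strings standing in for the real task template and answer prefix (the actual values live in the repo's task and model template definitions, so treat the wording below as hypothetical):

    # Hypothetical stand-ins for TASKS['variable_tracking']['template'] and ['answer_prefix'].
    task_text = ("Memorize and track the chain(s) of variable assignment hidden in the "
                 "following text. ... Question: Find all variables that are assigned the value 12345.")
    answer_prefix = " Answer: According to the chain(s) of variable assignment above,"

    # Suppose the ICL example was rendered with a Llama-2 style chat wrapper.
    template = "[INST] " + task_text + " [/INST]" + answer_prefix + " ABC, DEF, GHI."

    # Locate the start of the task text and of the answer prefix by matching only
    # their first 20 / 10 characters (enough to find them inside the full prompt).
    cutoff = template.index(task_text[:20])
    cutoff_ans = template.index(answer_prefix[:10])

    # Keep the span from the task text up to the answer prefix, drop its last
    # whitespace-separated token (here the model-specific "[/INST]"), then
    # re-attach everything from the answer prefix onward.
    icl_example = ' '.join(template[cutoff:cutoff_ans].split()[:-1]) + template[cutoff_ans:]

    print(icl_example)
    # -> "Memorize and track ... the value 12345. Answer: According to ... ABC, DEF, GHI."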

Indeed, we use variable names of length 3 instead of 5 in the ICL example, as we found some models tend to copy a lot from the ICL example, and we wanted to differentiate the ICL example from the actual input example as much as possible. However, some models still exhibit the copying behavior, as described in the paper.

vars[0] is a list in which we store all the target answer variable names. For example, for the chains [['VAR QAH = 25795', 'VAR FTR = VAR QAH ', 'VAR XCK = VAR FTR ', 'VAR AFN = VAR XCK ', 'VAR AFQ = VAR AFN '], ['VAR OFP = 10860', 'VAR VAU = VAR OFP ', 'VAR SIE = VAR VAU ', 'VAR YIC = VAR SIE ', 'VAR CWP = VAR YIC ']], we have the corresponding vars [['QAH', 'FTR', 'XCK', 'AFN', 'AFQ'], ['OFP', 'VAU', 'SIE', 'YIC', 'CWP']]. The first chain is what the model should trace; the rest should be ignored.
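
To see how chains and vars line up, here is a rough sketch of how such chains could be generated; build_chain is made up for illustration and is not the function in variable_tracking.py:

    import random
    import string

    def build_chain(num_hops, value):
        """Build one assignment chain; return (statements, variable_names)."""
        names = [''.join(random.sample(string.ascii_uppercase, 3)) for _ in range(num_hops)]
        statements = [f'VAR {names[0]} = {value}']
        for prev, curr in zip(names, names[1:]):
            statements.append(f'VAR {curr} = VAR {prev} ')
        return statements, names

    random.seed(0)
    # One chain per value; only the first value is actually asked about.
    chains, vars = zip(*[build_chain(num_hops=5, value=v) for v in (25795, 10860)])

    # vars[0] holds the names that (transitively) carry the queried value 25795,
    # i.e. the gold answer; the second chain is a distractor and is ignored.
    print(vars[0])

With the thread's example values, this mirrors the structure shown above: the first chain's variable names form the answer set, and the remaining chains only act as confusers.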