Hannibal046 / xRAG

Source code for xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token

How many tokens should be generated for xRAG? #9

Open davidmrau opened 2 months ago

davidmrau commented 2 months ago

Throughout the repo I find different values for `max_new_tokens`. Which one should be used?

davidmrau commented 2 months ago

Additionally, how did you determine the `generation_length` of 30 for the efficiency results?

Hannibal046 commented 2 months ago

Hi, this is not an xRAG-specific parameter; it depends on the usage setting. In the tutorial, we set it to 20 for illustration purposes. For the efficiency results, we use the actual number of generated tokens from our tested downstream dataset (TriviaQA, if I remember correctly).
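For illustration, `max_new_tokens` is the standard Hugging Face generation cap, so a minimal sketch looks like the following (the model name here is a placeholder, not the repo's checkpoint):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Question: Who wrote Hamlet?\nAnswer:", return_tensors="pt")

# max_new_tokens only caps generation; the model may stop earlier
# when it emits an EOS token (the default stopping criterion).
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```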

davidmrau commented 2 months ago

If I'm running your method as a baseline, what number should I use?

davidmrau commented 2 months ago

"we use the actual number of generated tokens from our tested downstream dataset" Do you mean the number of generated tokens after applying the stopping criteria? Because, as you said yourself, the model can and will generate an arbitrary number of tokens until max_new_tokens is reached.

Hannibal046 commented 2 months ago

If you want to use it as a baseline, you could just set it to 100. The reason I set it to 30 in `profiler.py` is that, as you may notice, the input in this file is randomly initialized, so the stopping criteria are meaningless in this setting.

https://github.com/Hannibal046/xRAG/blob/121fa4180a8c1fa0ec1af5901d879452e5c9ce89/src/language_modeling/profiler.py#L55-L62
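For reference, a minimal sketch of that kind of profiling setup (an illustration, not the repo's `profiler.py`): the input ids are random, so EOS-based stopping would never fire meaningfully, and the generation length is pinned by setting `min_new_tokens` equal to `max_new_tokens`:

```python
import time

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model
model.eval()

# Randomly initialized input: there is no real prompt, so we pin the
# generation length instead of relying on stopping criteria.
input_ids = torch.randint(0, model.config.vocab_size, (1, 128))
attention_mask = torch.ones_like(input_ids)

start = time.perf_counter()
with torch.no_grad():
    model.generate(input_ids, attention_mask=attention_mask,
                   min_new_tokens=30, max_new_tokens=30, do_sample=False)
print(f"generated 30 tokens in {time.perf_counter() - start:.2f}s")
```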