abertsch72 / unlimiformer

Public repo for the NeurIPS 2023 paper "Unlimiformer: Long-Range Transformers with Unlimited Length Input"
MIT License
1.05k stars 77 forks source link

Script utilizing LLM #51

Open jcgeo9 opened 10 months ago

jcgeo9 commented 10 months ago

Can you provide a script similar to inference-example.py, that utilises run_generation.py file? i.e instead of command like execution python src/run_generation.py --model_type llama --model_name_or_path meta-llama/Llama-2-13b-chat-hf \ --prefix "<s>[INST] <<SYS>>\n You are a helpful assistant. Answer with detailed responses according to the entire instruction or question. \n<</SYS>>\n\n Summarize the following book: " \ --prompt example_inputs/harry_potter_full.txt \ --suffix " [/INST]" --test_unlimiformer --fp16 --length 200 --layer_begin 16 \ --index_devices 1 --datastore_device 1 instead load the model and run inference from python script. Thanks in advance!

abertsch72 commented 7 months ago

You can do this from a script by importing run_generation and calling it with your arguments:


from run_generation import main
main(['--model_type', 'llama', '--model_name_or_path', 'meta-llama/Llama-2-13b-chat-hf', <rest of your args here>])```