Script utilizing LLM

Can you provide a script similar to inference-example.py that uses run_generation.py? That is, instead of running it from the command line like this:

    python src/run_generation.py --model_type llama --model_name_or_path meta-llama/Llama-2-13b-chat-hf \
        --prefix "<s>[INST] <<SYS>>\n You are a helpful assistant. Answer with detailed responses according to the entire instruction or question. \n<</SYS>>\n\n Summarize the following book: " \
        --prompt example_inputs/harry_potter_full.txt \
        --suffix " [/INST]" --test_unlimiformer --fp16 --length 200 --layer_begin 16 \
        --index_devices 1 --datastore_device 1

I would like to load the model and run inference from a Python script. Thanks in advance!
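For anyone with the same need in the meantime, one possible stopgap is to drive the same command from Python with `subprocess`. This is only a minimal sketch and not the repository's API: the helper names `build_generation_command` and `run_generation` are hypothetical, the flags are copied verbatim from the command in this issue, and it assumes the script is started from the repository root so that `src/run_generation.py` and the prompt path resolve. It still launches a separate process, so it does not load the model in-process.

```python
import subprocess
import sys
from typing import List

# Prefix copied from the command in this issue. Raw strings keep "\n" as a
# literal backslash followed by "n", which is what the shell passes when the
# text sits inside double quotes.
PREFIX = (
    r"<s>[INST] <<SYS>>\n You are a helpful assistant. "
    r"Answer with detailed responses according to the entire instruction or question. "
    r"\n<</SYS>>\n\n Summarize the following book: "
)
SUFFIX = " [/INST]"


def build_generation_command(
    prompt_path: str,
    length: int = 200,
    layer_begin: int = 16,
    python: str = sys.executable,
) -> List[str]:
    """Build the argument list for src/run_generation.py (hypothetical helper)."""
    return [
        python, "src/run_generation.py",
        "--model_type", "llama",
        "--model_name_or_path", "meta-llama/Llama-2-13b-chat-hf",
        "--prefix", PREFIX,
        "--prompt", prompt_path,
        "--suffix", SUFFIX,
        "--test_unlimiformer",
        "--fp16",
        "--length", str(length),
        "--layer_begin", str(layer_begin),
        "--index_devices", "1",
        "--datastore_device", "1",
    ]


def run_generation(prompt_path: str) -> str:
    """Run the command in a child process and return whatever it printed to stdout."""
    result = subprocess.run(
        build_generation_command(prompt_path),
        check=True,           # raise CalledProcessError on a non-zero exit code
        capture_output=True,  # collect stdout/stderr instead of streaming them
        text=True,            # decode the output as str
    )
    return result.stdout
```

Calling `run_generation("example_inputs/harry_potter_full.txt")` from the repository root would then return the script's printed output as a string. Passing the arguments as a list avoids shell quoting problems with the long prefix.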

abertsch72 / unlimiformer

Script utilizing LLM #51