abertsch72 / unlimiformer

Public repo for the NeurIPS 2023 paper "Unlimiformer: Long-Range Transformers with Unlimited Length Input"
MIT License
1.05k stars, 77 forks

Prompt with Llama-2 stops after "Loading checkpoint shards: 0%" #56

Closed XmasRock closed 7 months ago

XmasRock commented 9 months ago

Hi, I'm trying to get the Llama-2 example working, but I'm stuck on the following issue: the program stops with no error message. Any advice on what I can try? I'm on Windows 11 with an NVIDIA T1000.

PS C:\__noel\AAA\github\unlimiformer> py .\src\run_generation.py --model_type llama --model_name_or_path meta-llama/Llama-2-13b-chat-hf --prefix "<s>[INST] <<SYS>>\n You are a helpful assistant. Answer with detailed responses according to the entire instruction or question. \n<</SYS>>\n\n Summarize the following book: " --prompt example_inputs/Annette_et_le_criminel.txt --suffix " [/INST]" --test_unlimiformer --fp16 --length 400 --layer_begin 16 --index_devices 1 --datastore_device 1
11/14/2023 12:56:00 - WARNING - __main__ - device: cpu, n_gpu: 0, 16-bits training: True
Using pad_token, but it is not set yet.
Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
PS C:\__noel\AAA\github\unlimiformer>
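The warning line in the log above reports `device: cpu, n_gpu: 0`, which suggests the script did not see a CUDA device and is loading the model on the CPU. A minimal sketch for checking what PyTorch can see on a machine, assuming PyTorch is installed (the helper name `describe_devices` is made up for illustration and is not part of Unlimiformer):

```python
# Assumption: PyTorch is the backend in use; fall back gracefully if it is missing.
try:
    import torch
except ImportError:
    torch = None


def describe_devices() -> str:
    """Return a short, human-readable summary of the devices PyTorch can use."""
    if torch is None:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        return "No CUDA device visible; models will be loaded on the CPU"
    lines = []
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        total_gib = props.total_memory / 1024**3
        lines.append(f"cuda:{i}: {props.name}, {total_gib:.1f} GiB")
    return "\n".join(lines)


print(describe_devices())
```

If this reports no CUDA device, the whole model has to fit in system RAM rather than in GPU memory.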

XmasRock commented 9 months ago

I only have 16 GB of RAM. I noticed that it fills up completely and that disk usage reaches 100% after some time. Then the program stops.

urialon commented 9 months ago

Hi @XmasRock, thank you for your interest in our work!

I am guessing that this is indeed a memory issue: when the RAM is full, the operating system starts offloading data to the disk ("swap"), which would explain the 100% disk usage.

I think the only options are to use a machine with more RAM, or to try a smaller model such as the 7B version of Llama.

Best, Uri
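As a rough sanity check on the memory explanation, the RAM needed just to hold the model weights can be estimated from the parameter count and the bytes per parameter. This is a back-of-the-envelope sketch, not a measurement: the parameter counts are the nominal ones from the model names (13B and 7B), and the estimate ignores activations, the Unlimiformer datastore, and the operating system's own memory use.

```python
def weight_memory_gib(n_params: float, bytes_per_param: int) -> float:
    """Approximate memory needed to hold the weights alone, in GiB."""
    return n_params * bytes_per_param / 1024**3


RAM_GIB = 16  # the machine described in this thread

# Assumption: nominal parameter counts taken from the model names.
models = [("Llama-2-13b", 13e9), ("Llama-2-7b", 7e9)]
# fp16 uses 2 bytes per parameter, fp32 uses 4.
dtypes = [("fp16", 2), ("fp32", 4)]

for model_name, n_params in models:
    for dtype_name, nbytes in dtypes:
        need = weight_memory_gib(n_params, nbytes)
        verdict = "fits" if need < RAM_GIB else "does not fit"
        print(f"{model_name} in {dtype_name}: ~{need:.1f} GiB, {verdict} in {RAM_GIB} GiB")
```

Under these assumptions the 13B weights need roughly 24 GiB in fp16, well over 16 GB, which is consistent with the swapping described above. The 7B weights come to roughly 13 GiB in fp16, so they may fit but leave little headroom for everything else.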