Closed: aceliuchanghong closed this issue 1 week ago
Did you try the example code in https://huggingface.co/THUDM/LongCite-llama3.1-8b? You need to load the model in bfloat16.
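For reference, a minimal sketch of loading the model in bfloat16 with `transformers`, as the model card suggests. `device_map="auto"` and `trust_remote_code=True` are assumptions here (adjust to your setup); the model id comes from the link above.

```python
def load_longcite(model_id: str = "THUDM/LongCite-llama3.1-8b"):
    """Sketch: load LongCite in bfloat16 instead of the float32 default.

    Imports are done lazily so this file can be inspected without
    torch/transformers installed.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # key point: bf16 halves weight memory vs fp32
        device_map="auto",           # assumption: spread across available GPUs
        trust_remote_code=True,      # the model ships custom inference code
    )
    return tokenizer, model
```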
Oh, I see. I ran demo.py and only changed the model to LongCite-llama3.1-8b, which is why it failed. Let me try that example.
Huh, that example is missing `import json`. I added it and then it worked. Also, the project's requirements.txt is missing:
torch
tiktoken
accelerate
Thank you for your help.
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 179.46 GiB. GPU 0 has a total capacity of 31.73 GiB of which 14.55 GiB is free. Process 2995003 has 406.00 MiB memory in use. Process 2375671 has 1.59 GiB memory in use. Including non-PyTorch memory, this process has 15.20 GiB memory in use. Of the allocated memory 11.80 GiB is allocated by PyTorch, and 3.03 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
The model is LongCite-llama3.1-8b. Why does it need so much memory? I can't run it even on 4×32 GiB GPUs.
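A rough back-of-the-envelope calculation shows why the dtype matters here. The ~8e9 parameter count is an assumption based on the "8b" in the model name, not a confirmed figure:

```python
# Bytes per parameter for the two dtypes in question.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}

def weight_memory_gib(n_params: float, dtype: str) -> float:
    """Memory needed just to hold the model weights, in GiB."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30

fp32 = weight_memory_gib(8e9, "float32")   # ~29.8 GiB: fills a 32 GiB GPU on its own
bf16 = weight_memory_gib(8e9, "bfloat16")  # ~14.9 GiB: leaves room for activations
```

Note that the 179 GiB allocation in the traceback is far larger than the weights, so it is most likely a single activation tensor (e.g. attention scores over a very long input) rather than the model itself; bfloat16 halves that too, but a sufficiently long context can still exceed 32 GiB.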