Johncrtz opened 7 months ago
No wonder my notebook kernel always dies when I run the ragas evaluation with llamaCPP.
I only use a single data sample and test one metric, and it still sometimes goes out of memory. It looks like inference runs concurrently with the metric computation. I tried changing the code but didn't succeed.
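Roughly what I attempted (a sketch; assumes a ragas version where `evaluate` accepts a `run_config`, and `dataset` / `evaluator_llm` are placeholders for my own objects):

```python
# Sketch: force ragas to run metric prompts one at a time instead of
# in parallel (max_workers=1 should disable concurrent LLM calls).
from ragas import evaluate
from ragas.metrics import context_recall
from ragas.run_config import RunConfig

result = evaluate(
    dataset,                              # my single-sample dataset
    metrics=[context_recall],
    llm=evaluator_llm,                    # my local evaluator model
    run_config=RunConfig(max_workers=1),  # no parallel requests
)
```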
Hello, I am encountering the same problem running the evaluation in RAGAS with Llama3 as an open-source evaluator model.
Did you find any solution?
Hello, I created a test set and ran it through my RAG pipeline to get documents and an answer for each question. I now have 50 tuples of [question, ground_truth, documents, answer] that I want to compute context_recall for.
Code for my custom LLM:
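It wraps a local HuggingFace model through LangChain so ragas can use it as the evaluator (a sketch; the model id and generation settings are placeholders for my actual setup):

```python
# Sketch of the custom evaluator LLM (assumptions: a local HF
# checkpoint served via transformers' text-generation pipeline,
# wrapped with LangChain for ragas).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain_community.llms import HuggingFacePipeline
from ragas.llms import LangchainLLMWrapper

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # placeholder checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # fp16 to halve the weight footprint
    device_map="auto",
)

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    return_full_text=False,
)

# ragas accepts any LangChain LLM through this wrapper
evaluator_llm = LangchainLLMWrapper(HuggingFacePipeline(pipeline=pipe))
```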
Then I run the evaluation:
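Something along these lines (a sketch; the column names follow the ragas dataset schema, and `questions` / `ground_truths` / `documents` / `answers` stand in for my 50 collected lists):

```python
# Sketch of the evaluation call over the 50 collected tuples.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import context_recall

dataset = Dataset.from_dict({
    "question": questions,          # 50 questions from the test set
    "ground_truth": ground_truths,  # 50 reference answers
    "contexts": documents,          # 50 lists of retrieved chunks
    "answer": answers,              # 50 generated answers
})

result = evaluate(
    dataset,
    metrics=[context_recall],
    llm=evaluator_llm,
)
print(result)
```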
It runs for a while and then ends up with the following error:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 44.00 MiB. GPU 1 has a total capacity of 31.74 GiB of which 29.44 MiB is free. Including non-PyTorch memory, this process has 31.70 GiB memory in use. Of the allocated memory 27.44 GiB is allocated by PyTorch, and 3.65 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
I tried customizing the PyTorch memory config to make allocation more efficient, but it made no difference:
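This mirrors the hint from the error message (note it only takes effect if set before the first CUDA allocation, i.e. before torch touches the GPU):

```python
# Enable expandable segments to reduce fragmentation, as suggested
# by the OOM error message. Must run before any CUDA allocation.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
```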
Here you can see my memory allocation:
Is there any way to do the evaluation with a custom LLM without consuming ungodly amounts of memory? IMO 50 questions is not much, and I just expected it to work. Does someone know how to handle this?