SciSharp / LLamaSharp

A C#/.NET library to run LLM (🦙LLaMA/LLaVA) on your local device efficiently.
https://scisharp.github.io/LLamaSharp
MIT License

Predicting memory usage - Memory Access Violation #952

Open · AgentSmithers opened this issue 1 month ago

AgentSmithers commented 1 month ago

Description

When running InferAsync on a machine with 16 GB of RAM, memory usage peaks at 100%. I added more RAM to resolve the issue, but I was wondering: is there a way to predict memory usage from the size of the token input, so the call can be cancelled with feedback that additional RAM is required, instead of triggering a memory corruption/access violation error? Does anyone know if this calculation is possible?
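For context, the usual back-of-envelope accounting (a sketch only, not anything LLamaSharp exposes): the main allocation that grows with the amount of context is the KV cache, which for a llama-style model is roughly 2 × layers × contextSize × kvHeads × headDim × bytesPerElement, on top of the (roughly fixed) model weights and compute buffers. A minimal C# illustration follows, assuming Llama-2-7B-like hyperparameters; every parameter name here is hypothetical, and the real values should be read from the model's metadata (e.g. the GGUF header):

```csharp
using System;

// Back-of-envelope estimate of the KV cache size, assuming a llama.cpp-style
// layout: two tensors (K and V) per layer, each holding
// contextSize x (kvHeadCount * headDim) elements at a given element width.
// All parameter names are illustrative; take real values from the model's
// GGUF metadata.
static class MemoryEstimator
{
    public static long EstimateKvCacheBytes(
        int layerCount,      // e.g. 32 for a Llama-2-7B-like model
        int contextSize,     // tokens the context is configured to hold
        int kvHeadCount,     // attention heads used for K/V (fewer with GQA)
        int headDim,         // dimension per head, e.g. 128
        int bytesPerElement) // 2 for an f16 KV cache, 1 for 8-bit, etc.
    {
        // K and V each store contextSize * kvHeadCount * headDim values per layer.
        return 2L * layerCount * contextSize * (long)kvHeadCount * headDim * bytesPerElement;
    }
}

class Demo
{
    static void Main()
    {
        // Example: Llama-2-7B-like hyperparameters, 4096-token context, f16 cache.
        long kvBytes = MemoryEstimator.EstimateKvCacheBytes(
            layerCount: 32, contextSize: 4096, kvHeadCount: 32,
            headDim: 128, bytesPerElement: 2);

        Console.WriteLine($"Estimated KV cache: {kvBytes / (1024.0 * 1024.0):F0} MiB");
        // Total memory ≈ model file size (weights) + KV cache + compute buffers;
        // treat this as a lower bound, not an exact figure.
    }
}
```

For those example numbers this works out to about 2 GiB of KV cache. Comparing such an estimate (plus the model file size) against available RAM before calling InferAsync would be one way to fail early with a clear message instead of hitting an access violation, though the exact overheads are best confirmed upstream.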

martindevans commented 1 month ago

I'd recommend asking this upstream in llama.cpp; whatever they say will apply to LLamaSharp too.

AgentSmithers commented 1 month ago

Thanks for the feedback. I'll give it a go with them.