ggerganov / llama.cpp

LLM inference in C/C++
MIT License

Recoverable Error Handling #4385

Open martindevans opened 8 months ago

martindevans commented 8 months ago

Feature Description

Use a form of error handling that doesn't immediately terminate the process, as GGML_ASSERT currently does.

Motivation

Currently llama.cpp checks various conditions with GGML_ASSERT, and if a check fails it immediately calls abort(). There's no way to handle this error because it kills the process, which makes it difficult to build robust services using llama.cpp. See for example: https://github.com/SciSharp/LLamaSharp/issues/343#issuecomment-1838948145

This has caused problems in LLamaSharp (I'm one of its developers). In C# an error would normally throw an exception, which the caller can handle. Unfortunately, because the process is aborted, there's no way at all for us to implement that.

Possible Implementation

I don't know C++ well enough to comment. But from the perspective of LLamaSharp (and probably other wrappers) any kind of error signalling that didn't immediately terminate the process would be great!
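For illustration only: one common pattern in C-compatible libraries is to return a status code and keep the last error message available for the caller to read. Below is a minimal sketch of that idea. All names in it (`example_status`, `EXAMPLE_CHECK`, `example_last_error`, `example_set_n_threads`) are hypothetical and do not exist in llama.cpp or ggml, and this is not a proposal for how the project should actually implement it.

```cpp
#include <string>

// Hypothetical status codes; not part of the llama.cpp or ggml API.
enum example_status {
    EXAMPLE_STATUS_OK          = 0,
    EXAMPLE_STATUS_INVALID_ARG = 1,
};

// Last error message for the calling thread.
static thread_local std::string g_example_last_error;

// Exposed with C linkage so a wrapper (e.g. via P/Invoke) could read it.
extern "C" const char * example_last_error(void) {
    return g_example_last_error.c_str();
}

// Hypothetical alternative to an aborting assert: record where the check
// failed and return a status code to the caller instead of calling abort().
#define EXAMPLE_CHECK(cond, status)                                      \
    do {                                                                 \
        if (!(cond)) {                                                   \
            g_example_last_error = std::string(__FILE__) + ":" +         \
                std::to_string(__LINE__) + ": check failed: " #cond;     \
            return (status);                                             \
        }                                                                \
    } while (0)

// Hypothetical API function that validates its arguments without aborting.
extern "C" example_status example_set_n_threads(int n_threads, int * out) {
    EXAMPLE_CHECK(out != nullptr, EXAMPLE_STATUS_INVALID_ARG);
    EXAMPLE_CHECK(n_threads > 0,  EXAMPLE_STATUS_INVALID_ARG);
    *out = n_threads;
    g_example_last_error.clear();
    return EXAMPLE_STATUS_OK;
}
```

A wrapper could then check the returned status and, on failure, turn the text from `example_last_error()` into an exception in its own language. This only helps for conditions that are safe to return from; it would not make something like a corrupted device state recoverable.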

slaren commented 8 months ago

We could do better in this respect, but the "CUDA error 700" in the linked issue indicates a programming error and something like that is probably never going to be recoverable. That specific error should have already been fixed.

martindevans commented 8 months ago

That particular issue was just the motivation for a larger change in error handling, thanks for confirming that though 👍

github-actions[bot] commented 5 months ago

This issue is stale because it has been open for 30 days with no activity.

martindevans commented 5 months ago

No activity, but hopefully still relevant as a long-term goal!

github-actions[bot] commented 3 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot] commented 2 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.

oldgithubman commented 1 month ago

@slaren reopen? Is there no way to exempt issues from stale?