SciSharp / LLamaSharp

A C#/.NET library to run LLMs (🦙 LLaMA/LLaVA) on your local device efficiently.
https://scisharp.github.io/LLamaSharp
MIT License

Async implementation of LLamaExecutors #829

Open asmirnov82 opened 3 days ago

asmirnov82 commented 3 days ago

Description

I am developing a WPF application that uses the LLamaSharp library, in particular the LLama executors (such as InstructExecutor and InteractiveExecutor). I expect that the code

```csharp
await foreach (var text in executor.InferAsync(prompt, _inferenceParams))
{
    currentResult.Content += text;
}
```

doesn't block my UI thread. However, the UI freezes.

It looks like this happens because InferAsync awaits the InferInternal(inferenceParams, args) method, and the InferInternal implementations in the InstructExecutor and InteractiveExecutor classes are synchronous.
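To illustrate what I mean, here is a standalone sketch (not LLamaSharp code): an async method runs synchronously on the calling thread until it hits the first await that does not complete immediately, so awaiting it gives the caller no relief if all the heavy work is synchronous.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class AsyncOverSyncDemo
{
    // Awaitable, but everything before the first "real" await runs on the caller's thread.
    static async Task InferInternalAsync()
    {
        Thread.Sleep(2000);        // stand-in for a blocking native Decode call
        await Task.CompletedTask;  // completes synchronously, so the caller never yields
    }

    static async Task Main()
    {
        Console.WriteLine("Calling...");
        await InferInternalAsync(); // blocks this thread for ~2 seconds despite the await
        Console.WriteLine("Done.");
    }
}
```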

As an experiment, I changed this line in InstructExecutor:

```csharp
var (result, _) = Context.NativeHandle.Decode(_embeds, LLamaSeqId.Zero, batch, ref _pastTokensCount);
```

to

```csharp
var (result, _) = await Task.Run(() => Context.NativeHandle.Decode(_embeds, LLamaSeqId.Zero, batch, ref _pastTokensCount));
```

and this solved the issue.

Do you have any plans to add async implementations of all the methods that StatefulExecutorBase awaits in the inherited executors?

martindevans commented 3 days ago

There is a DecodeAsync method on LLamaContext which should be a "drop in" replacement for Decode in an async context. Would you be interested in putting together a PR updating the three executors (Interactive, Instruct and Stateless) to use this?
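Roughly, the change in each executor might look like the sketch below. This is only a sketch: the field and method names are taken from the snippet above, and the real DecodeAsync signature may differ. Note that `ref` parameters are not allowed in async methods, so an async variant would have to return the updated past-token count rather than mutating it through `ref`.

```csharp
// Hypothetical shape of the executor change (verify against the actual DecodeAsync API):
protected async Task InferInternalAsync(IInferenceParams inferenceParams, InferStateArgs args)
{
    // ... existing pre-decode logic ...

    // Before (synchronous, blocks the awaiting thread):
    // var (result, _) = Context.NativeHandle.Decode(_embeds, LLamaSeqId.Zero, batch, ref _pastTokensCount);

    // After (awaitable; the updated count comes back in the tuple):
    var (result, pastTokensCount) = await Context.DecodeAsync(_embeds, LLamaSeqId.Zero, batch, _pastTokensCount);
    _pastTokensCount = pastTokensCount;

    // ... existing post-decode logic ...
}
```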

asmirnov82 commented 2 days ago

Yes, I'll do that.